C8 Hub Z-Wave Keeps Dying

Platform version: 2.4.3.126
Hardware version: C-8
Diagnostics tool version: 1.1.131
IP address: 192.168.1.8 (c8south)
Hub ID: ****8e0f9

This hub keeps failing. I shut it down, remove power (sometimes up to an hour - no difference) and power it back up and everything works - for a while. Then it stops controlling devices (all Z-Wave).

These are NOT "Z-Wave crashes" (though those do still happen occasionally after cloud backup) as there is no crash in the log. Sometimes a device will respond after a very long delay (longer than it takes to go #1). For lighting control, the Zigbee motion sensor and the automation run on a different hub, which then goes through Hub Mesh to control a Z-Wave light on this one.

Other times the devices never turn on or off and also cannot be controlled from UI directly on this hub.

Another indication I get is the red LED will start flashing on Ecolink Chime Siren (meaning no Z-Wave connection).

The "fix" is full power off/on - but other times it will start working again on its own!?

This has been going on for more than 3 weeks now. Today I did a Soft Reset to see if that helps. No major changes to this network - the last device added was the Ecolink Siren, and that was a few weeks before the problem started. The LAN has 2 new Konnected GDOs (WiFi). All 3 HE hubs are on physical LAN with no changes to their network (static and DHCP reservation, and their IPs are not in DHCP scope).

I've run out of ideas - bad radio? @support / @bobbyD can someone check on back end please?

Is this the same hub you were having similar Z-Wave problems with last year? Based on the details you've shared, this could be the same combination of problems that you reported last time: a mix of S0 devices and/or excessive power reporting. Have you made any changes based on last time's recommendations? Do you think you may have (new) devices that are going bad? Sharing screenshots of your Z-Wave details table might help narrow down your problem.

Something is overwhelming the mesh.

Not sure if I love or hate the fact you remember from a year ago ;-). But I believe that was my other Z-Wave hub at that time - I'd have to go look.

Right now I only have None or S2 Auth - no S0.

You guys rock with fast replies!

You have several power reporting devices. Make sure you turn power reporting off, or make it report less often (a longer reporting interval). The symptoms you've described are consistent with your radio running out of bandwidth: Dos and Don'ts of Z-Wave Power Reporting (repost)
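
For a rough feel of why frequent power reports add up, here is a back-of-the-envelope sketch in Python. Every input in the example call (device count, reports per interval, frame size, hop count) is an illustrative assumption, not a measurement from this hub; only the 40 kbit/s figure is one of the standard Z-Wave data rates. It also ignores acknowledgements, retries and collisions, all of which make things worse in practice.

```python
def airtime_fraction(devices, reports_per_interval, interval_s,
                     frame_bytes, bitrate_bps, hops=1):
    """Very rough share of radio airtime used by periodic reports.

    Counts each report once per hop and nothing else (no acks, no retries).
    """
    frames_per_second = devices * reports_per_interval / interval_s
    seconds_per_frame = frame_bytes * 8 / bitrate_bps
    return frames_per_second * seconds_per_frame * hops


# Hypothetical mesh: 10 metering devices, 3 reports each every 5 seconds,
# 30-byte frames (assumed), 40 kbit/s, 2 hops on average (assumed).
share = airtime_fraction(10, 3, 5, 30, 40_000, hops=2)
print(f"{share:.1%} of airtime before acks and retries")
```

Rerunning it with a 60-second interval instead of 5 shows how much headroom a longer reporting interval buys back.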

We checked your hub's engineering logs and nothing stands out to indicate a hardware issue.

I do? I didn't find anything with a power reporting switch.

Do engineering logs get cleared if I checked the box on soft reset? If so, might have to check after it dies next.

Might be good to look at the Z-Wave details page after the hub has been running for a bit. Refresh the stats with the button up top, then look for devices with a lot of messages. That might help narrow it down.
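
If the table is long, one way to spot "a lot of messages" is to compare each device against the typical count. This is a minimal Python sketch, assuming you copy the message counts out of the Z-Wave details page by hand; the device names, the counts and the 5x-median cutoff are all made up for illustration and are not a rule from the hub.

```python
from statistics import median


def busiest_devices(message_counts, factor=5.0):
    """Return (name, count) pairs at or above factor x the median count,
    busiest first. The factor is an arbitrary illustrative cutoff."""
    if not message_counts:
        return []
    threshold = max(median(message_counts.values()) * factor, 1)
    flagged = [(name, count) for name, count in message_counts.items()
               if count >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hypothetical counts typed in from the details page.
counts = {"Device A": 156, "Device B": 40, "Device C": 35, "Device D": 2900}
print(busiest_devices(counts))
```

An empty result means nothing stands out by message count, which is itself useful to know before moving on to other stats.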

There is also the possibility that a device is going haywire and sending out garbage transmissions, which clog up the mesh but won't get counted by the hub. Those are much harder to track down.

Many (all?) plugs/outlets, as well as newer switches, have power reporting capabilities. The absence of adjustable power reporting parameters in a driver (especially a custom driver) doesn't mean the device isn't sending power reports to the radio, which could result in overloading it. Generally, built-in drivers allow adjustments to the reporting parameters.

Most of my Z-Wave outlets, switches, dimmers, and fan controllers are GE/JASCO from 2016 and use the default/generic built-in drivers. I also don't see anything in state variables about power.

Are there "better" drivers I should be using?

With these drivers I don't see anything in Preferences that I can change...

Make sure you don't have a device that is failing and spamming the z-wave mesh. Something is inundating the mesh. If it isn't legitimate reports (like frequent power reports), then it could be a malfunctioning device.

Thanks for pointing me specifically where to look - not sure I found any problem though. 156 messages isn't "high", is it?

Nope.

But garbage sent by a malfunctioning device will not show up here. It may show up in your z-wave logs.

Alternatively, remove power from your powered Z-Wave devices one at a time and see if any particular one quiets the mesh down. This is slow because you have to wait a little while after disabling each device.

Ouch, for 60 devices - that would take forever.
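
To put a number on "forever", here is a small Python estimate. It compares the one-at-a-time approach with a halving approach (power off half of the remaining suspects per round, for example by breaker), which is a different technique from the one suggested above and only works if exactly one device is misbehaving and the mesh still functions with half the devices off. The 10-minute wait per step is an assumption.

```python
import math


def one_at_a_time_minutes(devices, wait_min):
    """Worst case: every device is powered off once, with a wait after each."""
    return devices * wait_min


def halving_minutes(devices, wait_min):
    """Worst case when each round powers off half of the remaining suspects.

    Assumes a single bad device; with several, halving can mislead.
    """
    rounds = math.ceil(math.log2(devices)) if devices > 1 else 0
    return rounds * wait_min


# 60 devices, assumed 10-minute wait after each change.
print(one_at_a_time_minutes(60, 10), "minutes one at a time")
print(halving_minutes(60, 10), "minutes by halving")
```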

Interestingly, from my post with the messages - the worst "offender" for automation not working is master toilet light - and it has the most hops. Just tried a rebuild on Garage Hot Water Implant (highest messages) and it failed. And tried Master Toilet Light - it succeeded, but kept same route - the switch is 25 feet from the hub. hmmm.

Did you actually refresh the stats (then refresh the page)? The RTT is all blank and they are showing 0 route changes.

If message count is not showing any outliers then next I would look at the RTT and Route changes.

I had only refreshed the page. Tried both now - still about the same.

I was looking at RTT and noticed most are blank or very small ms - but one was 656ms and another (that new siren) was almost 65000ms - wth? I did a rebuild on it and now it is not responding. Timing on "issues" is close to when I added this bugger. I just removed it for now.
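
The same kind of check can be sketched for RTT. This is a minimal Python example, assuming the RTT column is copied by hand and blanks are entered as None; the 500 ms limit is an arbitrary cutoff for illustration, the device names are placeholders, and the two large values are the ones mentioned above.

```python
def slow_devices(rtt_ms, limit_ms=500):
    """Return device names whose RTT exceeds limit_ms, slowest first.

    A value of None stands for a blank RTT cell and is skipped.
    """
    measured = {name: rtt for name, rtt in rtt_ms.items() if rtt is not None}
    slow = [name for name, rtt in measured.items() if rtt > limit_ms]
    return sorted(slow, key=lambda name: measured[name], reverse=True)


# Placeholder names; 656 and 65000 are the values seen on the details page.
rtt = {"Siren": 65000, "Switch": 656, "Dimmer": 12, "Outlet": None}
print(slow_devices(rtt))
```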

Strange question - but should my Z-Wave logs be empty? For hours? I was able to generate logs by turning a device on/off from the app - but shouldn't there be a steady flow in the logs? Now even worse - I can't control any devices from the app or UI. When I try on/off, even that no longer shows up in the log.

If you are running the legacy ZIP gateway then the zwave logs are sparse and nearly useless.

I somewhat disagree - I had nearly the exact same situation as the OP, and it was an older JASCO/GE switch with a failing cap after a power failure (a well-known failure scenario). The Z-Wave log was clearly showing high-frequency, repeated messages from the dying switch. Once I powered it off, the whole Z-Wave mesh came back as expected.

Given older JASCO gear, my money is on a failing device spamming the network, and the Z-Wave logs will help you isolate the source of the repeated messages (assuming the hub can stay up long enough for you to get some visibility into this log).

And no, the Z-Wave log shouldn't be empty with ZIP; there should at least be some entries from you sending messages to devices. So I'm not entirely sure what that is about.

I agree, all it takes is one device to take down Z-Wave.

Just last week, I woke up to my Zwave down. A hard reset of the hub would bring Zwave back for a minute, then it would die again. I then realized that the only light that didn't turn off that night as it should have was the kitchen sink Zwave dimmer switch.

I air gapped that switch to reset it and everything came back and has been fine for a week now, though I know from experience I will probably be replacing that switch down the line if it happens again, which I expect it will. I chased down two bad dimmer switches about a year ago that were doing the same thing. They finally got to the point where resetting them didn't fix the mesh, so I replaced them and all was good for over a year until last week with the sink dimmer.

I have a Zsniffer stick here, and what I have found in the past is that when a device locks up like that and takes down Z-Wave, it may just be spewing out garbage transmissions. Just loads and loads of garbage with CRC errors at a fast pace. The hub radio (and all the devices) gets all that and has to sift through it to try and find real transmissions. You can imagine how that grinds it to a halt pretty quickly.

Basically it was exceeding the bandwidth limits of the mesh.

I did notice on this last fail that many devices were appearing to be active in the log, but their transmission speed was zero. They would have a valid transmission speed showing right after a hard reset, then it would go back to zero on all devices.

Once I suspected the sink dimmer, I also noticed that the sink dimmer was giving zero transmission speed even after a reset, while the other devices were showing a speed and working for a minute before they dropped to zero and stopped.

So I wasn't seeing any spamming or garbage, in fact the logs looked pretty normal except for the zero transmission speeds.