Z-Wave radio repeatedly going offline at random times, requiring power cycle

C7 device running 2.2.5.131 and 2.2.6.130 (no change between versions) is exhibiting z-wave crashing behavior several times per day. The symptom will be unresponsive switches, but it is shown in the System Messages as: "z-wave is unresponsive. shut the hub down, unplug it for 30 seconds, and plug it in." This power cycling brings everything back online and the system will usually be fine for several hours.

Generally, if I go to bed and the system is working, by the morning this error will have happened and I need to reset the hub. This morning I was working on the system until approximately 2:30am, and the system was offline by 8am, with no user interactions that I'm aware of precipitating the issue.

The reason I was up late last night was that I followed the 'join a USB secondary controller to remove ghost devices' guide, which was successful in removing the ghosts. I had high hopes that the ghosts had been contributing to the instability, but as the system was unresponsive this morning, no luck.

I have several Inovelli Ilumin lamps that are paired as S0, which I plan to either re-pair as unauthenticated or replace with Zigbee lamps, but as they've been in the network for some time I just bring it up as a point of info.

The network is reasonably large, with about 90 Z-wave devices, and about 15 zigbee.

What should I be doing to debug this?

My C-4 is experiencing the same Z-Wave symptoms, but I am not getting the log entries.

Lots of people have the same problem.
I only have 13 Z-Wave devices on my C7 hub and have had the same problem with it.
Support claims it is due to interference from a device, not the hub.
I did remove 2 Aeotec Multisensor 6 sensors, only because they were the easiest to replace with Iris Zigbee devices.
I may have gotten lucky.
If it happens again, though, I will try powering down one device at a time to see if the hub's Z-Wave comes back.
That's a lot more work for you with 90 devices.
I have my house divided up across 4 hubs: one for each floor and a 4th hub for cloud and dashboards. If one hub has an issue it only affects that floor, and it is much easier to troubleshoot.

That would be... Disappointing... If their response was just a shrug. I searched the forums and only found two other threads that mention the zwaveCrashed event, and one of them was helpful in letting me know I can do a trigger off that. So now I'm adding a rule to power down the hub when it's detected and then node-red will bring the hub back online.

Are there other threads I can read up on this?

Ah, found the magic keywords ("zwave unresponsive"). Yikes. Hopefully they can help me find the bad actors.

I did have a bad zwave light switch 1-2 years back and whenever I physically pressed the light switch it would make my zwave lock go unresponsive.
Other than that the bad light switch worked fine.
I just happened to notice it, and up until then I was blaming my lock.
First thing is to make sure you do not have any ghost Z-Wave devices in the Z-Wave list, and if you do, remove them.

No ghosts at this time. I'd had several, but removed them last night, hoping that would improve the situation, but the crashes still happen quite regularly. Probably 5 restarts today (so far)

It will take a while, but power one device off at a time and see if the crashes stop.
Maybe even half the devices and narrow it down by process of elimination.
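
The halving idea is basically a binary search. Here is a rough Python sketch of the logic, assuming there is exactly one bad device and the crash shows up reliably within whatever waiting period you pick. `crashes_with` is a hypothetical stand-in for the manual step (leave only that group of suspects powered, wait, see if Z-Wave crashes); it is not a Hubitat API.

```python
from typing import Callable, List, Sequence


def find_bad_device(devices: Sequence[str],
                    crashes_with: Callable[[List[str]], bool]) -> str:
    """Narrow a list of suspects down to one device by repeated halving.

    crashes_with(group) is a hypothetical manual check: keep only `group`
    powered (out of the remaining suspects), wait, and report whether the
    Z-Wave radio crashed. Assumes exactly one device is at fault.
    """
    if not devices:
        raise ValueError("need at least one device to test")
    candidates = list(devices)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Crash with only this half powered: the culprit is in it.
        # No crash: the culprit must be in the other half.
        candidates = half if crashes_with(half) else candidates[len(half):]
    return candidates[0]
```

With 90 devices that works out to about 7 rounds of waiting instead of up to 90. If more than one device is misbehaving, or the crash is intermittent, the search can point at the wrong device, so treat the result as a suspect rather than a verdict.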

I reviewed our response to your ticket and it looks like the recommendation was that running repair when devices are unresponsive could result in a frozen Z-Wave mesh. As you noted, you had a bad device in your mesh. Z-Wave from that perspective is unforgiving. One single bad device in conjunction with running repairs can and will create havoc in your mesh. If you didn't do so already, check out this post to learn more about Z-wave Repair:

There are two common reasons for Z-Wave problems like the ones you describe: a bad device, as in @NoWon's case above, or an overwhelmed radio that is bombarded with frequent events. If you have power/energy reporting devices, try to reduce the number of events they generate. Screening the logs will tell you if you have power reporting devices that generate events only a few milliseconds apart. If you don't think you are dealing with either of these problems, then please send an email to support@hubitat.com so we can further investigate.
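
If you can get the events out of the hub as timestamp/device pairs, the log screening can be scripted. The following is only a sketch: the `(timestamp_in_seconds, device_name)` input format and the threshold values are assumptions for illustration, not something the hub exports in this shape.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def chatty_devices(events: Iterable[Tuple[float, str]],
                   max_gap_s: float = 0.5,
                   min_bursts: int = 3) -> Dict[str, int]:
    """Count, per device, how often two consecutive events arrive
    less than max_gap_s apart, and keep devices that do it often.

    events: (timestamp in seconds, device name) pairs; this format is
    an assumption, adapt it to whatever your log export looks like.
    """
    last_seen: Dict[str, float] = {}
    bursts: Dict[str, int] = defaultdict(int)
    for ts, device in sorted(events):  # process in time order
        prev = last_seen.get(device)
        if prev is not None and ts - prev < max_gap_s:
            bursts[device] += 1
        last_seen[device] = ts
    return {d: n for d, n in bursts.items() if n >= min_bursts}
```

Any device that shows up in the result is a candidate for a longer reporting interval or a higher reporting threshold in its settings.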

My email actually said I previously had a ghost device in my network and after removing it and rebooting my hub Zwave was still failing.
The C7 hub Zwave was failing before running any Zwave repair.
To which support said it was due to one of my devices.
Which it may very well be.
Hub may look like it is failing but it may actually be a faulty device causing it.
I don't expect support to figure out if one of my devices is faulty.

I happened to remove my Aeotec Multisensor 6 sensors (2 of my 13 Z-Wave devices on the C7) simply because they were the easiest, and so far I have not had any failures. But the failures only occurred every week or so (4-5 times total).
So will see if or when the C7 hub Zwave fails again.
Wait and see game now.
I had another Aeotec Multisensor 6 on one of my other C5 hubs and I didn't have any issues.

Hey @jpelzer - I can help a bit possibly. I had similar issues with my C7 and I believe the culprit is S0. I would have automations that never ran or partially ran and while @bobbyD and I tried our best to figure it out, we couldn't get to the bottom of it.

Recently, we released a beta version of firmware for the Ilumin bulbs and I paired all of them non-secure and my C7 has been absolutely rock solid. Hasn't missed a beat in about three weeks now.

Here's a link to the beta firmware that should help:

Hopefully it helps!

Thanks Eric! The S0 problem had been the top candidate, so I'll definitely try this update. I have about 50 Inovelli dimmers and switches, and had 8 Ilumin bulbs, down to 4 right now. I'll try those 4 with this update tonight.

I've had the same issues on a C5 for well over a year.

Well, I was getting very slow speeds and timeouts attempting to do the firmware update, so I'll have to do that separately. I removed the remaining 4x S0 bulbs from my network, and now it's worse! LOL. I'm getting about 30 minutes uptime max. I'm going to do a soft reset and a cloud restore.

For anyone wanting the auto-recovery, I have a Rule Machine entry that triggers when it sees the zwaveCrashed event and shuts down the hub, and then a Node Red flow that is pinging the Hubitat and if it gets 5 consecutive timeouts will turn off the hub via Kasa wifi switch, and turn it on a minute later.
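
For anyone who would rather script it than use Node-RED, the ping-and-power-cycle half of that could look roughly like this in Python. It is a sketch of the logic, not the actual flow: `ping`, `power_off` and `power_on` are placeholder callables you would wire up yourself (for example an HTTP request to the hub and whatever controls your smart plug), and `max_polls` exists only so the loop can be stopped for testing.

```python
import time
from typing import Callable, Optional


def watchdog(ping: Callable[[], bool],
             power_off: Callable[[], None],
             power_on: Callable[[], None],
             max_failures: int = 5,
             off_seconds: float = 60.0,
             poll_seconds: float = 10.0,
             sleep: Callable[[float], None] = time.sleep,
             max_polls: Optional[int] = None) -> int:
    """Power-cycle the hub after max_failures consecutive failed pings.

    ping/power_off/power_on are placeholders supplied by the caller.
    Returns the number of power cycles performed (only reachable when
    max_polls is set; otherwise the loop runs forever).
    """
    failures = 0
    cycles = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        # A single successful ping resets the failure counter.
        failures = 0 if ping() else failures + 1
        if failures >= max_failures:
            power_off()
            sleep(off_seconds)  # leave the hub unpowered for a minute
            power_on()
            cycles += 1
            failures = 0
        sleep(poll_seconds)
    return cycles
```

Requiring consecutive failures keeps a single dropped ping from yanking power on a hub that is just busy. You may also want to wait longer after `power_on` than the normal poll interval so the hub has time to boot before the next round of pings counts against it.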

From my admittedly limited understanding of z-wave, the C7’s 700 series chipset is different from the 500 series chip in the C5 and C4.

Not sure how likely it would be for C4 or C5 z-wave issues to be directly related to what the OP is reporting. Consider starting another thread or opening a support ticket?

Well, something is definitely screwy. I took an Inovelli double outlet plug and did a remove... It removed a red series dimmer switch upstairs. I removed it a second time... It removed a GE outdoor plug. Third time, it removed the outlet I was attempting to remove. Fourth time, it removed 'unknown device' as you'd expect. Reincluded the dimmer, then the modem locked up again.

I must have raised at least 3 tickets, and they go unanswered.
The team on here try a bit, but apart from "try the next firmware", nothing as of yet.

Best fix I have is removing Z-Wave devices and replacing them with Zigbee. It reduces the frequency of lock-ups.