C7 Lost All Z-Wave Devices Overnight!

C7 - 2.3.9.201

We went to bed last night with everything working normally. We woke up this morning and none of the normal automations were working. Everything runs via webCoRE pistons, so I checked there first. Everything looked fine. I looked at the hub and noticed device states weren't updating for motion sensors, even though the events tab showed everything working normally. All devices were showing in the devices tab. Digging deeper, I noticed that there were no Z-Wave devices listed on the Z-Wave details page. Everything was gone. I tried a soft reboot, but that did not bring them back. I powered the hub down and left it unplugged for about 10 minutes. When I powered it back up, everything was showing on the Z-Wave details page again and all automations were working.

I have not done anything with the hub recently, aside from minor changes to webCoRE pistons. No devices have been added or removed. No updates have been performed.

I have noticed that whenever the hub reboots, there is an initial warning about severe hub load, which eventually becomes a warning about elevated hub load... which eventually goes away. I always assumed this was normal and ignored it.

Can anyone explain what might have happened to cause this? Is total failure of this hub imminent? Short of replacing the hub, is there anything I can do to prevent a catastrophic failure?

I have a Z-Wave door lock and a Zooz light switch that will, very rarely, just quit talking to the hub (C7). Power cycling fixes them. No idea why, but it's so rare that figuring out the cause is probably impossible.

Do you do cloud backups?

I’m on a C8 2.3.9.301, and while it didn’t use to happen, I now see severe/elevated hub load warnings when restarting. After 5 minutes the alert goes away.

I have taken to waiting until that message goes away before doing any work on the hub. Kind of a bummer, as the shutdown/power-off/unplug/restart dance now takes a solid 10 minutes. If I have to do it multiple times to address an issue, the time really drags on.

That requires a subscription, right? If so, I am not doing cloud backups.

The radio database crashed. It happens sometimes. Power cycling like you did reloads it. You only need to leave it unplugged for 30 seconds to a minute (long enough for the capacitors to drain) before powering it back up. It's annoying, I know, but nothing to be inherently worried about. I would run a DB rebuild for giggles and get a Hub Protect subscription so you can back up both your Z-Wave and Zigbee radios, since local backups do not allow that. It will also give you a free replacement hub if something happens to your existing one (you will likely get an upgrade if that happens, because there aren't any C7s left in stock).


Thanks. That's good to know.

I assume a Hub Protect subscription is per hub? What happens with webCoRE if a hub is replaced or a backup is restored?

If you're using the external version of webCoRE, you can just transfer it. If you're using the built-in version (recommended), it will transfer automatically when you restore.

Yes, Hub Protect is on a per-hub basis. (Remote Admin, should you choose to get it, covers all hubs on the account, but you can also just VPN in instead.) Hub Protect is definitely worth it.


Perfect. Yes, I already use VPN for everything, so remote admin wouldn't be necessary.

Thanks for the help!


Correct. :slight_smile:

While a Z-Wave radio crash seems likely, I would also ask when you last rebooted before this episode, as ALL versions still seem to have small, slow memory leaks. I personally have RM rules set to reboot when free RAM drops below 180K, and I can usually get 30-45 days between reboots (assuming you're not doing version upgrades more often than that).

But I've personally seen multiple issues (radios included) when RAM gets very low, say after 90+ days of uptime. It depends a lot on the integrations you have, but I'm just curious when you last rebooted before all the issues this morning.

Just a thought..
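The reboot rule described above is built in Rule Machine, not code, but the decision it makes is simple enough to sketch. This is a minimal, hypothetical illustration, assuming free memory is reported in KB (so "180K" means 180,000) and that a reboot is triggered the first time a reading drops below the threshold. The names `should_reboot` and `first_reboot_index` are made up for this example; actually reading the hub's free memory and issuing the reboot are deliberately left out, since how you do that depends on your setup.

```python
from typing import Optional, Sequence

# Assumption: readings are free memory in KB, so "180K" is 180_000.
REBOOT_BELOW_KB = 180_000


def should_reboot(free_kb: int, threshold_kb: int = REBOOT_BELOW_KB) -> bool:
    """Hypothetical helper: True once free memory drops below the threshold."""
    return free_kb < threshold_kb


def first_reboot_index(readings_kb: Sequence[int],
                       threshold_kb: int = REBOOT_BELOW_KB) -> Optional[int]:
    """Return the index of the first reading that would trigger a reboot, or None."""
    for i, free_kb in enumerate(readings_kb):
        if should_reboot(free_kb, threshold_kb):
            return i
    return None
```

Passing `threshold_kb=150_000` gives the "alert under 150K" variant suggested later in the thread; in that case you would send a notification instead of rebooting.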

Low memory tends to cause lockups. OP's hub was working, except nothing was in the Z-Wave list. This is typical of a Z-Wave radio crash and is solved with a power cycle, whereas low-memory issues can be cured by a standard reboot (power cycling is different).


The last reboot was a few weeks ago when I was recabling the PDU. As a rule, I only reboot during upgrades or when things seem to be acting weird.


That's not my experience (perhaps I've caught things before they got that bad). For me, low memory has shown up as slow responses, both in the browser and in automations, as well as missed network events (both missed incoming events and outgoing commands). YMMV.

That said, a completely empty Z-Wave device table DOES indeed sound like a radio crash. My point is that when memory gets very low, lots of bad (and atypical) things start to happen.

At a minimum, monitoring or alerting when hub free memory drops below 150K may be in order here, but it all depends on when the last reboot occurred.

My initial suspect was also a cloud backup, but since Hub Protect isn't in play here, that's ruled out. In 3-4 years of use across multiple hubs, I've never had a completely random radio failure. I'm not saying it can't happen, but it likely depends on mesh topology, endpoint devices, etc. Also, the fact that a soft reboot didn't fix things does point to a radio failure; the open question is the root cause of that failure. Anything interesting in the hub or location logs?

Just wanted to add "memory monitoring" as an awareness item as well.

What about the severe and elevated load following a reboot? Is that normal?

Yes

It isn't out of the ordinary, although I don't see it every time or on both of my hubs. It should clear within a minute or two after a reboot. If the warning persists after a couple of minutes of uptime, there is likely an issue.