C-8 zwave network issues

Hi,

I migrated from a C7 to a new C8 about a month ago, and things ran flawlessly up until yesterday. There was no known precipitating event; I just noticed that a few normal device commands weren't going through, and started poking around to find that the problem was much more widespread. When I run a zwave repair, almost every device fails, even though this has been a very stable and happy network for years (from a C5 until now).

Here's what's been done, and answers to questions I've seen in other posts:

  • Firmware is latest, 2.3.7.145. Radio is 7.18
  • Hub is powered by the included power adapter
  • Hub has been shut down and power cycled, many times and for many minutes
  • Swapped antennas
  • Almost all devices are (historically) single-hop
  • Restored from backup with radio, twice. This actually resurrected it for some devices for a bit, then it died again within minutes
  • Attempted to restore a C8 backup back to the C7, but the radio didn't restore, so I gave up on that path

Not sure where to go from here. Will attach a zwave details page in thread, although most of the devices that are showing routes since the last restore are unreachable again, even when sitting right next to the hub.

-Jeffrey


Zwave details and repair failures:

Remove the one that says pending. You may have to disconnect power to it. The rest of your z-wave looks good. The blank ones simply haven't woken up and reported in. You can manually wake them up if you want but they should all report in within 24 hours.

Those are pending because I've hit "refresh" on them... more than half of the network is in that state. The failed repair list shows the devices that I can reproduce that behavior on. I only have about 4 devices that have been reliable since whatever-this-is happened.

I've taken 4 devices off the network so far - and all had to be force removed because they wouldn't communicate to exclude. Then I had to attempt to repair their 'ghost' devices, then force remove from the zwave details list. I've only had one successfully exclude, then include. I'm about 4 hours in just on those few devices. It's a mix of manufacturers and device types as well.

Is it expected to have that almost complete list of devices failing the network-wide repair?

Thanks for reading my "Read First" post and giving us the details up front!

No, I have around 50 devices and can do a full repair with only 1-2 failures on the first pass, then individual repairs will work on those devices.

That's sort of what I figured when I saw it. I would not try to remove or add any new devices for the time being.

Do you recall if your neighbor counts usually look like that? Most of your devices have very few neighbors. This usually means the devices are VERY spread out, or you have some sort of major interference in the house. The counts would not usually drop that low so quickly just from some issues and repairs. It is possible, though, that doing multiple repairs caused all the devices to drop all their neighbors. With your number of devices I would expect to see 10-20 neighbors on all the devices.
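As a rough illustration of that rule of thumb, here is a minimal sketch that flags devices below the expected neighbor count. It assumes the counts have been copied by hand from the Z-Wave details page into a dict; the device names and numbers are hypothetical, and the threshold of 10 is just the low end of the 10-20 range mentioned above.

```python
# Hypothetical neighbor counts, copied by hand from the Z-Wave details page.
neighbor_counts = {
    "Kitchen dimmer": 1,
    "Hall switch": 14,
    "Porch plug": 1,
    "Garage sensor": 11,
}

# Low end of the 10-20 rule of thumb; an assumption, not a hub setting.
EXPECTED_MIN_NEIGHBORS = 10


def low_neighbor_devices(counts, minimum=EXPECTED_MIN_NEIGHBORS):
    """Return the names of devices whose neighbor count is below the minimum, sorted."""
    return sorted(name for name, n in counts.items() if n < minimum)


flagged = low_neighbor_devices(neighbor_counts)
print(f"{len(flagged)} of {len(neighbor_counts)} devices below "
      f"{EXPECTED_MIN_NEIGHBORS} neighbors: {flagged}")
```

If most of the list is flagged at once, that points at a network-wide cause (such as a reset mesh) rather than a single bad device.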

Just realized you did a cloud radio restore, that probably reset the mesh which accounts for the low neighbors... Did you do the cloud restore before or after trying a full power down with power disconnected for 10+ seconds?

Hey, thanks for the assist. No, the neighbor count is usually pretty much everybody-can-see-everybody... it's a small house with only a couple of devices out of direct range of the hub. It seems plausible that the wrecked neighbor count is a result of the failed repairs, or maybe due to the same cause, but I'm not an expert.

I did do a cloud radio restore - also a possible explanation. It's been powered down a couple of times since then, including once for 20 minutes.

So I have most of the network back up and running, with only a few of the most problematic plug-in modules left on the desk. It's been stable for a couple of hours. The neighbor count was still 1 on everything but a single device that I had run a manual repair on, so I took a chance and hit the network-wide repair button. Everything succeeded this time, except one door lock that probably needs batteries. But, interestingly: Neighbor counts are still 1, even after a refresh. Fun pictures to look at:


[Screenshot: zwave repair, partial network]

OK, after a second repair completed I got updated neighbor counts. This looks much better...

Ok... so I suspect that from the first incident you probably only needed to power off the hub for 10+ seconds and then restart it. I think that would have gotten you going again.

I suspect that possibly you only rebooted at first (which many people do, not realizing the difference). When that did not work, you took more drastic measures. The cloud restore reset your mesh, which causes the neighbor count of 1, and everything has to rediscover the routes/mesh. Then when you finally did power down and restart, the mesh was already trashed, and multiple repairs were just making it worse.

Does that sound plausible?

I did restart and power down the hub the first time before doing anything else. The restores and repairs came after.

Bad news though: Went to bed with all the devices in that screenshot tested and working, and woke up to a dead zwave network again. I'm not doing anything now except discovery (seeing if anything is left working), then will shut down and power off the hub. Will report back in a bit.

You said up above you did a cloud restore so you must have Hub Protect and cloud backups scheduled. There is a known issue being worked on in Beta, where a cloud backup will sometimes crash the Z-Wave radio.

You can test it by trying a manual cloud backup, and also checking the Location and Hub events log tabs, to look for errors about backups or zwave crashing.

To recover from this crash, unplug the power for 10 seconds, then restart the hub to reboot the radio.

I do have Hub Protect and am running cloud backups.

So some devices worked when it was in the state this morning, some close to the hub and some not. Other devices were not responding at all. I didn’t check door locks and things, but a quick run through of a dozen lights resulted in about 50/50.

Shut down, unplugged for a minute, restarted - that woke it up and it’s back to 100% now.

The only thing I saw in the logs is a cloud backup in the wee hours… I’ll try to run a manual and see if that breaks it!

It's not in the normal logs; it would be in the Hub / Location Events tabs on the logs page. I always forget which one, but check both for anything suspect looking. I think the zwave crash should get logged in there, unless this particular issue is causing it to go undetected and not get logged.

@bcopeland possible point of reference for cloud backup / zwave issue if you need more data?

Confirmed in the Hub logs, a weekly cloud backup did occur but nothing about a radio crash or anything else out of place in either that or Location Events. I ran a manual cloud backup and we're still up and running.

The four oldest devices that I have, the ones that were the worst behaved through this, are still not added back to the network. That's a Leviton DZPD3 and 3 GoControl PD300Zs. They're all a pain in the butt to include/exclude, so I'm throwing them away and replacing them with some new zwave 800 plugs. Probably just a superstitious step, but I'm gonna do it anyway since they're already removed.

Other than that, will watch closely for any new failures.

If you need the dimming feature of that DZPD3, Leviton does have a f/w update available for it... I'm still using one, and I've had no issues with mine (and it gets used daily).

[GUIDE] How to update Leviton z-wave firmware

And I also made a custom driver for it upon request: [DRIVER] Leviton Dimmers (DZPD3 / DZ6HD)

Hey thanks - I actually tried that firmware update and driver last night using the "good" instructions, and after about the 25th "wake up your sleepy device" I decided my time was worth more than the module :)

The firmware on mine, interestingly, was showing '0.5'

So we made it through two nights stable. Not sure what happened to cause the nightmare, but I'll keep watch on it and will spin up a new request if it resurfaces. Will also be on the lookout for the fix that's in Beta in case a radio crash was part of this... the restores I did wiped the logs, so I'd never know :)

Thanks all!