I'm running a C7-series hub and am having a problem controlling all (yes, every single one) of my Z-Wave devices. Each device still reports status updates successfully (i.e., if I switch a device on or off manually, the hub sees it), and power reporting and general event logging from each device are reaching the hub just fine, but none of the devices will accept control commands.
Everything on my mesh was working perfectly for months and this situation came about the day after I changed the IP address of my hub (which I wouldn't think would be related to Z-Wave device functionality in any way/shape/form).
I ran a Z-Wave repair and, as you can see, there are lots of unreachable nodes in the ping step:
I don't see any ghost devices in the Route column of the Z-Wave Radio Devices report under "Z-Wave Details" in Settings.
I'm at a loss for what to do next. I've spent considerable time setting up my devices the way I like them and creating my dashboards, rooms and rules, and having a total failure like this is just disheartening. Do I have to exclude everything and start over?
If so, how do I prevent this from happening again?
I don't see how it would be related to an IP change. You should not have to exclude everything and start over again. A zwave repair is likely not going to help. In fact a full repair is not recommended on the C7.
I personally don't find the zwave topology to be all that helpful but it could be I'm just not that great at reading it. Off the top of my head though I'd say your zwave mesh may not be all that healthy. A number of your devices don't appear to have many neighbors. It may also be that you've got a device acting as a repeater that has become unstable.
In the meantime... any devices using S0 security? Any Zooz 4-in-1 sensors? Any power-reporting sensors, or any sensors that are particularly chatty?
I would also shut the hub down cleanly, remove power (at the outlet side, please... not the hub side), wait a minute or so, and plug it back in. This gives the zwave radio a chance to reset. You can also try a much longer shutdown, say 20 min, which will cause your mesh to start more-or-less frantically rebuilding.
I'll load up the z-wave details app and give it a go and see what I can find there.
None of my devices are using security (as you can see on the details page), and I do not have any Zooz 4-in-1 sensors. I do have a fair number of Zooz ZEN71 wall switches, two Zooz multi-port power strips, a number of Zooz ZEN15 power switches and a couple of GE plug-in switches.
On power reporting, I've been turning that down as I install new devices so as not to overwhelm the mesh, but I do notice that all my Zooz devices report "EnergyDuration" in the logs like mad. That's been going on for months now with no ill effects (so far).
I do reboot the hub every morning via the "Rebooter" app, but have been doing that for months now as well - with no ill effects.
I'll try the 20 minute shutdown and see what I get and will report back.
The above is good advice, but this also caught my eye:
If you don't have a need for this, I'd disable it on those devices, and I'd consider setting it to the minimum reporting you need to get the data you need on the devices where you do. The less traffic on your Z-Wave mesh, the better, and some power-reporting devices can be quite chatty (as can S0, hence one of the recommendations above; color bulbs can be, too, and even worse if it's both).
This fixed it! More specifically, it was the long shutdown (I waited about 20-25 minutes with the hub powered down and the power supply pulled) that restored operation to the Z-Wave radio. The short power-down didn't make a difference at all, unfortunately.
Upon power up after that long period, everything is now responding just as it was before all of this began.
However, this raises a bigger question: how often does this typically happen, and is there a way to prevent the Z-Wave radio from getting "hung" like it was? This is the first time I've experienced this, and I hope it's not a recurring theme.
Thank you @bertabcd1234 - I'll continue to pare down the power reporting as much as I can. I can see that it's a burden on the mesh and will do my best to eliminate as much of it as I can live without.
I've never experienced it personally but I have read about others having the issue. I don't think in most cases it happens particularly often.
One problem with rebooting daily is it resets all your zwave stats, particularly RSSI, response time, and the number of route changes. Those can all be good indicators of a device that's unhappy and might be causing your issue. It might make sense to hold off the nightly reboot for a few nights and let those counters accumulate some data. Is there an underlying issue that your nightly reboots are trying to address? If so maybe it makes sense to look at that. If it's just a sort of "just in case" thing maybe shut it off for a while and see what the data looks like.
Just by looking, I'd suspect you need to focus on building the strength of the zwave mesh. You have a bunch of devices that have only one neighbor, which suggests to me you need some more line-powered repeaters. You've also got a couple of devices running at 9.6kbps, which is indicative of communication problems. I have a couple too, and they are always a bit problematic. In my case it's two specific types of devices: Leviton outlet plugs and GE wall outlets. I have never been able to resolve the issue; I just deal with it. Those devices are less responsive, but since they control air conditioners, rapid response isn't that big a deal.
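A minimal sketch of the check described above, assuming you copy each node's neighbor count and link speed off the Z-Wave details or topology pages by hand. The Node fields, the weak_nodes helper and the sample node IDs are all hypothetical, and the cutoffs (fewer than two neighbors, or 9.6 kbps) just mirror the symptoms mentioned here rather than any official threshold:

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str       # hypothetical label, e.g. "0x05"
    neighbors: int     # neighbor count read off the topology/details page
    speed_kbps: float  # link speed shown for the node: 9.6, 40 or 100


def weak_nodes(nodes: list[Node], min_neighbors: int = 2, slow_kbps: float = 9.6) -> list[str]:
    """Return IDs of nodes with fewer than `min_neighbors` neighbors
    or a link speed at or below `slow_kbps`."""
    return [
        n.node_id
        for n in nodes
        if n.neighbors < min_neighbors or n.speed_kbps <= slow_kbps
    ]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real mesh.
    sample = [
        Node("0x05", neighbors=1, speed_kbps=100.0),
        Node("0x06", neighbors=4, speed_kbps=100.0),
        Node("0x07", neighbors=3, speed_kbps=9.6),
    ]
    print(weak_nodes(sample))
```

With that sample, nodes 0x05 (one neighbor) and 0x07 (9.6 kbps) get flagged; those would be the first places to look when adding line-powered repeaters.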
Take a look at this article if you haven't already...
@brad5 Well, I spoke too soon. Overnight the radio must have gotten into the same "hung" state and I've now lost control of all of my devices again as of this morning... I was very disappointed to say the least. I'll go through the same power cycle event and confirm that restores operation and go from there.
All of my devices are mains powered. I don't have a single end device in the mesh at this point, so I'm left to wonder if the power reporting is the issue. I'm not sure how to improve the routing (or to make manual changes) outside of running the repair on a single device or doing exclude/include.
Are there any ways to target specific portions of the mesh to try to improve it?
I've reconfigured the power reporting intervals for all of my Zooz devices and greatly increased the time between reports, but they aren't honoring the new parameters (I'm checking the logs on each device, and they are still pushing power reports to the hub every minute), so it appears I'll have to sort that out with Zooz technical support.
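To put "every minute" in perspective, here is a rough back-of-the-envelope sketch. It assumes each device sends one unsolicited report per interval (a plug that reports power, energy, voltage and current separately would multiply that) and ignores acknowledgements, retries and multi-hop forwarding, so real traffic would be higher. The reports_per_hour helper is hypothetical:

```python
def reports_per_hour(devices: int, interval_s: float, reports_per_interval: int = 1) -> float:
    """Estimate report frames reaching the hub per hour.

    Assumes each device sends `reports_per_interval` frames every `interval_s`
    seconds; acks, retries and repeater hops are not counted.
    """
    if devices < 0 or reports_per_interval < 0 or interval_s <= 0:
        raise ValueError("counts must be >= 0 and interval_s must be > 0")
    return devices * reports_per_interval * 3600 / interval_s


if __name__ == "__main__":
    # Hypothetical example: nine plugs at a 1-minute vs a 10-minute interval.
    for interval in (60, 600):
        print(f"{interval:>4} s interval: {reports_per_hour(9, interval):.0f} reports/hour")
```

For nine devices that works out to 540 reports per hour at a 60-second interval versus 54 at 600 seconds, which is why stretching the interval (or turning reporting off where you don't need it) makes such a difference.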
Thanks for the link on building a solid mesh... I had read that before, but it's always good to refresh.
Yeah that's definitely not gonna help. You may have to force the devices to "check in" and get the new configuration.
I have been able to get a specific device to perform better by excluding it and re-including it in place, but I've never had a sort of system-wide failure like you're seeing. I'm still kind of focused on the number of devices that have only one neighbor. That seems odd to me unless you have a really large space you're trying to cover. But even then... why ALL devices? It would be really good to not reboot for a few days and see what things look like, especially a device with a large number of route changes or really bad RSSI or RTT.
When it all goes south, is there any indication of an issue on your zwave details page? Anything in the zwave log or in the hub logs?
@EightBitWhit needs to eliminate those devices' reporting, as it's likely crushing the mesh. I also wonder if there is any database corruption with all these crashes.
@EightBitWhit Go to Settings >> Backup and Restore, click the download button at the bottom and save your database (this will clean the database as it's being backed up to your PC).
Go to yourhubip:8081 and do a soft reset. After the reboot, when prompted, restore the database you backed up to your PC. This may not fix the issue, but it will ensure a clean DB. Afterwards, shut down from the Settings menu and unplug again for 20 minutes (at the wall, not the hub).
Nothing in the ZWave or the Hub logs... it's like the radio just loses the ability to send commands.
Because all the devices respond nearly instantly to a status change (on/off) once I go through the power cycle process and bring the hub back online, I think something on the hub or a specific device is causing the issue.
For a while, everything in the mesh will work exactly like it should... until it doesn't. It's confounding.
I'll hold off on the daily reboots to see what additional information I can gather.
@rlithgow1 - thanks for the additional help here. I contacted Zooz technical support and requested firmware updates for my ZEN15 power switches. I now have the files for version 1.06 and am going to update each of my 9 switches to that version (I have a mixture of 1.02 and 1.04, but none are currently on 1.06) and see whether that gives me better control over the power reporting frequency I'm currently seeing.
FYI, I did do the soft reset process you described above.
Updating everyone on my progress with some good news - over the past couple of days, I've been able to get all of my ZEN15 switches upgraded to firmware 1.06 and have eliminated a couple of the very small "generic" Z-Wave power switches I was using (they are the Shenzhen Neo devices that never communicated faster than 9.6kbps in my setup). I'm going to upgrade to some Aeotec 700-series switches to take their place.
With that, everything is now working really well and over 90% of my devices are now reporting directly to the hub @ 100kbps.
I'm not sure I can truly pinpoint what my problem was, but I can already tell that the ZEN15s are now holding their place in the mesh much better, and the network feels much faster and more responsive than it ever has.
Here's my current (new) topology map, and it's looking much, much better: