Major Zwave issues after power outage

I'd ask folks to read this to the end, because the situation isn't super obvious.

On Monday night we lost power at 1:30 AM. Ever since then, my zwave mesh has been completely non-functional.

First, this is NOT because my hub unexpectedly lost power. My hub is on a UPS and had power the whole time. Also note that all of my zwave devices are zwave plus.

I do also have a home generator, but it's only 15 kW so it does not power every single device.

  1. So here is the scenario, we lost power.
  2. For 90 seconds, the only zwave devices available were my hub and a handful of battery sensors I have (that all go through repeaters so essentially they were unavailable)
  3. After 90 seconds, the backup generator automatically kicks in, however, only about 40-50% of my zwave devices are powered by the generator
  4. At 4:30am (so 3 hours) the power was restored and now all devices were back on the mesh

First let me say, I totally understand why this caused havoc. Of course having dozens of devices (including the ones closest to the hub that act as repeaters) drop off the mesh simultaneously is going to cause major problems for a mesh network.
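To make the failure mode concrete, here is a minimal sketch of why devices that still have power can lose their route when repeaters go dark. The device names and links are made up for illustration, and the search is a plain breadth-first walk, not how a Z-Wave controller actually computes routes.

```python
from collections import deque

def reachable_from_hub(links, powered, hub="hub"):
    """Return the powered nodes that still have some path to the hub."""
    seen = {hub}
    queue = deque([hub])
    while queue:
        node = queue.popleft()
        for neighbor in links.get(node, ()):
            # A node can only relay or answer if it currently has power.
            if neighbor in powered and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {hub}

# Hypothetical mesh: which devices can hear each other directly.
links = {
    "hub": ["kitchen_switch", "hall_dimmer"],
    "kitchen_switch": ["hub", "garage_sensor"],
    "hall_dimmer": ["hub", "bedroom_outlet"],
    "garage_sensor": ["kitchen_switch"],
    "bedroom_outlet": ["hall_dimmer"],
}

# Assumption: kitchen_switch is not on the generator, everything else is.
powered = {"hall_dimmer", "garage_sensor", "bedroom_outlet"}

stranded = powered - reachable_from_hub(links, powered)
print(sorted(stranded))
```

In this toy layout the garage sensor has power but is stranded, because its only repeater is off. That matches the battery sensors described above, which were "available" but routed through repeaters that were not.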

But here's the problem, it's now been >24 hours and nothing is working. I have tried rebooting the hub, powering down the hub and unplugging it for 5 minutes, no luck.

I tried doing a zwave repair because I figured this scenario, in a sense, is almost the same as having moved devices around, but every single node comes back as unreachable, even devices that I can turn on/off from Hubitat. As a last ditch effort I did a cloud restore (including zwave radio) since I thought maybe the hub radio was in a bad state.

After doing the radio restore some things started working, but others still will not. I have nearly 50% of my devices indicating they have no route and when I click Refresh, it just says PENDING.

So a couple questions:

  1. Is there any way to avoid getting into this state? I thought having my hub run on a UPS was going to be a good idea for stability and since "everything is local" I never thought of it being a problem. Do most people power down their hubs when there is a power outage to prevent this? Is there any way to make the hub go into a "don't mess with the mesh right now, I know it's totally screwed up and anything you do to try to fix it is only going to make this worse" mode?
  2. When you get into this state after a power outage, what is the correct way to get out of it? Again, I totally get why this broke the mesh. The mesh was essentially destroyed and changing. However, I thought after 24 hours things would be working again. I also thought that doing a repair would have helped, but it made things worse (devices that were working stopped working).
  3. What should I do now? I'm in a state where almost half my devices won't work, and doing a Refresh or Repair on the device does nothing to help.

I'm stuck, my Zwave is effectively non-functional... again. Is zwave really THIS fragile when it comes to power outages? I mean I expected problems, but what is the path to recovery? Power outages aren't super common, but they're also not unexpected.

Edit: After doing a cloud restore (including zwave radio), I did another repair. These are the results. This is very clearly not good and I haven't the slightest clue what to do (aside from excluding and re-pairing every zwave device again, which isn't an option for me at this point). For those who don't wish to count, that is 77 out of my 91 nodes that are unreachable:
image

cc @bcopeland @bobbyd

I do basically what you do, but the difference is 100% of my devices are on standby power. I have had the hub report most zwave devices offline (makes sense, since they do go offline for about 30 seconds) but a reboot resolved the issue.

In your case I wonder if powering down the hub for 30 min would make any difference? I have read that it basically gives the zwave devices long enough to realize they've actually lost contact. The amount of traffic likely to be generated if they all start sending explorer frames is probably large so it may take things a while to settle down after that.

Perhaps someone with more zwave experience will weigh in to validate the above :)

Yes, Z-Wave is unforgiving when it comes to unresponsive nodes. Also, Z-Wave repair is not a magic tool that heals dead nodes en masse. Those need to be addressed first, before running a full mesh repair, because running a full repair with dead nodes still present may cause more harm than good.

Repairing unresponsive nodes one by one allows you to address any issues with a particular device. If the device is unable to reach the hub, the hub will not be able to reach it either, and yes the entire mesh will suffer (especially devices that depend on that particular node to communicate with the hub).

I've personally never been in this situation, but if I were to face this problem, I would probably go through the list of failed nodes, one by one, starting with the closest to the hub and power cycle each device until I get a successful repair on that node, then move outward. Pretty much you'll need to rebuild your routing table hopefully without the need to exclude and re-include devices.
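As a rough sketch of the "closest to the hub first, then move outward" ordering, something like the following could produce a worklist. The device names and distances are hypothetical, and the distances are assumed to be your own estimates; nothing here talks to the hub.

```python
def repair_order(failed_nodes, distance_to_hub):
    """Sort failed nodes so the ones nearest the hub come first.

    Nodes with no known distance are pushed to the end of the list.
    """
    return sorted(failed_nodes, key=lambda n: distance_to_hub.get(n, float("inf")))

# Hypothetical estimates, in feet, of how far each device is from the hub.
distance_to_hub = {"office_switch": 2, "porch_light": 40, "stair_dimmer": 15}
failed = ["porch_light", "basement_outlet", "office_switch", "stair_dimmer"]

for node in repair_order(failed, distance_to_hub):
    # For each node: power cycle it, run a single-node repair, and only
    # move on to the next one once that repair succeeds.
    print(node)
```

Physical distance is only a stand-in for radio distance here; walls and appliances can make a nearby device harder to reach than one farther away.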

I have. Several days without power after Ida. I did power up my Z-Wave Hub first, and then had to cut a few breakers off/on to get powered devices back online. The battery-powered sensors came back by themselves. Took a little while though.

That sounds fine in theory, but in practice it's not really possible. Meaning, that's not exactly how circuits are laid out. So if I turn off all my breakers (which sounds like what you're suggesting?) there isn't just a "closest to my hub" breaker, so I'd kind of be turning things on a bit haphazardly.

Soooo am I essentially up a creek here? I'm not planning to pull each device out of the wall, that's way too much effort and frankly, this would be the 5th time I've had to do this and I'm hoping this isn't what I have to do here?

No, that's not what I meant. Work with your devices, not breakers. Hopefully your mesh is not overwhelmed by busy messages, so that when you run an individual repair on a device, it completes successfully. If it fails, then hopefully your switch has an air gap so you can power cycle it for 30 seconds or so. If it doesn't, then you may consider cutting the power to it at the breaker.

Unfortunately the individual repairs all fail. And most devices (Jasco and Zooz toggle switches) do not have air gap switches.

image

This device is within 2ft of the hub, direct line of sight, no obstructions whatsoever except air particles

It's possible that all the busy messages hosed the radio. Do a soft shutdown and power it back up to clear the radio, then try the single device repair again.

Does that include unplugging? I've heard people suggest the radio doesn't reset unless you unplug it?

Yes, but use the shutdown in the menu first rather than just pulling power.

Given the number of devices having issues you might want to open the zwave log and watch for busy messages while you go through this. You may need to pull power several times while repairing all these devices. And resets will probably be required more frequently at first and then taper off as devices quit spamming discovery frames.
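If you copy lines out of the zwave log, a small script can tally which devices show up with busy messages most often. This is only a sketch: the log format below (node name first, then the message) is an assumption for illustration and not the hub's actual log format, so the parsing would need adjusting to whatever your log really looks like.

```python
from collections import Counter

def count_busy(log_lines, keyword="busy"):
    """Count, per node, the log lines that mention the keyword (case-insensitive).

    Assumes each line starts with a node identifier followed by a space.
    """
    counts = Counter()
    for line in log_lines:
        if keyword in line.lower():
            node = line.split()[0]
            counts[node] += 1
    return counts

# Hypothetical log excerpt; the format is made up for this example.
sample_log = [
    "node12 busy",
    "node12 ack",
    "node07 Busy",
    "node12 BUSY retry",
]

busy = count_busy(sample_log)
print(busy.most_common())
```

A count that keeps climbing suggests it is time for another shutdown and power pull before continuing; a count that tapers off suggests the devices are settling down.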

Best practice is to unplug the power adapter from the outlet and leave the cable plugged into the hub, especially if you have a C7 hub with the micro-USB port.

I just went through a power outage that was about 10 minutes long. I was also having difficulty with my Hue integration. I tried all the things you did. Some of my devices began working after I manually operated them (switches and dimmers). My battery devices were the biggest contrarians. About two days later everything magically was happy again.

In ruminating on what happened with my system last weekend, I have come to the conclusion that the system just needs time to heal itself. Not popular with the other members of the household ("I'm about ready for you to rip all of this out!"), but it is true. It is a self-organizing network after all. It might even heal faster if we quit futzing with it. Not sure about that, but I think it is a reasonable assertion.

Think about how we have learned to build strong Z-wave meshes. Start with a few devices close to the hub and let it sit for 24+ hours before adding more in ever increasing circles waiting each time for the mesh to absorb the new members. In a recovery from a power outage, the mesh members think they know things about the other mesh members, but there are a LOT of holes from devices that are offline for one reason or another. They start to change their routing tables, then devices show up and they have to start reevaluating their optimal path to the hub. It is a very iterative process that requires a lot of communication with a lot of devices.

To sum up my thoughts and experiences: be patient. Give it time. It will work out most of the kinks by itself, then handle the outliers.

Good call! I feel like I'm going to break that thing any time I remove it!

https://community.hubitat.com/uploads/default/original/3X/6/7/67648b97a486364af219308f9c9bb36b5059b7cf.jpeg

Is the routing information still valid? If not, might it be stored in an older database backup?
My thought is that if the OP knew what the routes were, they could approach the repair knowing where to start.

Don't remove it, unplug the AC adapter.
Or unplug the USB cable from the adapter.

I’ve also found that sometimes when devices become unresponsive, turning them on and off seems to help. I also will sometimes do a refresh.

@dman2306, any chance you have a device to see traffic on the Z-Wave frequencies? One thing that I have seen on occasion is that some devices would spam the mesh, creating a lot of noise and stopping other devices from communicating with the hub. The way I would fix this is to turn off the whole house (main breaker) for a few seconds while the hub was on UPS.

I had a power outage last year that caused a couple of GE/Jasco switches to fail, so you might also consider this as a possibility.

Where I can, I do direct associations, but not in many places, because it is not flexible. No hub required.
