S2 Authenticated Z-Wave Devices Not Responding to Commands

I am having an issue where some of my devices stop responding to commands. It seems to last for a few days (maybe a week), then clears up on its own, regardless of anything I try in order to fix it. It has probably happened half a dozen times since I switched to the C8 in the spring of 2023. It is happening right now, and for the first time I have been able to find a common denominator among all the devices that are not responding: they are all S2 Authenticated.

Info and troubleshooting to date

  • C8 Hub, currently on 2.3.7.146, but this has been a persistent problem across all firmware versions

  • Can't find any specific things that trigger the issue

  • Devices do not respond to commands (on, off, refresh, etc)

  • Devices DO complete z-wave repairs just fine

  • Most of the devices are direct to the hub and not using repeaters

  • Non-S2 devices never have an issue

  • No unusually high route changes or error rates

  • When it works, it's fast and consistent.

  • I have tried rebooting the hub

  • I have tried rebooting with a database repair

  • I have tried disabling and re-enabling the Z-Wave radio

  • I have tried individual device repairs as well as a complete Z-Wave repair (these always work fine, including on the devices that don't respond to commands)

  • I even killed the main breaker to the entire house for a few minutes

As I said above, I have narrowed this down to the devices that are S2 Authenticated.

Any help would be awesome.

Thanks!

That's odd - S2 should help beef up message integrity, not weaken it.

Maybe a long shot, but is the firmware on those particular devices all current?

I haven't checked all of them, but I did update a few of the Inovelli switches to the most recent. It didn't fix anything. It really seems to me like it is a problem at the hub. I am just not sure what other common point of failure there could be that would only impact the S2 devices.

Hard to say... A year or so ago, I moved as many ZW devices as I could from "None" to S2 Authenticated (28 of 30), and my mesh has never been better.

I know it's not fun replacing ZW devices, but you could experiment with knocking a few back down to "None" and see if they succumb again in the next burble. If they don't, then it may be worth moving the rest back to "None" in your case.

I would really hate to be going backward. It almost seems like some sort of encryption bug on the hub itself. I did just move one of the devices from S2 to None and it works fine now. All of the other S2 devices are still not responding.

I am going to flip it back to S2 now and see if it stops working again. Stand by...

Another clue that points to an encryption problem on the hub. I cannot get it to re-pair with the hub using S2. It fails and creates a ghost node.

Sorry, your Z-Wave list is hard to read, so how many of those devices are sensors or power metering?

Also, how long had it been since the last reboot when you took that screenshot?

Have you tried shutting the hub down and removing power for 30 seconds, then boot up again?

S2 Auth works fine for many other people, and some have all of their devices paired with S2. So it seems unlikely that it is an encryption problem with the hub itself.

None of them are power metering. In my experience, power metering over z-wave doesn't work well, so I usually stick to Zigbee for power metering when I need it. Only one of them is a sensor (Ring Smoke Alarm Detector). The rest are mains-powered switches.

It was only about 30 minutes since the last reboot when I took that screenshot, but the stats don't change much. I get no errors and very few route changes, if any.

I just did the hub shutdown, left it off for about 20 minutes, and fired it back up. Still having the same problem.

I know S2 works fine for most, which is why this is so strange. And when it does work for me, it works great. I still find it interesting that things like "repairs" work fine on the S2 nodes. It's only the actual commands that don't work. That tells me that the problem is at a higher tier on the network stack than most problems.

Looks like a mix of Jasco, Inovelli and Zooz switches?

Are you using my driver for the Zooz? [DRIVER] Zooz ZEN Switches Advanced (and Dimmers)

Have you tried the Supervision Encapsulation setting on there? All it does is encapsulate the command in a supervision request, which the device is supposed to respond to. If no response comes back the driver tries the command a couple more times.
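For anyone unfamiliar with the idea, the retry behavior described above can be sketched roughly like this. It is a minimal Python illustration, not the driver's actual code: `send_supervised`, `transmit`, `silent_device`, and `make_flaky_device` are hypothetical names, and the default of 3 retries is an assumption chosen to match the "MAX RETIES (3)" warning that shows up in the logs posted in this thread.

```python
from typing import Callable, Optional, Tuple

def send_supervised(
    transmit: Callable[[str], Optional[str]],
    command: str,
    max_retries: int = 3,  # assumed value, not taken from the driver source
) -> Tuple[bool, int]:
    """Send a command and re-send it while no supervision report comes back.

    `transmit` stands in for "send the command and wait for a report";
    it returns a status string, or None if nothing came back in time.
    Returns (acknowledged, number_of_sends).
    """
    sends = 0
    for _ in range(1 + max_retries):  # one initial send plus the retries
        sends += 1
        if transmit(command) is not None:
            return True, sends
    return False, sends

def silent_device(command: str) -> Optional[str]:
    """Simulated device that never answers (the symptom in this thread)."""
    return None

def make_flaky_device(fail_first: int) -> Callable[[str], Optional[str]]:
    """Simulated device that drops the first `fail_first` sends, then answers."""
    state = {"calls": 0}
    def transmit(command: str) -> Optional[str]:
        state["calls"] += 1
        return None if state["calls"] <= fail_first else "SUCCESS"
    return transmit

print(send_supervised(silent_device, "refresh"))    # (False, 4)
print(send_supervised(make_flaky_device(2), "on"))  # (True, 3)
```

The point of the sketch is that retries only help when a report is lost occasionally. If the device never answers at all, every attempt times out and the sender simply gives up after the last retry.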

You could also enable trace logging on my driver to see if anything comes back from the device. Every single message that hits the driver is logged this way.

That is a nice driver. I love the logging capability! I just tried it on one of my switches and it didn't change anything. Just to make sure, I ran a repair on the device again to make sure it was online, and the repair worked fine.

[dev:14]2024-02-04 21:16:45.332[debug]Basement Kitchen Can Lights: Supervision Check #1 - Packet Count: 0

[dev:14]2024-02-04 21:16:43.570[debug]Basement Kitchen Can Lights: refresh...

[dev:14]2024-02-04 21:16:42.239[debug]Basement Kitchen Can Lights: refresh...

[dev:14]2024-02-04 21:16:37.352[warn]Basement Kitchen Can Lights: Supervision MAX RETIES (3) Reached

[dev:14]2024-02-04 21:16:37.323[debug]Basement Kitchen Can Lights: Setting supervisionCheck to 8000ms | 30 | 30 | 200

[dev:14]2024-02-04 21:16:37.320[warn](Basement Kitchen Can Lights: Re-Sending Supervised Session: 30 (Retry #3)
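In case it helps anyone comparing logs like these, here is a rough Python sketch that parses the pasted lines and sorts them into chronological order (the paste is newest first). The regular expression is inferred only from the six lines above, so treat the format as an assumption; `LINE`, `parse`, and `PASTED` are made-up names.

```python
import re
from datetime import datetime

# Pattern inferred from the lines pasted above (an assumption, not a documented format).
# The optional "\(?" tolerates the stray "(" that appears before one device name.
LINE = re.compile(
    r"^\[dev:(\d+)\](\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\[(\w+)\]\(?(.*?): (.*)$"
)

def parse(line):
    """Return a dict for one log line, or None if the line does not match."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    dev, stamp, level, name, message = m.groups()
    return {
        "dev": int(dev),
        "time": datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"),
        "level": level,
        "name": name,
        "message": message,
    }

PASTED = """
[dev:14]2024-02-04 21:16:45.332[debug]Basement Kitchen Can Lights: Supervision Check #1 - Packet Count: 0
[dev:14]2024-02-04 21:16:43.570[debug]Basement Kitchen Can Lights: refresh...
[dev:14]2024-02-04 21:16:42.239[debug]Basement Kitchen Can Lights: refresh...
[dev:14]2024-02-04 21:16:37.352[warn]Basement Kitchen Can Lights: Supervision MAX RETIES (3) Reached
[dev:14]2024-02-04 21:16:37.323[debug]Basement Kitchen Can Lights: Setting supervisionCheck to 8000ms | 30 | 30 | 200
[dev:14]2024-02-04 21:16:37.320[warn](Basement Kitchen Can Lights: Re-Sending Supervised Session: 30 (Retry #3)
"""

# Drop blank or unparseable lines, then order oldest first.
entries = sorted(
    (e for e in map(parse, PASTED.splitlines()) if e),
    key=lambda e: e["time"],
)
warnings = [e for e in entries if e["level"] == "warn"]

for e in entries:
    print(e["time"].time(), e["level"], e["message"])
print(len(warnings), "warnings out of", len(entries), "entries")
```

Read oldest first, the sequence is a third re-send, the max-retries warning, two refresh attempts, and then a supervision check reporting a packet count of 0.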

Power source issues have been known to cause problems with Z-wave. How is it powered? Try switching to a different known good 1A or higher USB-A power block.

Thanks, that was recently added. It used to be the same options as most system drivers.

The good thing about using my driver (at least for this test) is I know for a fact it is not a driver problem.

That was an interesting thought. I just swapped out the brick with two known good ones. I also bypassed the UPS that I have behind it just to make sure it wasn't a voltage or waveform issue with the UPS. Still no change.

Going to see if staff has an idea, this is a new one for me.

@gopher.ny @bobbyD @bcopeland
Would engineering logs tell you anything? Have never seen anything like this with only authenticated devices failing.

@bob5 you said this happened before but it recovered? It seems like this time it is stuck, or is this still within the normal recovery time? If anything was going to bring it back, I would have thought the full power down would do it.

What is this node? It has no route and low neighbors.

Also, have you looked at the Z-Wave logs (the view logs button on the Z-Wave details page)? They are mostly useless IMO, but they might tell you if there is some sort of spam storm going on that you cannot see with the normal logging. If I am not adjusting devices, that page stays blank. I tried turning a light on/off and it created just two log entries.

The few times it has happened before, it self-corrected after a few days. I would try all the usual troubleshooting steps with no success. I think the longest it lasted was about a week, or maybe a week and a half. This is still within the normal recovery time.

This is a pool pump. It is currently powered off (and has been since the end of September).

I have, and it is empty until I issue a command. All of my non-S2 devices are working perfectly, so I doubt it is a network congestion issue.

It's not stuck, at least I don't think it is. It is just driving me crazy, and my hope is to get it figured out this time. It's almost impossible to troubleshoot something like this when it is working. This is also the first time that I was able to narrow it down to only the S2 devices.

One more thing that I just discovered. If I physically turn the switch on or off, the status change gets reported back to the hub instantly, just like you would expect it to. Tried it on an Inovelli, Zooz, and Jasco switch just to make sure it wasn't brand or device-specific. So it is only an issue with communication going out of the hub, and once again, only on the S2 devices.

@gopher.ny @bobbyD @bcopeland

Does the Hubitat staff have anything I can try? I am at a complete loss for next steps.

I can officially say that this is no longer an intermittent thing. I have been unable to get any S2 devices working since Feb. They still report status changes to the hub, but they do not respond to any hub-issued commands. Things like repairs and pinging also work fine, just not commands. I have Jasco, Inovelli, and Zooz devices that are all doing the exact same thing. Devices with no security work fine.

Did you happen to migrate z-wave from a C5 hub?

Just a thought, as a test...

Can you remove one or two S2 devices and re-add them with no S2? Then see what happens.
See if they start working reliably?