Z-Wave storms - looks like HE Hub's fault?

We haven't forgotten about this issue. It is being worked on. Requires restructuring Dashboard's security design (obviously). I don't have a date yet for when a fix will be released.

I agree. We are working on a number of things to improve various aspects of the hub. At the moment there is a large focus on Z-Wave, and this has been consuming most of our effort for some time. We have made progress recently on overall hub performance, dealing with issues that caused some hub slow-downs, and mobile app functionality and reliability.

There would be no excuse for pretending there aren't areas that need improvement. We do strive to make the hub the best available, but by no means think we're where it needs to be.

20 Likes

This was more a comment on the community not the Hubitat staff.

Any official answer for this particular delayed Ack issue? Would there be a fix for c5 and earlier or is this not a “bug” that would be fixed given the maintenance status of these devices?

1 Like

So, this topic kinda went off the rails, although full of interesting comments--but I'm VERY interested in the "Z-Wave Storm" part.

@bravenel @JasonJoel

This seems to very clearly align with what I'm seeing on my C-7 hub--any time there's more than a tiny number (like 1-5 devices) activated within any less-than-long time period, things simply don't work well (devices don't see their action request, the hub doesn't get the current state back, etc.).

I don't have a "Zniffer"--and it seems less than exciting unless I want to be a full-on Z-Wave developer (I'm way too busy with coding tasks at work for this now!). Thus, I can't actually see the super deep details of what's going on in the mesh.

But, given the number who DO have them, has there been ANY progress in sorting out what might be causing these storms? I keep my fingers crossed that the HE staff will look into and fix this, because this is THE single biggest issue I have right now.

Aside from that, I am incredibly stoked about my C-7 hub and all the cool things it can do.

You know, it is hard to tell. I will say that on my production and development C-7 hubs I have not been seeing easily reproducible storms. At least not on any kind of regular frequency that makes it easy to troubleshoot. And when I do see them now they are typically short-lived, which also makes it hard to troubleshoot from my end.

I did see them more often (and they didn't always recover on their own) before a release or two ago, though. My guess is that Hubitat continues to improve the radio or application layer interfacing, which makes it a little better each release. Hard to know for sure though, as those types of changes don't make the release notes.

I do believe that they're continuously working on improving radio* stability though, so I expect it will continue to improve over time with future releases.

* And before another pedantic person like me chimes in, by radio I really mean radio, application, queuing etc. The whole data chain.

4 Likes

Hope they're working on C-5 stability in parallel.

Well. I excluded and re-included nearly all my S2 devices with no authentication and tied the "new" device numbers into all my rules. About 10-12 hours later, I guess I'll see how things work. :slight_smile:

I took out most of the extraneous refreshes and delays--so I can see how things do normally.

My "Group" actions certainly seemed more stable this way.

(My door locks are S0 or S2 Access Control; One light is S2 Auth, my Ring G2 "contact sensors" are still S2 Auth--I think that's about it).

How are things looking... I guess I might need to follow.

A bit mixed.

Overall, I believe things are markedly improved, but not perfect.

My "group actions" seem to be generally much more successful (although I noticed a few things kicked off during my "wake up and turn things on" scripts that didn't quite work). I think I had one "hub to device" action that failed when I went to bed, then got up, then went to "work mode" and I suspect that there were 2 things that didn't work properly because the hub never received the status update from the devices.

So, yes, I think going to unauthenticated was a notable help--but I may have a bit of additional tweaking to do.

Note: I've got some bodacious RM scripts, so that may be part of the issue. I'm trying to also reduce the number of script actions that are trying to take place simultaneously. I'm shooting in the dark a bit, though, because I don't have tools to point me to what is happening (a bit of trial and error, with some decent progress).

Note 2: When I re-included all those devices, I also REMOVED a number of "refresh" and repeated actions that had been necessary before. I am sure that things are working better than before--because they are working better now without those actions than before.

One big help on my "I'm home, turn things on and such" actions: I was looking through the logs in gory detail and made some significant adjustments to the order I was doing things and added some delays to keep the various actions from all firing at the same time. That was a "busy time" for triggers and events, so I tried to streamline the actions (garage doors open, entry door unlocks, entry door opens, garage doors close, entry door closes, system disarms, lights are all coming on, etc.).
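For anyone who wants to see the "spread the actions out" idea outside of Rule Machine, here is a rough Python sketch. It is only an illustration of staggering commands with a pause between them, not Hubitat's API or how RM works internally; `run_staggered`, the action names, and the gap value are all made up for the example.

```python
import time
from typing import Callable, List, Tuple


def run_staggered(
    actions: List[Tuple[str, Callable[[], None]]],
    gap_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run actions one at a time, pausing gap_seconds between consecutive ones.

    Hypothetical helper: `actions` is a list of (name, callable) pairs.
    Returns the names in the order they were run.
    """
    completed = []
    for index, (name, action) in enumerate(actions):
        if index > 0:
            # Pause before every action except the first, so the
            # commands do not all hit the mesh at the same moment.
            sleep(gap_seconds)
        action()
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Placeholder actions; a real setup would send device commands here.
    arrival = [
        ("unlock entry door", lambda: print("unlock entry door")),
        ("disarm system", lambda: print("disarm system")),
        ("turn on lights", lambda: print("turn on lights")),
    ]
    run_staggered(arrival, gap_seconds=0.1)
```

The point is just the ordering plus the gap: each command gets a moment to be sent and acknowledged before the next one goes out, instead of everything firing on the same trigger.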

2 Likes

Good to hear. I’m probably reverting back to ST. It was never this slow, and now that I’ve switched everything to Z-Wave Plus it should be even better.

1 Like

:frowning:

I would say wait until 2.2.4 and then see if things still are broken. There are supposedly some large changes and improvements with this upcoming update.

I don't know what the timeline looks like for this update (and I have no inside knowledge) but I suspect it will be out in a couple weeks just going by the past history of updates.

2 Likes

This morning I presumably had a storm after at least a week of no issues other than devices randomly dropping to 9.6kbps. Many FAILED or NOT_RESPONDING devices. Battery powered motion sensors continuously triggering according to logs. I restarted and repaired individual devices. Working OK again. C7.

OK, thanks for this. It at least gives me hope.

I will try to optimize my repeaters more this weekend, but I'm really having a hard time believing this is still a "weak mesh" issue.

Have you seen this?

4 Likes