What I learnt from Zniffering, and lessons for the "Slowdown" Disease

First of all, I would like to thank @codahq for bringing Zniffering to our attention, and for allowing us (really for the first time) to "see" our zwave networks. If you have an extensive zwave network, and you're having issues, please see his post which I've inserted at the bottom of this post.
I'm writing this post to help other Hubitaters who may read this at some point in time in the future.

  1. A few (3-5) years ago, when I first got started in Home Automation, and I was convinced that z-wave was the answer (days of Vera, HomeAssistant, etc.), I thought that it was a good idea to build out my zwave network to the furthest reaches of my house. I reasoned that no matter where I would put my zwave devices (especially un-powered sensors), I would be covered by a good mesh. So, I purchased quite a few zwave outlets and scattered them around my home. Zniffering allowed me to see that my newer zwave plus devices had such a great range that I didn't need any of those zwave outlets. Furthermore, some of those outlets were actually slowing down my zwave communications. In a burst of energy (aided by a forced isolation), I replaced 6 of them in one day. I must have run at least 3 (maybe 4) zwave repairs that day.
  2. Lo and behold - I have a small zwave simple lighting automation, which I run to see how my zwave is doing. I count the number of seconds it takes to execute, which is usually 1-2 seconds. The day that I did all of those zwave repairs, that automation took 17 seconds. Hmmm... I had just rebuilt the mesh tables with a zwave repair - why did it take so long? Could it be that each zwave repair exhausts part of some resource, so that after a number of zwave repairs everything runs slow? I really can't say for sure, but I can say that the next day, after a reboot, that same automation ran in under 1 second.

Unfortunately, I don't know why we have the "Slowdown" disease. I know that if you have it, unfortunately you have to investigate and investigate, and investigate. Maybe you will eventually find the right antibody.

Ahem... maybe the "reboot" helped clean out the hub.

I might add that you should use a secondary controller like the PC Controller to run Z-Wave tests, validations, and speed tests. That gives you the direct reaction/response timing and removes HE from the equation when you want to test Z-Wave network performance. Only then can you say "Aha it's a Z-Wave issue"... or not.

As a matter of fact, the same day that the UZB arrived (for zniffering), I decided to try out PC Controller. That day I found (and removed) 4 ghost nodes.
I really can't say if removing those nodes impacts performance of the zwave mesh. Maybe yes, maybe no. Regardless, it's an example of a tool that can be very helpful in "understanding" your zwave mesh.
Perhaps the reason that these tools are now more readily available is because Zigbee is gaining more traction in the marketplace. I don't know, but I'm glad that they (Silicon Labs) made it available (even if I don't understand everything in it!).

The tools have existed for a long time. Access to them has not always been as easy or as cheap as it is today. To get a full zniffer in the past you would have to hack one together or buy the development kit, which was insanely expensive. Today the dev kit is quite reasonable, or you can, as you and many others have, just flash a USB stick and download the software. :smile:

The PC Controller software used to be distributed from different vendors and it was the Zensys tool. Aeotec used to distribute it too.

Other platforms have the basic tools built into their gateways or provide add-ons to make this easier and provide documentation.

Zigbee has been around longer than most people know. Its "recent" fame has come from the DIY market, from light bulbs and super cheap zigbee devices. It's a great protocol and I expect to see a lot of convergence and co-existence from SiLabs in the near future since SiLabs is heavily involved in both protocols and chip manufacturing....

When a Z-Wave repair runs, it uses 100% of the Z-wave bandwidth to process the changes. So yes, that will make it very slow.

Don’t run too many repairs. Give it time to settle.

Ok I keep seeing this over and over and over again.... what are people talking about "let it settle" there's NOTHING to settle.

Z-Wave repair performs a neighbor discovery, generates a routing table, and then writes that information to the client node. It does this one node at a time. Once the route information is written, it's written. Done. If the routing information is correct there's nothing else to wait for.
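
To make the "neighbor discovery, then routing table" idea concrete, here is a rough sketch in Python. It is a simplified model and not the actual controller firmware logic: the `shortest_route` function, the neighbor table layout, and the node IDs are all made up for illustration, and the default of 4 repeaters is the commonly cited Z-Wave limit, treated here as an assumption.

```python
from collections import deque

def shortest_route(neighbors, source, target, max_repeaters=4):
    """Breadth-first search over a neighbor table (hypothetical model).

    `neighbors` maps a node ID to the set of node IDs it can hear directly.
    Returns the list of nodes from source to target, or None if no route
    exists within the repeater limit.
    """
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        # Extending this path would need more repeaters than allowed.
        if len(path) - 1 > max_repeaters:
            continue
        # sorted() only keeps the result deterministic for the example.
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

# Made-up neighbor table: node 1 is the hub, node 4 is out of direct range.
neighbors = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(shortest_route(neighbors, 1, 4))
```

In this model the route is fully determined once the neighbor table is known, which is the point being made: if the table and the written route are correct, there is nothing left to compute afterwards.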

NOW if the repair process is flaky/faulty and doesn't update the routes or the node correctly, and causes a bad route that effectively puts a hole in the mesh, then yeah, you have to WAIT for Explorer Frames to repair that hole. This is only necessary when the controller fails to repair/update the routes correctly.

My guess would be: since an answer to that is a 'skill' it's simpler to presume that Explorer Frames are required and it's busy doing that. So wait.. wait for 'settling' Which is a bit better than waiting for it to 'gel' :smiley:

What I specifically tried to imply (but couldn't verify) is that the zwave repair process itself causes some sort of resource leak, so that under some circumstances, the more you run that repair, the more of a slowdown you will get.
Please note - this is what happened to me... with my unique set of circumstances - your mileage may vary.

Yes to both. Trying to balance over explanation of every detail with a “Keep It Simple” approach. You can only please some of the people, some of the time. :wink:

No guessing needed. The Z-wave repair process itself is going to make the Z-Wave network unresponsive while it’s executing.

Within HE there's no way to know this or even determine it. What I CAN verify is that the heal process from HE is FAR SLOWER than it is from ANY of my other systems. No, I have not set up the ST hub to test against :slight_smile:

No. That's not what I pointed out. You've misunderstood.

After many zwave repairs were done and over with (by many hours), the zwave commands were very slow. Not while it was running!

Now, of course, no one should run multiple zwave repairs during the day.
Furthermore, I tried to duplicate the problem, and was unable to do so.
Perhaps it was something else that happened that day, I just do not know, but I have formed a reasonable speculation.

I see. Correct, I misunderstood. Can’t be the result of the repair process itself, but could definitely be that one of the nodes in the new route selected after rebuilding the neighbor tables is causing the delay.

Have you got a way to check response time to each node?

Yes, with PC Controller and Zniffering I have checked the speed between nodes.
All I remember that day (it was a very tumultuous day), is that the speed between many nodes was very slow.

Perhaps someone who has an extensive zwave configuration, could run the following test (one day)...

  1. Record the speed between some nodes (or perhaps Hub WatchDog...)
  2. Run multiple zwave repairs during the day (perhaps at least 3, in order to exaggerate the issue)
  3. Record again the speed between those very same nodes
  4. Reboot/restart
  5. Record again the speed between those very same nodes

Again, let me stress that I cannot be certain that running multiple zwave repairs causes a slowdown. However, it certainly is suspicious...
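
If anyone does try this, a small script could keep the comparison honest. The following is only a sketch: the `compare_phases` function, the node label, the 2x threshold, and the sample timings are all invented placeholders, and how you collect the round-trip times (PC Controller, Zniffer, Hub WatchDog, a stopwatch) is up to you.

```python
from statistics import median

def summarize(samples_ms):
    """Return the median round-trip time in milliseconds for one phase."""
    return median(samples_ms)

def compare_phases(baseline, after_repairs, after_reboot, threshold=2.0):
    """Compare per-node medians across the three recording phases.

    Each argument maps a node label to a list of round-trip times (ms).
    A node is flagged as slowed when its post-repair median is at least
    `threshold` times its baseline median (the threshold is an assumption).
    """
    report = {}
    for node in baseline:
        base = summarize(baseline[node])
        repaired = summarize(after_repairs[node])
        rebooted = summarize(after_reboot[node])
        report[node] = {
            "baseline_ms": base,
            "after_repairs_ms": repaired,
            "after_reboot_ms": rebooted,
            "slowed_by_repairs": repaired >= threshold * base,
            "recovered_after_reboot": rebooted < threshold * base,
        }
    return report

# Placeholder numbers purely for illustration; not measurements from any real mesh.
example = compare_phases(
    baseline={"node_a": [100, 120, 110]},
    after_repairs={"node_a": [900, 1100, 1000]},
    after_reboot={"node_a": [105, 115, 110]},
)
print(example["node_a"])
```

Using the median of several readings per node, rather than a single reading, makes it less likely that one unlucky retry gets mistaken for a slowdown.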

Keep in mind that if your mesh is mostly Z-Wave Plus, Explorer Frames CAN and WILL update the routes. Running a heal over and over again is not recommended and can in fact screw up the routes.

I specifically stated that I do not recommend such an activity
(running repair that many times in one day). As stated it was a tumultuous day.

However, as a test it is worth considering, is it not?

Allow me to defer to those Hubitaters who have a much greater knowledge and experience than I. If you all think that this is not a reasonable or possible speculation, I will withdraw my point.

What's the speculation?

That there is some sort of resource leak associated with the zwave repair process.

That’s an interesting thought. My slowdowns reduced after turning off the z-wave poller. I wonder if the z-wave section of code might have a resource leak... Radio controller, interface, etc...

Nice work jtmpush18.
Great to get others' perspectives on problem solving their Home Automation issues.
Thanks for your info.