Automated Cleaning of ZWave "Tortured Routes"

I wanted to write a rule or build an app that would look at the ZWave details table and, if the route for a specific node was anything more than a direct hop to the hub, initiate a rebuild route command for that node. I was planning to run this routine in the wee hours.

I have tried a rebuild network command using the endpoint (Full Z-Wave Repair: http://<your-hub-ip>/hub/zwaveRepair), but the system only logs the start of the rebuild, not the completion, so the result is vague.
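For anyone wanting to script the call above, here is a minimal Python sketch. The URL path is the one quoted in the post; everything else is an assumption: that a plain unauthenticated GET is enough to start the repair, and that the function names (`repair_url`, `start_full_repair`) are just illustrative. A hub with login security enabled would presumably need a session first.

```python
import urllib.request


def repair_url(hub_ip: str) -> str:
    """Build the full Z-Wave repair URL quoted in the post."""
    return f"http://{hub_ip}/hub/zwaveRepair"


def start_full_repair(hub_ip: str, timeout_s: float = 10.0) -> int:
    """Request the repair endpoint and return the HTTP status code.

    Assumption: a plain GET with no authentication starts the repair.
    The status code only says the request was accepted; as noted in the
    post, completion is not reported.
    """
    with urllib.request.urlopen(repair_url(hub_ip), timeout=timeout_s) as resp:
        return resp.status


# Example (not executed here, needs a reachable hub):
# status = start_full_repair("192.168.1.10")
```

Scheduling this from cron or a similar scheduler would cover the "wee hours" part, but it is still the whole-network repair, not a targeted one.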

Apparently the number of hops isn't exposed in any API, so I would have to either run the rebuild for all nodes or look at the table and set up a list of nodes manually. Neither is a great option.

Since the data exists in the table, I'm wondering if there is a workaround that would let me parse this info, or some API hook I could use to get at the number of hops for each node?

Any ideas?

Why do you want to do this? Z-Wave is designed to have hops if needed. Are you having issues?

Yes "if needed". I am suspicious of the way hops are spawned and multi hop routes are generated. On one hub I have these 3-4 hop routes for what should be a direct hop to the hub. When I do a manual rebuild, they often revert to a direct hop. So it seems the route algorithms do not have a bias to migrate back to the fewest hops. They drift into chains of hops but don't seem to make an effort to retrace themselves.

Excessive hops lead to excess traffic and ZWave radio jamming when large actions occur, like an "All On" or "All Off" rule. So having lots of hops puts more strain on the mesh. If the only way to a node is via hops then so be it. It just seems hops are prescribed when they needn't be.

So after you get them changed, do they stay that way?

Some do, some don't. This isn't a static model, it's dynamic. Ever changing. The mesh is fluid and in constant change. It just shouldn't change to unnecessary hops, or if it does, it should have a bias to retrace itself if the path metrics permit.

Not sure, but I think you proved my point. It is what it is, and will be (unless you use LR.)

Just confirming, you've compared performance with direct routes and with routes with hops, and there is a consistent, significant difference?

This 'Tortured Routes' thread reminded me of one of my newbie posts from more than 5 years ago. I think the conclusion to it was that the Z wave routes don't necessarily make sense....

I have some theories about the routing with heaps of hops.

Namely, I suspect that those crop up when there are unrelated delays in sending "ACKs" back to the sender of a ZWave packet.

If, say, the hub radio is bound up and doesn't reply with an ACK to a packet in a timely manner, the device times out and thinks it needs to find another way back, thus trying routes with various hops. It may even try several if they keep failing. Finally, the hub gets unbusy and replies with an ACK--and it just so happened that was on a circuitous route that went back and forth all over the building. (or, perhaps, the device was the side that was locked up).

Something like that is about the only reason I could imagine creating some of the insane routes I've seen crop up from time to time.

Thank heavens for ZWave LR that ends the insanity. :slight_smile:

As for fixing routes, I'd exercise caution as that's certainly an area where you might end up with a worse cure than the problem.

That is an interesting possibility. In a prior life I worked in Telecoms and we had to deal with "signaling storms" when network errors would create massive amounts of error reporting from several elements all at once. We had to develop backoff and grooming schemes to avoid flooding the signaling layer of the network. In the nominal case everything worked fine, but fail out a network element or break a link (during a busy time) and all hell could break out.

I have these All Off and All On rules that run a few times a day (mostly All Off, which run at night as housekeeping routines). They definitely flood the network, hitting 50-55 devices with Offs. This could be the issue: the device doesn't get its ACK and searches for a plan B (a multi hop alternative). I also see 3-6 Z-Wave jammed messages a day.

I don't have a problem with the migration to these multi hop routes. My issue is the apparent lack of a behind the scenes pro-active measure to search for least hop alternatives to migrate back to. The system can drift to these max hop scenarios but there should be a counter measure to gracefully wind back to fewer hops if possible. I don't see it even attempting to do this. The algorithm seems to be missing and the system accepts a path scheme that works as "good enough".

The problem with "good enough" is that it adds RTT (it doubles just by adding one hop), and the extra traffic congests the ZWave channel further, creating ever more collisions and havoc (a downward spiral). So when I do want to run my All On command it can take 40-70 seconds to complete.

So my original point was to see if I could automate reeling in these rogue routes with a targeted rebuild on them in off peak time. Maybe that would be the missing counter measure to the meandering spread of tortured routes. When I do it manually, at least 60-70% of the rogue routes repair to a direct hop. Not 100%, but less is better.

I guess I could do a full ZWave repair every night (and that seems to trigger the keyboard warriors as well) but I was hoping for something more targeted. The only way to be targeted is to trigger the repairs on the non direct hop routes, but that data isn't exposed through any API and screen scraping has fallen short. The per-node repair is available through an endpoint, but there is no way to parse multi hops, large RTTs or high PERs as criteria to target the repair.
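If the route data could be obtained somehow, the targeting logic itself would be simple. The sketch below is purely illustrative: the `NodeStats` record, its field names, and the thresholds are all hypothetical, since no API exposing hops, RTT or PER has been identified in this thread. It only picks which node IDs would qualify for a repair; the per-node repair call is left to the caller because its URL is not given here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class NodeStats:
    """Hypothetical per-node record; field names are assumptions."""
    node_id: int
    repeaters: int   # intermediate hops; 0 means a direct route to the hub
    rtt_ms: float    # round-trip time in milliseconds
    per: int         # packet error count as shown in the details table


def select_repair_targets(
    nodes: Iterable[NodeStats],
    max_repeaters: int = 0,
    max_rtt_ms: Optional[float] = None,
    max_per: Optional[int] = None,
) -> List[int]:
    """Return IDs of nodes that exceed any of the configured limits."""
    targets = []
    for n in nodes:
        too_many_hops = n.repeaters > max_repeaters
        too_slow = max_rtt_ms is not None and n.rtt_ms > max_rtt_ms
        too_lossy = max_per is not None and n.per > max_per
        if too_many_hops or too_slow or too_lossy:
            targets.append(n.node_id)
    return targets


sample = [
    NodeStats(node_id=5, repeaters=0, rtt_ms=12.0, per=0),
    NodeStats(node_id=9, repeaters=3, rtt_ms=80.0, per=2),
    NodeStats(node_id=12, repeaters=0, rtt_ms=250.0, per=0),
]
print(select_repair_targets(sample))                    # hop count only
print(select_repair_targets(sample, max_rtt_ms=100.0))  # hops or slow RTT
```

Keeping selection separate from the repair call means the criteria can be tuned, or the list simply logged for a few nights, before anything is actually rebuilt.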

hubitrep has a new “Hub Diagnostics” app that shows (among many other things) the Zwave device routing.
So maybe there IS an API.

I'm not sure why you would regularly run an All On command for 50 to 55 lights, but for running all off, there is a setting in rule machine where you can tell it to only try to turn off switches that are reporting on. That would significantly reduce the load that the all off rule is putting on your hub.

"Command only switches that are on."

Yes, I was using that, but I shifted over to the Groups & Scenes App, where I set up an All On/All Off group. I trigger it with a pair of virtual switches. In the G&S App I have optimization enabled (targets only devices that need the state change) and metering enabled with various msec delays (I am at 150 msec right now).
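As a rough sanity check on those numbers, the metering delay by itself sets a floor on how long a group command can take. The arithmetic below assumes metering simply spaces consecutive commands by the configured delay (an assumption about how the setting behaves, not something confirmed in this thread) and that every device needs the state change.

```python
def metering_floor_s(device_count: int, delay_ms: int) -> float:
    """Minimum time spent on metering alone, in seconds.

    Assumes one command per device, each spaced by delay_ms.
    """
    return device_count * delay_ms / 1000


for count in (50, 55):
    print(count, "devices:", metering_floor_s(count, 150), "s")
```

Under that assumption, 50-55 devices at 150 msec account for roughly 7.5 to 8.25 seconds, so metering alone would not explain the 40-70 seconds reported earlier for All On.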

I was told this app was more efficient for such bursty tasks as the ones I am doing. RM apparently has more overhead. This app seems to work a little better than the prior RM rule I had, but it is still far from perfect.

I have had this All On All Off capability for 15+ years. Prior to moving to Hubitat, I had it on a vintage Vera Plus hub and before that a (now antique) Vera-lite. Sadly the All On/Off scenes worked much better on Vera products than Hubitat. I don't know the reason why but I suspect Vera had to crawl into the ZWave stack in the old days whereas now it is treated as a black box given the push for encapsulation. That's not a bad thing but it seems to promote a belief system that the ZWave subsystem is just fine, beyond reproach. I don't buy that. A 5 hop node @ 40kbps to a hub, when a manual rebuild magically returns it to a direct connection and a 100kbps link, suggests the system is overly tolerant of poor routes. They might have been needed for some legit reason for a brief period of time but they should have a bias to wander back to least hop options. I don't think they ever try to wander back, or they don't have enough bias to force it more urgently.

None of this would matter if ZWave multicast was implemented. I believe it is supported in Zigbee but I have already spent >$1,000.00 on ZWave devices and I'm not about to throw that all away. I like 900MHz better than 2.4 anyway for in building coverage. I guess pre 700 series chip sets don't support the multicast hooks so it would be limited to newer devices anyway. The complexity of managing blended fleets (pre 700 and post 700 chipsets) of devices might make this not a hill to die on for the platform team. I am not hopeful we'll see ZWave multicast anytime soon.

I plan to keep pecking away at this one until I either run out of steam or find something more definitive. It's kind of funny, all the comments along the lines of: why would you do that, that's how mesh networks work, you don't understand, ... No one has said, hmmm, a 5 hop 40kbps route that rebuilds instantly to a direct hop @ 100kbps does seem a little odd, why would it drift so far out of touch? Not one...

My understanding (and I'm not an expert) is that the Zwave controller only changes routes when it's forced to, i.e., when it "can't get to a destination" or no existing route is defined (then Explorer Frames are broadcast to find any working route). This stuff is not nearly as sophisticated as OSPF or EIGRP, with costs associated on the hops. Once it finds a working route (the last working route, LWR), in ZW+ the controller basically sticks with that, as it tries the LWR first (and any burst of interference/noise might force it from an obvious/optimal route). It just doesn't dynamically change to lower cost routes once it finds a working path. It's also a proprietary routing approach by SiLabs, so it's not like the internal routing scheme is well documented. But the best description I could find was section #4 of this article
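The "stick with the last working route" behaviour described above can be illustrated with a toy model. This is not the actual SiLabs algorithm, which is proprietary; `ToyController` and the way routes are represented (a tuple of repeater node IDs, empty for a direct route) are made up for the illustration.

```python
from typing import Optional, Sequence, Tuple

Route = Tuple[int, ...]  # repeater node IDs; () means a direct route


class ToyController:
    """Toy model only: remembers one route and never looks for a better one."""

    def __init__(self) -> None:
        self.lwr: Optional[Route] = None  # last working route

    def send(self, working_routes: Sequence[Route]) -> Optional[Route]:
        # Try the remembered route first and keep it if it still works.
        if self.lwr in working_routes:
            return self.lwr
        # Otherwise "explore" and adopt the first working route found.
        self.lwr = working_routes[0] if working_routes else None
        return self.lwr


direct: Route = ()
detour: Route = (7, 12, 4)

hub = ToyController()
first = hub.send([direct, detour])   # direct route is learned
second = hub.send([detour])          # direct fails briefly, detour adopted
third = hub.send([direct, detour])   # direct works again, detour is kept
print(first, second, third)
```

In this toy, only forgetting the stored route (`hub.lwr = None`) lets the direct route be picked up again, which loosely mirrors the manual rebuild results reported earlier in the thread.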

There are some other limited docs:

Finally, there is a concept of ZW application priority routes (APR), that can "force" a route, but those are obviously not self-healing, nor supported by HE (to my knowledge).

Bottom line: ZW LR is likely your best bet to "force a route" and keep it to a single hop (and it also supports higher power levels than the standard mesh, as I understand it).

Totally agree - And given that it's done in ZB (See the GroupOn/GroupOff commands) it does seem to be a missing feature in HE ZW and Matter APIs

As I understand it - the older ZW ZIP gateway does NOT seem to support ZW Multicast. But, that all said, the newer ZwaveJS interface DOES support multicast (there are apparently limits around S0 and S2 security) - But I think RL and Groups/Scenes were both written before ZwaveJS was available (so this all may be a historical/chronological artifact of what was available at the time)

So it would be great if @bcopeland could expand the HE Zwave APIs (and Matter APIs as well) to support Multicast - Then I think it could be integrated into G&S and/or RL (more likely).
