In need of suggestions

Tony · July 1, 2022, 6:47pm

That's the thing... aside from a route discovered by explorer frames (which gets set as the LWR, the last working route, and is used until it fails after the requisite number of retries), a Z-Wave device can't build any routes itself. It just maintains a list of its in-range neighbors, updates that list when directed to look for them, and forwards it to the hub when commanded. It then depends on the hub to calculate and send back a set of viable routes, which it stores.
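To make the division of labor concrete, here's a toy sketch of the hub's side: it takes the neighbor lists the nodes have reported and derives a few candidate routes for one node. This is NOT the actual SiLabs route calculation; it's a plain breadth-first search over loop-free paths, and the node IDs, the `NEIGHBORS` table, `calculate_routes()`, and the `MAX_REPEATERS = 4` cap are all hypothetical/assumed for illustration.

```python
from collections import deque

# Hypothetical per-node neighbor lists, as each node might report them to the hub.
# Node 1 is the hub; all IDs and links are made up for illustration.
NEIGHBORS = {
    1: [2, 3],
    2: [1, 3, 4],
    3: [1, 2, 5],
    4: [2, 5],
    5: [3, 4],
}

MAX_REPEATERS = 4  # assumed cap on repeaters per route


def calculate_routes(neighbors, source, dest, max_routes=3, max_repeaters=MAX_REPEATERS):
    """Return up to max_routes loop-free routes from source to dest, fewest hops first.

    Each route is the list of repeater node IDs between source and dest
    (an empty list would mean the two nodes are direct neighbors).
    """
    routes = []
    queue = deque([[source]])  # breadth-first search over partial paths
    while queue and len(routes) < max_routes:
        path = queue.popleft()
        node = path[-1]
        if node == dest:
            routes.append(path[1:-1])  # keep only the repeaters
            continue
        for nxt in neighbors.get(node, ()):
            if nxt in path:
                continue  # never revisit a node (no loops)
            if nxt != dest and len(path) > max_repeaters:
                continue  # adding nxt as a repeater would exceed the cap
            queue.append(path + [nxt])
    return routes


if __name__ == "__main__":
    # Routes node 5 could be handed for reaching the hub (node 1).
    print(calculate_routes(NEIGHBORS, source=5, dest=1))  # [[3], [3, 2], [4, 2]]
```

The point of the sketch is that the quality of the routes the device ends up storing can be no better than the neighbor table the hub computes them from; the device itself never runs anything like this.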

The dependency on the hub for all routing calculations no doubt stems from the memory-constrained 8-bit 8051 hardware used in all Z-Wave devices prior to the 32-bit ARM Cortex SoCs of the 700/800 series (yet those devices also depend on the hub for calculated routes).

AFAIK, a relocated Z-Wave device won't ever look for new neighbors unless directed to do so (by a broadcast neighbor discovery command, as would happen during a repair or some kind of scheduled process). So it will do what it always does: use the current LWR, retrying it if it fails, then try the saved next-to-last working route, followed by a direct broadcast to any node that might be in range, as well as cycling through the previously stored routes calculated by the hub. Only once it has exhausted all of those does it resort to an explorer search. Should that explorer search succeed, it will use the resulting route until it no longer works. It will hang on to a working route, no matter how convoluted, for as long as it keeps working (even if it always takes two or three retries), without any further explorer searches.
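Here's a rough model of that fallback order, mostly to show why a relocated device can limp along on stale routes for a long time before an explorer search ever happens. Everything in it is hypothetical: `RoutingState`, `transmit()`, the `delivered` and `explorer` callbacks, the retry count, the exact order of the direct attempt versus the hub-calculated routes, and the rule that a newly working route becomes the LWR while the old LWR becomes the next-to-last working route. It's a sketch under those assumptions, not the SDK's logic.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Route = Tuple[int, ...]  # repeater node IDs; () means a direct attempt


@dataclass
class RoutingState:
    lwr: Optional[Route] = None   # last working route
    nlwr: Optional[Route] = None  # next-to-last working route
    hub_routes: List[Route] = field(default_factory=list)  # routes calculated by the hub


def _promote(state: RoutingState, route: Route) -> None:
    """Make route the new LWR; the previous LWR becomes the NLWR (assumed behavior)."""
    if route != state.lwr:
        state.nlwr, state.lwr = state.lwr, route


def transmit(
    state: RoutingState,
    delivered: Callable[[Route], bool],
    explorer: Callable[[], Optional[Route]],
    lwr_retries: int = 2,  # assumed retry count for the LWR
) -> Tuple[Optional[Route], List[Route]]:
    """Try routes in the assumed fallback order.

    Returns (route that worked or None, routes attempted before any explorer search).
    """
    candidates: List[Route] = []
    if state.lwr is not None:
        candidates += [state.lwr] * (1 + lwr_retries)  # LWR plus its retries
    if state.nlwr is not None and state.nlwr not in candidates:
        candidates.append(state.nlwr)
    if () not in candidates:
        candidates.append(())  # direct attempt, no repeaters
    for route in state.hub_routes:
        if route not in candidates:
            candidates.append(route)  # stored hub-calculated routes

    attempts: List[Route] = []
    for route in candidates:
        attempts.append(route)
        if delivered(route):
            _promote(state, route)  # a working route is kept as the LWR
            return route, attempts

    found = explorer()  # last resort: explorer frame search
    if found is not None:
        _promote(state, found)
    return found, attempts


if __name__ == "__main__":
    state = RoutingState(lwr=(7,), nlwr=(3,), hub_routes=[(3,), (4, 2)])
    reachable = {(4, 2)}  # after the move, only this stored route still gets through
    route, attempts = transmit(state, lambda r: r in reachable, explorer=lambda: None)
    print(route, len(attempts))  # (4, 2) 6
```

Note that once `(4, 2)` works it becomes the LWR, so the next transmission starts there; nothing in this loop ever prompts the device to rediscover its neighbors.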

It certainly seems plausible that the ACK timeouts you've seen are responsible for the repeated explorer frame flooding (and the resulting poor performance). The SiLabs videos mention (more than once, as I recall) that as the SDK has evolved, changes have been made to improve routing performance (early SDK versions didn't save LWRs or implement explorer frames), albeit with concessions to maintain backward compatibility.

Reading between the lines, it's pretty clear that some awful performance can result when the hub doesn't have an accurate node adjacency table (non-optimal routes then get generated, leading to lots of retries, timeouts, and route changes) or when explorer frame searches are under way (as they saturate the mesh, any 'regular' Z-Wave traffic happening at the same time increases the probability of non-optimal explorer-generated routes).
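A toy illustration of the stale adjacency table problem: node 5 used to sit next to node 2 but has since been moved next to node 4, and nobody has told the hub. A shortest-path search over the stale table (plain BFS here, an assumption, not the real route calculator) happily produces a route through a link that no longer exists. The topology and the `shortest_route()`/`broken_hops()` helpers are hypothetical.

```python
from collections import deque

# Hypothetical topology. Node 1 is the hub; node 5 has been relocated.
STALE = {1: [2, 3], 2: [1, 5], 3: [1, 4], 4: [3], 5: [2]}   # what the hub believes
ACTUAL = {1: [2, 3], 2: [1], 3: [1, 4], 4: [3, 5], 5: [4]}  # what the radios see now


def shortest_route(neighbors, src, dst):
    """Plain BFS shortest path from src to dst; returns the node list or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in neighbors.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        return None
    path, node = [], dst
    while node is not None:  # walk back from dst to src
        path.append(node)
        node = prev[node]
    return path[::-1]


def broken_hops(path, actual):
    """Hops in path whose link doesn't exist in the actual topology."""
    return [(a, b) for a, b in zip(path, path[1:]) if b not in actual.get(a, ())]


if __name__ == "__main__":
    planned = shortest_route(STALE, 1, 5)
    print(planned, broken_hops(planned, ACTUAL))  # [1, 2, 5] [(2, 5)]
    print(shortest_route(ACTUAL, 1, 5))           # [1, 3, 4, 5]
```

Every transmission over the planned route dies at the 2→5 hop, so the device burns through retries and fallbacks (and eventually explorer frames) until something works, even though a perfectly good route exists in the real topology.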

It's been a while since I watched those SiLabs videos; a post I made a while back (Where do Z-Wave packets go to die? - #7 by Tony) has some additional info.