In need of suggestions

Actually, a Z-Wave device doesn't really learn over time; this much has been stated explicitly by Silicon Labs in their tutorial videos. There are no statistical learning processes going on in each node. Lots of good info here:

https://www.silabs.com/support/training/z-wave-technology-training/mesh-performance

What might appear as learning is actually message delivery failure followed by multiple retries of direct and pre-calculated, stored alternate routes (sent to each device by the hub when the node is initially included, during repair, or via a scheduled neighbor update process).

Should retries of all pre-calculated routes also fail, the device will resort to explorer frame flooding, which may take up to 30 seconds and saturates the mesh in the process. It's the last resort, used only after all the stored routes have been tried in succession multiple times without success. If explorer frames succeed in discovering a working route, that route is used until it fails, in which case the process repeats, starting again with the pre-calculated routes in succession.
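To make the ordering concrete, here is a minimal sketch of that fallback sequence. It is a hypothetical model, not Silicon Labs SDK code: `deliver`, `send`, `explore`, and the `rounds` count are illustrative names and values, and `send` simply stands in for "an Ack came back on this route".

```python
from typing import Callable, List, Optional, Sequence

Route = Sequence[int]  # repeater node IDs; an empty route means "direct"


def deliver(
    routes: List[Route],
    send: Callable[[Route], bool],
    explore: Callable[[], Optional[Route]],
    rounds: int = 2,
) -> Optional[Route]:
    """Hypothetical model of the fallback order described above.

    Cycle through the stored routes (LWR first, then the hub-calculated
    ones) `rounds` times; only if every attempt fails, fall back to
    explorer-frame discovery. Returns the route that worked, or None.
    """
    for _ in range(rounds):
        for route in routes:
            if send(route):  # True stands in for "an Ack came back"
                return route
    # Last resort: explorer frames flood the mesh and can take many seconds.
    return explore()


# Usage: route [3] is dead, route [5] works, so no explorer search happens.
print(deliver([[3], [5]], lambda r: list(r) == [5], lambda: None))  # prints [5]
```

The point of the model is that the explorer search is reached only after every stored route has failed in every round, which is why it shows up as a long delay rather than as routine behavior.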

The Z-Wave controller in the hub is the only device in the mesh that knows, or assumes that it knows, all current node adjacencies (and RSSI data, if available) based on what the nodes themselves report as their nearest neighbors when directed to do so by repair or when the node is included in the mesh.
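As an illustration of "the hub calculates routes from reported neighbor lists", here is a plain breadth-first search over an adjacency table. This is an assumption-laden sketch: real controllers use the Silicon Labs SDK's own routing logic and store several alternate routes, not just the single shortest one found here. The `max_repeaters=4` limit reflects the usual Z-Wave source-route limit of four repeaters, but treat it as an assumption; `shortest_route` is a hypothetical name.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_route(
    neighbors: Dict[int, List[int]], src: int, dst: int, max_repeaters: int = 4
) -> Optional[List[int]]:
    """Breadth-first search over the neighbor lists the nodes reported.

    Returns the repeater node IDs between src and dst ([] means direct),
    or None if no route with at most max_repeaters repeaters exists.
    """
    prev: Dict[int, Optional[int]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in neighbors.get(node, []):
            if nxt not in prev:  # first visit is the fewest-hop path
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        return None  # the adjacency table says dst is unreachable
    path: List[int] = []
    node: Optional[int] = dst
    while node is not None:  # walk back from dst to src
        path.append(node)
        node = prev[node]
    path.reverse()
    repeaters = path[1:-1]
    return repeaters if len(repeaters) <= max_repeaters else None


# Hypothetical adjacency table: node -> neighbors it reported.
neighbors = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2, 5], 5: [4]}
print(shortest_route(neighbors, 1, 5))  # prints [2, 4]
```

If the table is stale (a node was moved and never re-reported its neighbors), the search will happily return routes through nodes that are no longer in range, which is the failure mode discussed further down the thread.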

Zigbee is a totally different kettle of fish; a Zigbee mesh can reorganize itself if need be before any actual device transmission has failed (Zigbee routers communicate amongst themselves constantly, exchanging status even when no actual messages are being transmitted).

Very helpful info, @Tony! I've definitely conflated Z-Wave and Zigbee mesh rules/logic over the years, so this is good for putting things back into their categories.

Curious: does the hub keep track of how many times (or at least the last time) a device had to fall back to explorer frames? Are those kinds of stats available anywhere?

Not sure if this is it, but it's certainly an indication of things struggling and changing. If you look at the Z-Wave table on the Z-Wave Details page, there is a "route changes" value. If that is high, it means the device is moving around to get its messages through.

^^^ Good advice. It's probably impossible to provide a completely stable RF environment (that's why alternate paths exist), but the number of route changes should correlate with the number of failures and retries initiated by a node. A low number of route changes means a high probability of success (within the set number of retries) on a given path.

I don't think there is a way to tell whether an LWR is the result of explorer frame discovery (depending on the SDK level, some early Z-Wave devices don't support explorer frames at all). The lengthy amount of time to establish a path should be a dead giveaway that explorer frame flooding is happening. One of the charts in the Z-Wave tutorials provides a table of timeouts that shows how expensive (performance-wise) this fallback method is.
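If you copy the route-change numbers from the Z-Wave Details page by hand, a few lines are enough to rank the devices that are churning the most. This sketch assumes no hub API at all; the device names, counts, and the threshold of 5 are hypothetical starting points, not Hubitat guidance.

```python
from typing import Dict, List


def flag_unstable(route_changes: Dict[str, int], threshold: int = 5) -> List[str]:
    """Return device names whose route-change count exceeds `threshold`,
    highest count first. The threshold is an arbitrary starting point."""
    noisy = [name for name, count in route_changes.items() if count > threshold]
    return sorted(noisy, key=lambda name: route_changes[name], reverse=True)


# Hypothetical numbers copied by hand from the Z-Wave Details page.
counts = {"Hallway Motion": 12, "Laundry Motion": 7, "Game Room Motion": 0}
print(flag_unstable(counts))  # prints ['Hallway Motion', 'Laundry Motion']
```

Comparing the list before and after a repair (or after moving a repeater) gives a rough sense of whether the change helped.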

If a device goes into this state, is it then rebuilding all its routes, primary and backup? So, let's say you move a battery device across the mesh and it cannot get a route: would it, in theory, rebuild its neighbors and routes when it sends out the explorer frames? That would make sense, and in that case it should be pretty rare for a device to have to do this unless it has very few neighbors with a spotty signal.

I have also witnessed via Zniffer some devices that have had issues with not getting Acks back fast enough (possibly not waiting long enough) and then going into this explorer mode constantly, even though they logically should have plenty of route options. These are the mesh-killer devices. I'm wondering whether resetting and re-pairing those in their current location would force them to rebuild everything from scratch. I have seen it recommended before, but I honestly don't think it will make a difference, and in these cases the device firmware might be to blame.

That's the thing... aside from a route discovered by explorer frames (which gets set as the LWR and used until it fails after the requisite number of retries), a Z-Wave device itself isn't capable of building any routes; it just maintains a list of its in-range neighbors, updates it when it has been directed to look for them, and forwards that neighbor list to the hub (when commanded). It then depends on the hub to calculate and transmit back to it a set of viable routes which it then stores.

The dependency on the hub to do all routing calculations no doubt stems from the memory-constrained 8-bit 8051 hardware used in all Z-Wave devices prior to the 32-bit ARM Cortex SoCs of the 700/800 series (yet those devices also depend on the hub for calculated routes).

AFAIK, a relocated Z-Wave device won't ever look for new neighbors unless directed to do so (by a broadcast neighbor discovery command, as would happen during repair or some kind of scheduled process). So it will do what it always does: use the current LWR, retrying it if it fails, then try the saved next-to-last working route, followed by direct broadcast to any node that might be in range, then cycle through the previously stored routes calculated by the hub. Only after exhausting all of those does it resort to an explorer search. Should that explorer search succeed, it will hang on to that route (no matter how convoluted) for as long as it works, even if it always takes two or three retries to get through, without any further explorer searches.
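A small state model shows why a moved device keeps using stale information: the working routes are updated on every success, but the neighbor list changes only on a hub command. This is a hypothetical, simplified sketch (`NodeRouting`, `record_success`, and `rediscover_neighbors` are illustrative names, not SDK identifiers), and the LWR/NLWR demotion rule is an assumption based on the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeRouting:
    """Hypothetical, simplified model of the per-node routing state."""

    neighbors: List[int]              # refreshed only on a hub command
    lwr: Optional[List[int]] = None   # last working route
    nlwr: Optional[List[int]] = None  # next-to-last working route

    def record_success(self, route: List[int]) -> None:
        # Any route that finally got an Ack becomes the LWR, even if it
        # needed retries; the previous LWR is demoted to NLWR.
        if route != self.lwr:
            self.nlwr, self.lwr = self.lwr, route

    def rediscover_neighbors(self, in_range: List[int]) -> None:
        # Runs only when the hub orders a neighbor discovery (inclusion,
        # repair, scheduled update), never because the node was moved.
        self.neighbors = list(in_range)


node = NodeRouting(neighbors=[2, 3])
node.record_success([2])      # first working route
node.record_success([3, 7])   # e.g. found later by an explorer search
print(node.lwr, node.nlwr)    # prints [3, 7] [2]
print(node.neighbors)         # prints [2, 3]; moving the node changes nothing here
```

Nothing in the model ever calls `rediscover_neighbors` on its own, which mirrors the point that a relocated node waits for a repair or scheduled update.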

It certainly seems plausible that the Ack timeouts you've seen would be responsible for the repetitive explorer frame flooding scenarios (with resulting poor performance). The SiLabs videos mention (more than once, as I recall) that as the SDK has evolved, changes have been made to improve routing performance (early SDK levels didn't save LWRs or implement explorers), albeit with concessions to maintain backwards compatibility.

Reading between the lines, it's pretty clear that some awful performance scenarios can result when the hub doesn't have an accurate node adjacency table (non-optimal routes will then get generated, leading to lots of retries, timeouts, and route changes) or when explorer frame searches are going on (as they saturate the mesh, any 'regular' Z-Wave traffic happening at the same time increases the probability of non-optimal explorer-generated routes).

It's been a while since I watched those SiLabs videos; a post I made a while back (Where do Z-Wave packets go to die? - #7 by Tony) has some additional info.

As to one person's suggestion: starting from scratch does not help. It's been 4 weeks since I gave up and started from scratch, excluding and re-including devices working outward from the hub. Well, after 2 weeks of settling, my motion sensors are worse than ever. I have a motion sensor in line of sight of the hub, maybe 8 feet away, and sometimes it triggers, sometimes not. Other sensors suffer from the same issue. My most reliable sensor (I have 6 of the same one) is the furthest from the hub, on another floor.

As for Tony's response: when I do a complete repair, it's done in about 4 minutes, which can't give each device time to try every node option and find the best route. One switch would have to try 49 other switches, which may hop, and then do it all over again for the original switch. I think a process that optimizes every switch is a good idea for getting the best mesh network, but it would take hours. It would be beneficial for someone to build a 50-device network and have Z-Wave/Hubitat ping one device at a time, working its way out and repeating whenever it sees too many hops. I don't know Tony's background, but he seems to know his stuff. So I'm still confused, lol. I guess heal doesn't work, and a full reset didn't fix my issues. And so we continue, with all of Tony's new info.

What motion sensors? Battery powered ones have cool down periods where they won't retrigger motion, sometimes that's several minutes.

A lot of people use ZigBee motion sensors as they seem to be quicker to respond. I have both.

Don't you mean send an "inactive" event? I think there is a difference?

That's why I wanted the specific failure cases. For instance, he may be walking into a room and the motion triggers. Then he manually turns the lights off, but expects new motion within the cooldown to retrigger the lights...

Now that you've redone it, let's see a copy of your current Z-Wave Details page.

Yeah, that's an easy mistake to make for sure.

I'm fully aware of the 4-minute cool-down, unless I set it to test mode, which polls every 10 seconds at the cost of battery life. An example from this morning (I don't have any restrictions on any motion sensors, as I'm still testing): I walked through the downstairs hallway and the light did not come on; a few moments later I walked back past it and the light did turn on. This is common with almost all my motion sensors. Either they turn on instantly (or with lag) the first time, or they won't until a second pass (the second time seems to trigger instantly). I walked into the laundry room and stood there, watched the green light flash, but the lights did not come on; I grabbed something out of the garage, barely opened the door, and instantly the light was on.

I had a well-known community member RDP into my system and make some changes. He looked it over and, for the most part, it's all set up correctly. He added a Room Lighting app, and all switches work, but the slaves also lag. I'm open to someone else RDPing in to see if there are any other issues. I'd buy you dinner, and it would save time with posts. If any issues are found that can't be fixed or explained, you or I could upload the specific info so people can understand it and use it to decipher the issue. With most people having near-perfect installs, and with my experience in X10 and Insteon, either Z-Wave doesn't like me, something I installed is at fault, or I'm just blind to what I'm doing. I'm suggesting RDP because that way you'd have access to the entire setup rather than bits and pieces. Another set of eyes never hurts, even though the person who helped is very skilled. If anyone is interested, please message me. THANKS

I'm assuming this green light was on the motion sensor, so showing that it sent the command?

If the above is correct, it sounds like it almost created the route on the first try, then used it on the second?

Or maybe some configuration of the sensors needs tweaking for sensitivity? What model of sensor do you have? Maybe it's something that could be looked at.

Just to rule out other stuff: when you control the switch that drives the load, does it lag, especially if it's been off for a while?

The green light is when it senses motion. Three examples: the game room is the same room the hub is in, and its light turns on in less than 0.5 seconds without fail. The downstairs hallway motion sensor is in direct line of sight of the hub, about 10 feet away, but rarely turns on the light. The laundry room motion sensor is 15 feet from the hub and is the one that takes 2 attempts to turn on. Lastly, my furthest sensor, in the upstairs bathroom, also works every time in less than 0.5 seconds, even though it has to go through the floor or another switch.

The house is set up so that nothing is more than 10 feet from another repeating switch. Upstairs there are 14 switches plus a standalone extender within 30 sq ft, and downstairs another 14 repeating switches within 30 sq ft. So I think my house is well covered, and no motion sensor is more than 6 feet from a powered switch. All my leak sensors work just fine in the same rooms as the motion sensors.

I have 7 Ecolink z-wave2.5-eco sensors. The pet sensor is turned off, and the re-arm is set to the default 4 minutes. So I am aware that if it turns on and I turn off the light at the switch, it won't turn on again until the 4-minute timer runs out. But getting some of these to actually turn on is an issue. Side note: it seems to work better if there is some light first, rather than pitch black. It also seems like it pre-stages and waits for a second command, and then it's ready to go. All sensors are set up the same way in Basic Rules. For now there are no restrictions and no programs, just: when motion is seen, turn this light on; once motion is inactive, turn it off. Very basic, and annoying, because my future wife complains I'm wasting electricity. I told her I'm in a test phase to get all devices working, and then I'll add times, etc.

Where did you get this from? The motion sensor has no way of knowing what state the light is in. There is normally a cool-down period, but that's so it doesn't go ON off ON off ON off ON when you walk under it. It's normally less than a minute; about 30 seconds is typical. On some systems it can be a lot longer, like the one I work with for my job, but that isn't good for tracking presence, whereas these sensors normally are.

Then the app also has a timer: when motion activates, the light should turn on. In Basic Rules, if you turn off the light, the app doesn't know what has happened, so it thinks the light is still on. In that case (depending on how the app is set up) you have to wait for the motion to go inactive and stay inactive for whatever wait time you have set in the app. Then the app sends an off and completes, so you can start again. There is a setup in Basic Rules that would actually keep turning the lights on.
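The interaction between the sensor's cool-down and a manual switch-off can be sketched as a timeline. This is a hypothetical model under a simplifying assumption: after reporting "active", the sensor ignores all motion for a fixed re-arm window and then reports "inactive" (real sensors may behave differently, e.g. extending the active period). The 240-second default is the 4-minute re-arm mentioned earlier for the Ecolink sensors; `sensor_events` is an illustrative name.

```python
from typing import List, Tuple


def sensor_events(motion_times: List[float], rearm_s: float = 240.0) -> List[Tuple[float, str]]:
    """Hypothetical model of a PIR sensor with a fixed re-arm window.

    After reporting "active", the sensor ignores all motion for `rearm_s`
    seconds, then reports "inactive". Motion inside the window sends nothing.
    """
    events: List[Tuple[float, str]] = []
    blind_until = float("-inf")
    for t in sorted(motion_times):
        if t < blind_until:
            continue  # still re-arming: this motion is never reported
        events.append((t, "active"))
        blind_until = t + rearm_s
        events.append((blind_until, "inactive"))
    return events


# Walk in at 0 s, turn the light off by hand at 60 s, walk back in at 120 s:
# the second pass falls inside the 240 s window, so no "active" is sent and
# the light stays off. Motion at 300 s is reported normally.
print(sensor_events([0, 120, 300]))
```

Because the app only ever sees these events, nothing at 120 s can turn the light back on, regardless of how well the mesh is routing.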

Room Lighting does the same, but it can be told when you have turned off the light, so unlike the more basic apps it can be configured to work either way. It can have a manual timeout as well as an auto timeout.

Still need to know this though, there is something strange going on and we need all the information to assist.

Looks like the device may need some configuration changes. Not sure if this would have happened on pairing? @bcopeland, could you assist? I believe it's using the basic motion sensor driver.

These sensors are painfully slow.. I have 2 of them.. They won’t be good for much beyond security purposes…

Not true. I use them and they are fine; just make sure you set the off time in your rules longer than the timeout. I think I have mine at 4 or 5 minutes. But yes, if you want a quick, instant response, get the NYCE Zigbee ones. I use those in the bathrooms, with the added benefit of temperature.

I was reading their docs, and they suggest these sensors are extremely sleepy. That would explain why they don't work on the first pass but do on the next.

Which is? I can't find it in their documents.