In need of suggestions

Must be a typo of some sort; Z-Wave has a max of 4 hops, so it just can't do any more than that.

Don't worry too much about the randomness of the mesh; as long as it's reporting, you're good. It settles eventually.

Hmm, it shouldn't be laggy. Now that you have done some ghost busting, we may want to see your tables again.


Don't stress over the routes shown. I too used to question and be frustrated by this, but you learn not to worry about the odd routes so much. For devices to worry about, look at the reported connection speeds and the number of re-routes. Note that your mesh will settle over time as well: days, weeks, or even months. Patience is key. Each device makes its own periodic determination about what it thinks is the "best" route back to the hub. Some of that is RF black magic, as I too have devices that are physically in the same bank of switches yet report a completely nutso route when their sister switch is directly connected to the hub. Does it see a stronger signal from a physically distant switch than from the switch directly next to it (or the hub, which is equidistant from the sister switch as well)? WHO KNOWS.

I've not done the antenna mod to my C7s yet, but those who have all report a much higher number of devices directly connected to their hub after the mod than before. At some point I'll look into that as well.


Settings > Z-Wave Details


I'm a mechanic, so an antenna mod would be easy. Where do I get it? Once the replacement is done, do I have to reset everything and start over to get the best routes, or just wait until it hopefully fixes itself?

Why does Z-Wave take so long to heal? Why can't it start with one device, ping the others, learn their signal strengths, and move on to the next one? Once done, it would do a final optimized run to finish off, then run a heal mode where it can fine-tune devices over time. Maybe I'm thinking of this from a mechanic's standpoint of tracing wires that branch out: you start at the fuse, check the voltage at each end point, and repair the one that has low voltage. My theory (most likely not Z-Wave practice) is that when you do a repair, the software should know the signal strength of each device up front, decide which ones have above-average strength to the hub, and then try various routes to fix the ones with low strength. Maybe this is exactly what Z-Wave does, but I'm not certain why it can't be achieved overnight rather than, in the worst case, a month. A fairly simple answer to all these questions would be fine.

I'm just used to Insteon, where you can add 75 switches and they all work right away. I'm stuck on the Insteon protocol's speed and don't want to spend months setting this up to work properly. In my 30 years as a senior tech, it's been "get it done, and it was due yesterday." I love home automation but can't spend months setting it up; I have tons of other stuff to do.

Lastly, to end my kind rant (since I wasn't being disrespectful): Amazon only offers 30 days for returns, so if I have to wait 2 months for the mesh to settle and then the community discovers 2 or 3 switches that are bad or not performing, I have to eat $150. Thanks for listening.
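To make the "signal strength first" idea from the post above concrete, here is a minimal sketch. It assumes the hub has an RSSI reading (in dBm, higher is stronger) for every device-to-hub and device-to-device link, and every name in it is hypothetical. It illustrates this proposal only; it is not how the Z-Wave controller firmware actually calculates routes.

```python
from __future__ import annotations

from statistics import mean


def plan_repairs(hub_rssi: dict[str, float],
                 link_rssi: dict[tuple[str, str], float]) -> dict[str, str | None]:
    """Illustrative 'strong devices first' plan (hypothetical, not Z-Wave's algorithm).

    hub_rssi:  RSSI in dBm (higher = stronger) of each device's direct link to the hub.
    link_rssi: RSSI of device-to-device links, keyed by (a, b) in either order.
    Returns device -> chosen relay, where None means "talk to the hub directly".
    """
    avg = mean(hub_rssi.values())
    # Devices at or above the average hub signal stay direct.
    strong = {d for d, r in hub_rssi.items() if r >= avg}
    plan: dict[str, str | None] = {d: None for d in strong}

    for weak in hub_rssi:
        if weak in strong:
            continue
        # Collect every strong device that has a measured link to this weak one.
        options = []
        for relay in strong:
            r = link_rssi.get((weak, relay), link_rssi.get((relay, weak)))
            if r is not None:
                options.append((r, relay))
        # Use the best relay only if its link beats the weak device's own hub link.
        best = max(options) if options else None
        plan[weak] = best[1] if best and best[0] > hub_rssi[weak] else None
    return plan
```

Even this toy version depends on RSSI values that drift with interference, so any plan it produces is only as good as the moment the readings were taken.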

You might have to take that up with Silabs as they are the ones who set the routing rules in the radio firmware - the latest update has definitely helped though. This will likely be true no matter what 700 series hub you use unfortunately. This is NOT really an HE issue.

One thing I've done in the past is have multiple hubs by location. This works well in distributing the load and devices. For a client of mine I set one up in the main house and the other in the detached garage (they had an ethernet run there). This was done because the distance was a little too great for things to work reliably. I also did this in my house for a while - had a hub in the basement and one on the 2nd floor. It worked great but probably not really necessary for my house.

Nowadays I have 3 hubs by "type" - C7 and 2 older C5's. The C7 is for Z-Wave devices, one C5 is for zigbee and the other one for network/lan devices and apps. I use HubMesh to communicate between them when I need to. Note: I did this to see how things would work and I had the hubs on hand.


I wouldn't reset anything. Within a day or two your devices (especially powered ones) should all find their routes updating to the hub directly. The antenna thread discusses people's experiences.

As for the zwave "why can't it just?" questions, I would say that route finding via multiple hops on RF is not an easy problem to solve. Signal strength and interference vary constantly, so there is no good time to run an "analyze NOW" task to capture the state of the world as you might in a wired (comparatively interference-free) environment.

I believe Insteon used a system where every node got the entire network "map" (so devices could talk directly vs having to use a central hub), plus they had powerline as a secondary transmission method. Zwave only captures "near" devices based on signal strength and tries its best to figure out the optimal route over time. Insteon also allowed each button controller to talk directly to any and all dimmers without going through a hub. The rules and maps had to be duplicated everywhere, but it did make communication quicker in cases like that. Zwave does have some options for device-to-device communication (using a concept called associations) that can help in some cases.

Personally, I've not had anywhere near the issues with zwave devices that you've been sharing. I never repair my network, and I've had maybe 1 device ever need power cycling after a storm. With no ghosts, the 900 MHz zwave network (especially of all Z-Wave Plus devices) has been rock solid for me on both ST and HE.

Also note, there are 2 types of zwave repair: an overall repair (run by the "Repair Z-Wave" button at the top of Settings->Z-Wave Details) and a device repair (run by the "repair" button on each device with a route).
The overall repair is documented here: Z-Wave Repair and here https://docs.hubitat.com/index.php?title=The_Anatomy_of_Z-Wave™_Repair
The device repair is mentioned in the docs as: " Repair will attempt a per-node repair (recommended over a full network Repair Z-Wave when possible)."

I personally have only run a full repair rarely and that was when adding several devices "mid mesh".

Would you share your current zwave device list again since you started over?
Also share your zwave topology map (Settings->Z-wave Details->Z-wave Topology).

I also think it would help to document the other failure cases you're seeing. As you mentioned in your mechanic troubleshooting analogy, let's go through the failures or issues and see if we can find solutions. For instance, if you have a rule to turn on a bunch of switches on a motion sensor event, some people have found that once you reach 4 or more switches, they don't necessarily all turn on reliably. HE's "Groups and Scenes" has helped people with that, as has the zwave radio firmware upgrade. You may already be using those, but let's isolate the action reliability (turning on/off the group in my example) from the event reliability (motion sensing) for each of your failure/frustration cases.


I come from the Insteon world as well. We never had any mesh analysis tools for Insteon, so I for one never knew how well the mesh was optimized. Keep in mind that Insteon also had repeaters, both RF and power line varieties.

I fought Z-Wave mesh issues for quite some time. Moved my hub around, added repeaters, etc. The more I screwed with it the worse it got. I left it alone for a few weeks, and amazingly it got better and better. It's my guess that doing a "repair" triggers something to update the mesh later, but doesn't actually do anything immediately. I've learned you just have to stop screwing with it for a while and see what happens.

I am a proponent of the Antenna Mod, but don't expect miracles. After I did mine, I was able to increase the number of 'Direct' devices quite a bit. Many people have said they saw a speed up in response times, though in my case that didn't happen or I just haven't noticed it.

While the Wave Mesh Details app is interesting, don't try to fix anything you see because chances are it will make things worse. There are probably just a small handful of engineers at Si Labs that understand the algorithm and the reasons why certain paths through the mesh look bizarre.


Actually, a Z-Wave device doesn't really learn over time; this much has been stated explicitly by Silicon Labs in their tutorial videos. Lots of good info here:

https://www.silabs.com/support/training/z-wave-technology-training/mesh-performance There are no statistical learning processes going on in each node.

What might appear as learning is actually message delivery failure followed by multiple retries of direct and pre-calculated, stored alternate routes (sent to each device by the hub when the node is initially included, during repair, or via a scheduled neighbor update process).

Should retries of all pre-calculated routes also fail, the device will resort to explorer frame flooding, which may take up to 30 seconds, saturating the mesh in the process. It's the last resort, used after all the stored routes have been tried in succession multiple times without success. If explorer frames succeed in discovering a working route, that route is used until it fails, in which case the process repeats, again trying the pre-calculated routes in succession.

The Z-Wave controller in the hub is the only device in the mesh that knows, or assumes that it knows, all current node adjacencies (and RSSI data, if available) based on what the nodes themselves report as their nearest neighbors when directed to do so by repair or when the node is included in the mesh.

Zigbee is a totally different kettle of fish; a Zigbee mesh can reorganize itself if need be before any actual device transmission has failed (Zigbee routers communicate amongst themselves constantly, exchanging status even when no actual messages are being transmitted).


Very helpful info @Tony! I've definitely conflated Zwave and Zigbee mesh rules/logic over the years, so this is good for putting things back into their categories.

Curious, does the hub keep track of how many times (or at least the last time) a device had to fall back to explorer frames? Are those types of stats available anywhere?


Not sure if this is it, but it's certainly an indication of things struggling and changing. If you look at the z-wave table in Z-Wave Details, there is "route changes". If that is high, it means the device is moving around to get its messages through.


^^^ Good advice; it's probably impossible to provide a completely stable RF environment (that's why alternate paths exist), but the number of route changes should correlate with the number of failures and retries initiated by a node. A low number of route changes equals a high probability of success (within the set number of retries) on a given path.
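One rough way to apply this to your own Z-Wave Details table is to sort devices by their route-change count and look at the outliers first. The sketch below assumes you've copied each device name and its "route changes" value into a dict by hand (no export function is implied), and the default threshold of 5 is an arbitrary assumption, not a Hubitat or Silicon Labs figure.

```python
def flag_unstable(route_changes: dict[str, int], threshold: int = 5) -> list[tuple[str, int]]:
    """Return (device, route_changes) pairs at or above threshold, worst first.

    route_changes: device name -> "route changes" count copied by hand from
                   the Z-Wave Details page (hypothetical manual export).
    threshold:     arbitrary cut-off chosen for illustration only.
    """
    flagged = [(name, n) for name, n in route_changes.items() if n >= threshold]
    # Most route changes first; ties broken alphabetically for a stable order.
    return sorted(flagged, key=lambda item: (-item[1], item[0]))
```

The devices at the top of that list are the ones worth a closer look (placement, firmware, neighbors) before touching anything else in the mesh.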

I don't think there is a way to tell if an LWR is the result of explorer frame discovery (depending on the SDK level, some early Z-Wave devices don't support explorer frames at all). The lengthy amount of time to establish a path should be a dead giveaway that explorer frame flooding is happening. One of the charts in the Z-Wave tutorials provides a table of timeouts that shows how expensive (performance-wise) this fallback method is.


If a device goes into this state, is it then rebuilding all its routes, primary and backup? So let's say you move a battery device across the mesh and it cannot get a route; would it in theory rebuild its neighbors and routes when it sends out the explorer frames? This would make sense, and in that case it should be pretty rare for a device to have to do this unless it has very few neighbors with a spotty signal.

I have also witnessed via zniffer some devices that have had issues not getting Acks back fast enough (possibly not waiting long enough) and then go into this explorer mode constantly, even though they logically should have plenty of route options. These are the mesh killer devices. Wondering if resetting and re-pairing those in their current location would force them to rebuild it all from scratch? I have seen it recommended before but I honestly don't think it will make a difference and in these cases the device firmware might be to blame.


That's the thing... aside from a route discovered by explorer frames (which gets set as the LWR and used until it fails after the requisite number of retries), a Z-Wave device itself isn't capable of building any routes; it just maintains a list of its in-range neighbors, updates it when it has been directed to look for them, and forwards that neighbor list to the hub (when commanded). It then depends on the hub to calculate and transmit back to it a set of viable routes which it then stores.
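To make that division of labor concrete (nodes only report neighbor lists; the hub turns them into routes), here is a minimal sketch using plain breadth-first search over a hypothetical neighbor table, capped at 4 repeaters per route. The actual Silicon Labs route calculation is not described in this thread and almost certainly weighs more than hop count; this only shows the shape of the problem the hub is solving.

```python
from __future__ import annotations

from collections import deque

MAX_REPEATERS = 4  # Z-Wave source routes carry at most 4 repeaters


def shortest_route(neighbors: dict[str, set[str]], hub: str, target: str) -> list[str] | None:
    """Breadth-first search from the hub over the reported neighbor table.

    neighbors: node -> set of nodes it reported as in range (hypothetical data).
    Returns [hub, repeater..., target], or None if no route fits in MAX_REPEATERS.
    """
    if target == hub:
        return [hub]
    queue = deque([[hub]])
    seen = {hub}
    while queue:
        path = queue.popleft()
        for nxt in sorted(neighbors.get(path[-1], set())):
            if nxt == target:
                return path + [nxt]
            # path[1:] are already repeaters; using nxt as a relay adds one more.
            if nxt not in seen and len(path) <= MAX_REPEATERS:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```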

The dependency on the hub to do all routing calculations no doubt stems from the 8-bit 8051 memory constrained hardware used in all Z-Wave devices prior to the 32-bit ARM Cortex SoC's of the 700/800 series devices (yet those devices also depend on the hub for calculated routes).

AFAIK, a relocated Z-Wave device won't ever look for new neighbors unless directed to do so (by a broadcast neighbor discovery command as would happen during repair or some kind of scheduled process). So it will do what it always does-- use the current LWR, retrying it if it fails, then try the saved next-to-last working route, followed by direct broadcast to any node that might be in range, as well as cycling through the previously stored calculated routes sent by the hub. After all that, it resorts to an explorer search once it has exhausted all of those. Should that explorer search succeed, it will use that route until it no longer works. If the route works (even if it always takes two or three retries to do so) it will continue to be used without any further explorer searches. It will continue to hang on to a working route (no matter how convoluted) as long as it works (even if retries are necessary).
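The fallback order described above can be sketched as a simple loop. This is a simplified model under stated assumptions: `try_send` and `explorer_search` are hypothetical callbacks (True on an ACK, and a discovered route or None, respectively), the per-route retry count is a placeholder rather than the real SDK value, and older SDK levels behave differently (some have no explorer frames at all).

```python
from __future__ import annotations

from typing import Callable, Optional

Route = tuple[str, ...]  # repeater node IDs on the way to the hub; () = direct


def deliver(lwr: Optional[Route],
            nlwr: Optional[Route],
            stored_routes: list[Route],
            try_send: Callable[[Route], bool],
            explorer_search: Callable[[], Optional[Route]],
            retries: int = 3) -> Optional[Route]:
    """Simplified model of a node's fallback order when sending to the hub.

    lwr / nlwr:      last working route and next-to-last working route, if any.
    stored_routes:   routes previously calculated and sent down by the hub.
    try_send:        hypothetical callback, True when the hub ACKs the frame.
    explorer_search: hypothetical callback standing in for explorer frame flooding.
    retries:         placeholder attempt count per route, not the real SDK value.
    Returns the route that worked (it would become the new LWR), or None.
    """
    # Attempt order: LWR, NLWR, direct, then hub-calculated routes (no duplicates).
    ordered: list[Route] = []
    for route in [lwr, nlwr, ()] + stored_routes:
        if route is not None and route not in ordered:
            ordered.append(route)

    for route in ordered:
        for _ in range(retries):
            if try_send(route):
                return route

    # Last resort: explorer frames flood the mesh looking for any working path.
    return explorer_search()
```

The model also shows why a convoluted route can stick: if the LWR succeeds on its second attempt every time, the function returns it before ever reaching the explorer step.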

Certainly seems plausible that the Ack timeouts you've seen would be responsible for the repetitive explorer frame flooding scenarios (with resulting poor performance). The SiLabs videos mention (more than once, as I recall) that as the SDK has evolved, changes have been made to improve routing performance (early SDK levels didn't save LWR's or implement explorers), albeit with concessions to maintain backwards compatibility.

Reading between the lines it's pretty clear that some awful performance scenarios can result when the hub doesn't have an accurate node adjacency table (non-optimal routes will then get generated; leading to lots of retries, timeouts, and route changes) or there are explorer frame searches going on (as they saturate the mesh, any 'regular' Z-Wave traffic happening at the same time increases the probability of non-optimal explorer generated routes).

It's been a while since I watched those SiLabs videos; a post I made a while back (Where do Z-Wave packets go to die? - #7 by Tony) has some additional info.


As for one person's suggestion, starting from scratch does not help. It's been 4 weeks since I gave up and started from scratch, excluding and re-including devices, working from the hub outward. After 2 weeks of settling, my motion sensors are worse than ever. I have a motion sensor in line of sight of the hub, maybe 8 feet away, and sometimes it triggers and sometimes not. Other sensors suffer from the same issue. My most reliable sensor (I have 6 of the same one) is the furthest from the hub, on another floor.

Regarding Tony's response: when I do a complete repair, it's done in about 4 minutes, which can't give each device time to try every node option to find the best route. One switch would have to try 49 other switches, which may hop, and then do it all over again for the original switch. I think a process of optimizing every switch is a good idea to get the best mesh network, but it would take hours. It would be beneficial for someone to build a 50-device network and have Z-Wave/Hubitat ping 1 device at a time and work its way outward, repeating if it sees too many hops. I don't know Tony's experience, but he seems to know his stuff. So I'm still confused, lol. I guess heal does not work, and a full reset didn't fix my issues. And so we continue with all of Tony's new info.

What motion sensors? Battery-powered ones have cooldown periods where they won't retrigger motion; sometimes that's several minutes.

A lot of people use ZigBee motion sensors as they seem to be quicker to respond. I have both.


Don't you mean send an "inactive" event? I think there is a difference?


That's why I wanted the specific failure cases. For instance, he may be seeing that he's walking into a room, the motion triggers. Then he manually turns the lights off, but expects new motion within the cooldown to retrigger the lights...

Now that you've redone it, let's see a copy of your current z-wave details page.

Yeah, that's an easy mistake to make for sure.


I'm fully aware of the 4-minute cooldown, unless I set it to test mode, which polls every 10 seconds at the cost of battery life. An example from this morning (I do not have any restrictions on any motion sensors as I'm still testing): I walked through the downstairs hallway and the light did not go on; a few moments later I walked back past it and the light did turn on. This is common with almost all my motion sensors. Either they turn on instantly (or with lag) the first time, or they won't until a second pass (the second time seems to trigger instantly). I walked into the laundry room and stood there, watched the green light flash, but the lights did not go on; I grabbed something out of the garage, barely opened the door, and instantly the light was on.

I had someone popular in the community RDP into my system and make some changes. He looked it over, and for the most part it's all set up correctly. He added a room lighting app and all switches work, but the slaves also lag. I'm open to someone else RDPing in to take a look for any other issues. I'd buy you dinner, and it will save time with posts. If any other issues are found that cannot be fixed or are unexplained, either of us can upload the specific info for people to understand and use to decipher the issue. With most people having near-perfect installs, and me having knowledge of X-10 and Insteon, either Z-Wave doesn't like me or something I've installed, or I'm just blind to what I'm doing. I suggest RDP because this way you would have access to the entire setup rather than bits and pieces. Sometimes another set of eyes does not hurt, even though the person who helped is very skilled. If anyone is interested, please message me. THANKS