Why does Hubitat show missing devices with neighbors and routes?

Just to be clear, it sounds like you are speaking of Z-Wave devices?

If it's Z-Wave, then it's simply voodoo. A lot of Z-Wave stuff won't route how you think it will. In general that's fine, unless your device is getting an inordinate number of route changes, in which case a beaming repeater is in order. You can try doing a repair on an individual node and see if that helps, but really, Z-Wave is voodoo programming. (This is because of the SDK, not because of Hubitat or any other Z-Wave hub.)

Yes, Z-wave. I should have clarified that.

I get that you cannot easily predict or specify routes, but I was trying to understand why Hubitat shows working devices routing through another device that is not actually powered up, and why that unpowered device still shows neighbors. I see this after repairs and reboots.

I am having lots of issues with devices not responding on the first try, or sometimes not at all. I have lots of devices overall, and some devices that are close to other devices do not even show up as neighbors, even though they are basically line of sight away in the same room. Other devices show as off in Hubitat but are actually on, and vice versa.

I have tried to run repair and often get many devices showing up as Failed afterwards, even though they still work. I feel like Hubitat is not at all in sync and displaying correctly what is actually happening on my network. I am really not sure what more I can do to fix it and it is truly failing the WAF. I am getting tired of buying more and more devices trying to improve things when nothing is really getting better.

How long are you waiting for the network (hub) to rebuild the routing tables? In my experience it's not something that a reboot will force. Also, can you post a screenshot of your Z-Wave details page?

What you're describing does sound like a legitimate problem... if what is being reported on that page is intended to be a current copy of the controller's view of the mesh, it would contain only functional nodes. If not, a Z-Wave repair (and removal of ghost nodes) should rectify that situation.

Putting that question aside for the moment, the next question is how could a functioning device actually work if it's using a route that includes non-functional or non-existent nodes? That one may be easier to answer.

Let's say a route is built that includes repeaters in the path A-B-C-D; for the sake of simplicity, these are the only nodes that ever existed in the network. Supposedly, when the controller generated this route, it was deemed to be the 'correct' order of transmission through a mesh containing those nodes. Now, if device 'B' disappears (fails or is otherwise deleted), then according to the spec the route will continue to be used until it has been retried multiple times and repeatedly fails.

A Z-Wave repeater doesn't contain any routing table, and it doesn't inspect an incoming frame for invalid nodes; it just pays attention to frames that contain its own node ID somewhere in the routing field. So if node C happens to receive the frame transmitted by 'A' and finds its own node ID there, it blindly retransmits the frame; hopefully the next node in the route also sees its own node ID and does the same. Even though an intermediate (and currently invalid) node was present in the routing field, it doesn't prevent reception by the next legitimate node in the path, so the transmission succeeds. What would cause a failure in this scenario is node A being completely out of RF range of node C; then the route would fail.
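To make that concrete, here is a rough Python sketch of the forwarding behavior as described in this post. It is only a toy model: the function name `deliver`, the `alive` set and the `in_range` map are made up for illustration, and it does not reflect the real Z-Wave frame format or SDK.

```python
def deliver(route, alive, in_range):
    """Toy model of one source-routed frame, per the description above.

    route    -- ordered node IDs, source first, destination last
    alive    -- set of nodes that are powered and functional
    in_range -- dict: node -> set of nodes that can hear its transmissions
    Returns the list of nodes that transmitted, or None if delivery failed.
    """
    position = 0
    current = route[0]
    transmitters = [current]
    while True:
        next_index = None
        # Any later node in the route that is alive and hears the frame
        # finds its own ID in the routing field and picks the frame up.
        for i in range(position + 1, len(route)):
            node = route[i]
            if node in alive and node in in_range.get(current, set()):
                next_index = i
                break
        if next_index is None:
            return None  # nobody listed in the route heard the frame
        if next_index == len(route) - 1:
            return transmitters  # the destination itself received it
        position = next_index
        current = route[next_index]
        transmitters.append(current)


route = ["A", "B", "C", "D"]
alive = {"A", "C", "D"}  # B is unplugged but still listed in the route

# A can reach C directly, so the dead node B does not matter.
ok = deliver(route, alive, {"A": {"B", "C"}, "C": {"D"}})

# A can only reach B, which is dead, so the route fails.
failed = deliver(route, alive, {"A": {"B"}, "C": {"D"}})
```

In the first case only A and C ever transmit, which is how a route that still lists an unplugged device can keep working.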

Now, why the table looks like it does is another issue; it comes up pretty often, and there doesn't seem to be a satisfactory answer. One would expect that it would contain the same (hopefully accurate) view of the mesh that the controller is basing its route calculations on.

Regarding Z-Wave routing in general, the veil of mystery was lifted (somewhat) with the release of the Z-Wave Network Layer spec: Z-Wave Network Layer Specification (Z-Wave Routing) Released

Z-Wave's routing strategy only uses two methods to generate routes: the up-front method is controller-calculated routes, and the last-resort method is flooding the mesh with explorer frames. Neither provides for any optimization over time, nor does either track error rates to determine whether it should try another route.

Rather, if the last working route has ever succeeded (regardless of hop count) it gets used/re-tried until it fails, at which point another saved route, if one exists, gets tried and used until failure. If/when all stored routes have failed repeatedly, explorer frame flooding will then be used to discover another which is then saved and re-used. The spec includes a provision to generate arbitrary 'application priority routes' not based on either of these methods, but HE doesn't provide user access to that feature.
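A small Python sketch of that fallback order may help. Everything here is hypothetical: `RouteCache`, `try_route`, `explore` and the retry count of 3 are invented for illustration, and the real retry limits and storage layout are whatever the spec and the Silabs firmware define.

```python
class RouteCache:
    """Hypothetical per-node route store; routes[0] is the last working route."""

    def __init__(self, stored_routes, max_retries=3):
        self.routes = [tuple(r) for r in stored_routes]
        self.max_retries = max_retries  # assumed value, not taken from the spec

    def send(self, try_route, explore):
        """Return the route that delivered the frame, or None.

        try_route -- callable(route) -> bool, True if the frame was delivered
        explore   -- callable() -> route or None, stands in for explorer frames
        """
        # 1. Work through stored routes in order, last working route first.
        for route in list(self.routes):
            for _ in range(self.max_retries):
                if try_route(route):
                    # Whatever worked becomes the new "last working route",
                    # regardless of how many hops it has.
                    self.routes.remove(route)
                    self.routes.insert(0, route)
                    return route
        # 2. Only when every stored route has failed: explorer flooding.
        discovered = explore()
        if discovered is not None:
            discovered = tuple(discovered)
            self.routes.insert(0, discovered)
        return discovered


attempts = []

def try_route(route):
    attempts.append(route)
    return route == ("C", "E")  # pretend only the two-hop backup still works

cache = RouteCache([("B",), ("C", "E")])
used = cache.send(try_route, explore=lambda: None)
```

Note there is no step that compares routes or tracks error rates: the one-hop route through B is retried until it fails, and after that the two-hop route simply stays in use.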

So it's kind of simple; calculated routes come from the controller: it's the only device that knows the mesh. Key assumptions of this scheme are: the hub requests (and nodes accurately report) neighbors when topology changes, 'reported adjacent' means 'good RF link', and a 'snapshot in time' is sufficient to generate routes (and backups) which accommodate a dynamic RF environment until another snapshot is taken.

When things fall apart, it's likely that the controller is handing out routes that include non-existent nodes (ghosts); devices have been relocated, so the controller's view of the topology is stale, or the RF environment has changed (new obstructions or interference has changed the neighbor picture). There's also a practical limit to how well neighbor links can be ranked for purposes of generating efficient routes given that a static RSSI value seems to be the only metric available.
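As an illustration of the stale-snapshot problem, here is a Python sketch that calculates a route from a stored neighbor table. The breadth-first search is a stand-in I picked for simplicity, not the algorithm the controller actually uses, and the device names and neighbor table are invented.

```python
from collections import deque


def calculate_route(neighbors, source, dest):
    """Fewest-hop path over the controller's stored neighbor table (BFS)."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dest:
            return path
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # destination not reachable in this snapshot


def without_node(neighbors, gone):
    """Copy of the table with one node removed, i.e. a refreshed snapshot."""
    return {
        node: {n for n in adjacent if n != gone}
        for node, adjacent in neighbors.items()
        if node != gone
    }


# Snapshot taken while "plug1" was still powered.
stale = {
    "hub": {"plug1", "switchA"},
    "plug1": {"hub", "light"},
    "switchA": {"hub", "switchB"},
    "switchB": {"switchA", "light"},
    "light": {"plug1", "switchB"},
}

stale_route = calculate_route(stale, "hub", "light")
fresh_route = calculate_route(without_node(stale, "plug1"), "hub", "light")
```

With the stale table the calculated route still goes through plug1 even though it is unplugged; only after the neighbor picture is refreshed does the route move to switchA and switchB.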

Below are my devices. The two Double Plugs are Zooz ZEN25s and were unplugged over 4 days ago because I read that several people reported these caused instability, but removing them did not help. Why do both show they still have neighbors, and why does the second one still show a route to it?

Mini Plug 1 and Outdoor Plug 1 are basically on different sides of the same wall, and many other devices used one or both of these in their routing even though there are many closer devices not being listed in the routing path. They have been unplugged for well over 12 hours. Mini Plug 1 is showing in the routing for Mud Room Light and Mini Plug 10.

Outdoor Plug 1, which is just outside my front door, is still in the routing for Back Entry Light, which is at my back door. Those two devices are on opposite sides of my house, so the routing makes no sense here, as there are at least 6 other devices in between them that are not listed in the routing.

Balcony Outlets has FIVE routes in the path. I thought 4 hops was the max so I did not even think this is possible. It consistently only has one or two neighbors despite having near line of sight to about 10 devices within 20 feet or so.



Also, the Zooz Power Switch has both neighbors and routes despite being unplugged for at least a week.

I should also mention that I have rebooted and power-cycled the hub and run Repair a few times since I unplugged Mini Plug 1 and Outdoor Plug 1 yesterday.

You've got some devices in there that look like ghosts and have no routes... like "mechanical room double plug" and "mini plug 6." Not sure what's up with those. You've also got a lot of devices with just one neighbor... that's a bit odd. Usually I'd say you need more devices acting as repeaters but perhaps there's something else going on.

You can't just unplug routing devices and expect the network to behave. You should exclude them if you aren't going to use them any more.

And yes, the Zen25 seems to have issues and I would personally not use these.

I have no ghosts anymore. I bought a stick to fix the couple I had. Not sure what is going on with Mini Plug 6. The double plugs I explained above: they are unplugged because they are Zooz ZEN25 plugs and some people reported problems with them.

You missed my point. I removed them for testing purposes.

Removing mains-powered devices that already have things routing through them may make things worse, even during testing.

What concerns me is that the hub is still reporting stale information after repairs, reboots, and power cycling of the hub and, in some cases, the devices. I get that if you unplug a device it will take time to reroute, but I would not expect that hours or days later missing devices still show up with neighbors, routes, and/or listed in the route for other devices.

Well, things are not stable now and things are not looking right. So what am I supposed to do? Do nothing? Rip it all out?

Start with removing any devices without routes. Re-pair any devices paired with S2 using no security (except locks and garage door openers). Turn off reporting for everything for now (you will have to do this individually per device). After that, go to Settings >> Backup and Restore, click the download button at the bottom, and save your backup to your local PC. This will clean the database. Then go to myhubip:8081 (substituting your hub's IP address) and do a soft reset. When it comes back up, restore with the clean database you saved to your PC. After that, shut down your hub from the settings menu. Unplug power (at the wall, not the hub!) for 20 minutes and power back up. Let things settle for a day. Make sure every device in your list is powered. Then start adding features back in and wait, then add more, etc.

Focus just on the devices that aren't functional; if they aren't working (or are very laggy) that indicates they don't have a working route, are trying backup routes, or trying to discover a new one via the explorer frame fallback. So if you can't put up with that, you can only do two things: Initiate a single device repair, or exclude/re-include them. Assuming nothing is actually defective, that's all you can do; either of those actions is designed to bring the hub into the loop re: where they are in the network relative to the other devices in the mesh.

Some failures during Z-Wave repair are inevitable when sleepy devices are involved; the best you can do is wake them (activate their button if they have one) and do another Z-Wave repair before they go to sleep. Often multiple repairs may be required. Or you can just ignore the errors; if the calculated routes they got at inclusion still work they'll keep using them, and if they don't they can also use explorer frames to find one that does. Since sleepy devices don't repeat, their routing impact on other devices in the mesh isn't substantial.

Not an elegant approach, but that's about all you can do other than let devices try to find their own route via explorers... and every time that happens your mesh gets saturated.

I wouldn't worry about the routes for functioning devices that don't look right. Whether it's a matter of the displayed data not being in sync with what is really happening or just being incorrect for some other reason is anyone's guess.

That is the thing. Most of the time, there is no pattern to devices with problems. If I had any issue with consistency, I feel like this would be so much easier to troubleshoot. Right now, it seems like a game of Whack-a-Mole.

Overall, most/all things will work at some point, just not consistently and reliably. This is why I have been trying to see what I can do to influence more efficient routing and increase the number of neighbor devices. I keep adding more devices, in many cases where I do not even need them, just to try to improve the overall mesh. I have about 30 devices within about a 60 foot radius, and none of them are more than one wall or floor away from several other devices. I just do not understand why things do not appear to be communicating better.

For instance, I have two groups for indoor and outdoor holiday lights with a simple rule to turn them on and off at specific times. In over a month now, I have never had a single day where all devices turn on and then off without something missed. Sometimes they all turn on but only half turn off, or all but one. I added metering, currently at 150 ms, to try to improve that, and optimization is turned off. Now tonight, none of the indoor lights turned on. I tried turning them on individually and I was able to get them all to come on after a few tries. Sometimes I need to send a Refresh or a Configure to a device to get it to respond.

When devices do not work initially, Hubitat will show what it thinks is the current (on/off) state even though the device is in the opposite state. So it is further frustrating when a Dashboard indicates a device is on but it is really off when I go check on it.

I can turn a light on, then turn around and try to turn it off, and it does not respond. Sometimes it will respond a few minutes later, or I have to go and turn it off manually.

This is the same behavior I have had since I started working with Hubitat and only had a couple of devices. Now, three dozen devices later, I was expecting things to improve, but they have not. I have been at states where I had no ghosts or unplugged devices, and I have never had any battery-powered devices or devices added as S0. I removed all devices with power metering to avoid the extra traffic. I do not really have anything configured for automation other than trying to control the holiday lights, so the hub should not be that busy. So now I am just grasping at straws for other things to try.

I often run a full repair and have many devices report that they are not responding once completed. But when I test the device it is working fine. Run the repair again and get similar results. I can run a repair on a single device and also get back that it is not responding even though it is working.

I am not sure what you mean by "sleepy" devices unless you mean battery powered devices, which I do not have. But I do not understand why it will say not responding during a repair even though it still works.

@brandonv Sleepy devices are ones that go to sleep and don't report. That said, start with what I outlined above. And when you re-pair the devices later, do it next to the hub, then move each device to its final resting place and do a repair just on that device. Doing constant Z-Wave repairs is not doing anything for your mesh. You have a bunch of devices that are just not taking routes properly despite you saying they work. Every column, even with battery devices, should have a route in it. What your Z-Wave details are showing is an unhealthy mesh. We're trying to get you to start with fixing the rudimentary stuff first.

I would agree. It appears you have 6 dead or ghost nodes. That has to be screwing with things.

I didn't miss anything. You can't just break the mesh by removing devices and expect it to instantly recover, if it ever can.

Full repair isn't recommended on the C7, or with Z-Wave Plus in general. Up to a point it is supposed to self-heal. If you need to do a repair, just do the individual node repair.

Some of this is just a mystery how it all works. I assume there is some caching of that data. You for sure can't count on it being live data. In my experience it takes a day or two to make that information seem even partially accurate. At some point if it does work, ignore the oddities. And when it doesn't, use that info as guidance not a bible of what is wrong.

There is nothing you can do to make this happen. I have banks of light switches in a few places around the house. Some of these are identical Zooz switches, and in other places they are mixed with Jasco/GE. Every switch takes a different route to the hub despite being right next to each other. Some are direct to hub, some 2 hops, some 4 hops, there is no rhyme or reason. As long as they work, I just shake my head and accept it.

That is about the most you can do. Provide plenty of neighbors and routing options, and hope the Z-Wave (Silabs) firmware is smart enough to make good decisions. You appear to have enough devices that you should have a good mesh at this point; when there are too few devices that repeat, you can have problems with instability.