Zigbee Network Issues

bobbles · June 6, 2019, 10:47am

Just thought I would update where I am with the zigbee issues I've been having.
Removed my ST outlet and then shut down the hub for 30 minutes.
All my zigbee devices came back on line OK.
I have not had anything drop off the network for 2 days now.
From my point of view the ST outlet is NOT a good repeater and will now be consigned to my garage Christmas lights.

EDIT: BTW, the zigbee route info script now works again. Spooky.

aguileramekin · September 3, 2019, 4:13pm

Today when I woke up my whole zigbee network was down. I realized it because of this same message on my hub: "zigbee network offline".

My hub is in a very clear place, so I guess this did not happen due to heat. So, following the logic, did I face a flood of my network? I do not have too many devices and this never happened before.

What is the cause of this?
How can it be avoided?

aguileramekin · September 3, 2019, 5:05pm

Some more questions:
I ran a test today: I removed the battery from my "Blue Bedroom Button", but the route table still showed the same info even when I refreshed the browser several times...

But a few minutes after I put the battery back, I noticed this entry had been removed from my route table. So the info shown in that table has some delay, right?

Do any of you know how many minutes of delay the route table has?
If an entry saying [DeviceX] via [DeviceY] means that [DeviceX] is connecting through [DeviceY], then I see that some devices are not connected to the nearest repeaters. Is there a way to adjust this, or do they connect to the best one on their own?
Is the "age" the signal strength of each device?

Tony · September 3, 2019, 7:56pm

I'm not sure how often the route table info is updated for display purposes in the log, but for the Ember Zigbee stack, the 'age' count represents the number of 16 second intervals that have elapsed since the last link status message was received (a Zigbee router usually sends a link status message every 16 seconds; when the neighbor table entry's age count becomes greater than 6, this neighbor entry is considered 'stale' and is eligible to be 'evicted' from the table and its 'cost' gets set to zero, indicating 'unknown').
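As a rough illustration of the 'age' arithmetic described above, here is a minimal Python sketch. The function and constant names are made up for this example; the 16 second interval and the "greater than 6" threshold are taken from the description above, not read from the hub.

```python
# Sketch only: turns a neighbor-table 'age' count into elapsed time and staleness.
LINK_STATUS_INTERVAL_S = 16  # per the post: one link status message about every 16 s
STALE_AGE_THRESHOLD = 6      # per the post: an age greater than 6 marks the entry stale


def describe_neighbor_age(age):
    """Return the approximate seconds since the last link status and a stale flag."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return {
        "age": age,
        "approx_seconds_since_link_status": age * LINK_STATUS_INTERVAL_S,
        "stale": age > STALE_AGE_THRESHOLD,
    }


if __name__ == "__main__":
    for age in (0, 3, 6, 7):
        print(describe_neighbor_age(age))
```

So an age of 3 means roughly 48 seconds since the last link status message, and an entry only becomes eligible for eviction once the count passes 6.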

The end devices choose their 'best' parent router based on their assessment of the link (which may not be the closest, physically). You don't normally have a way to control this process.

aguileramekin · September 4, 2019, 3:46am

Wow amazing explanation..... I got it Tony thanks!!

So there is no signal strength report here, right?
... is there any other screen where I can see where it is best to physically place a device?

Tony · September 4, 2019, 3:58am

The LQI figure can indicate signal strength/integrity of the link as a whole (in some implementations it takes into account the link error rate); numbers approaching 255 are better (basically if it is >200 it should be acceptable).

If you want to see the 'last hop' received signal strength metric (RSSI) from a device (keep in mind if an end device is not a child of the hub, you will be seeing the RSSI of the last router in the path's transmitter, not the end device's-- that's what the 'last hop' means), go to Settings > Zigbee Details > Zigbee Logging. But note that RSSI is a 'raw' measure of the RF energy and can include other networks in the same frequency band, as well as noise.

The more negative the RSSI number, the lower the measured signal strength. Most of my devices very close to the hub show RSSI's in the -40's / -50's or so; I don't have any apparent issues even when they are in the -80s although I'm sure some retries may be happening. When the RSSI gets to the -90's though, the signal is getting down to the noise level.
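The rules of thumb above could be written down roughly like this in Python. It is only a sketch: the function names are hypothetical, and the band edges are my reading of the figures quoted above (LQI above 200 acceptable, RSSI in the -80s still working, -90s near the noise level), not values from any specification.

```python
# Sketch only: encodes the LQI / RSSI rules of thumb from the post above.

def lqi_acceptable(lqi):
    """LQI runs 0..255; per the post, anything above 200 should be acceptable."""
    if not 0 <= lqi <= 255:
        raise ValueError("LQI must be between 0 and 255")
    return lqi > 200


def describe_rssi(rssi_dbm):
    """Bucket a last-hop RSSI reading (dBm) using the rough bands from the post."""
    if rssi_dbm <= -90:
        return "near noise level"
    if rssi_dbm <= -80:
        return "usable, retries possible"
    return "no apparent issue"


if __name__ == "__main__":
    for lqi in (95, 201, 253):
        print(lqi, lqi_acceptable(lqi))
    for rssi in (-45, -85, -93):
        print(rssi, describe_rssi(rssi))
```

Keep in mind the caveat above: the RSSI is a last-hop figure, so for a device that is not a child of the hub it describes the last router in the path, not the device itself.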

aguileramekin · September 5, 2019, 3:17pm

Hi guys
Today I faced another issue here... My wife woke up first and she told me everything was OK with my HE (lights turning on automatically, etc.), but when I woke up nothing was working and I couldn't log in to HE because the web interface never showed up. I checked and the hub was connected to my intranet (ping was OK).

So I checked Past Logs and I found this:

dev133 is a Generic Zigbee Motion Sensor on my kitchen (a samsung motion sensor)

I had never seen that error before, and after it nothing was working... I had to remove the power manually and plug it in again. Now all is OK.

What was that?
Why did it happen?
I don't want this to happen when I am away from home... so how can I avoid it?

SoundersDude · January 17, 2020, 4:55pm

Did you ever get an answer to this? I'm also curious as my peanut plugs report this.

srwhite · January 17, 2020, 7:40pm

A low ram concentrator is just what it sounds like. The coordinator device does not have enough memory to hold a full routing table, so it only stores a partial table. When the coordinator needs to send a message to a device but does not know the route, it first has to send out a discovery frame and wait for a routing response before it can actually send the message.

A high-ram concentrator is the opposite. It maintains a full routing table to each end device. This also means that a high-ram concentrator has to frequently request route information from devices that act as routers so it can keep its routing tables updated. Since the coordinator knows the routes to each device, whenever it needs to send a message it can do so without having to wait for route discovery. This method usually results in much quicker response times to devices.
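To make the difference concrete, here is a toy Python model that counts how many sends would need a route discovery first. It is a sketch under loud assumptions: the function name is invented, the partial table is modelled as a least-recently-used cache (the real eviction policy is not described here), and the high-ram case is simply treated as already holding every route. It is not how EmberZNet is implemented.

```python
# Toy model only: compares a partial (low ram) route table with a full (high ram) one.
from collections import OrderedDict


def count_route_discoveries(destinations, table_capacity):
    """Count sends that must wait for a route discovery.

    table_capacity=None models a high-ram concentrator that is ASSUMED to
    already know a route to every device. An integer models a low-ram
    concentrator whose partial table evicts the least recently used route
    (the LRU policy is an assumption made for this sketch).
    """
    if table_capacity is None:
        return 0
    if table_capacity < 1:
        raise ValueError("table_capacity must be at least 1")
    table = OrderedDict()
    discoveries = 0
    for dest in destinations:
        if dest in table:
            table.move_to_end(dest)  # route known: send immediately
            continue
        discoveries += 1             # route unknown: discover, then send
        table[dest] = True
        if len(table) > table_capacity:
            table.popitem(last=False)  # drop the least recently used route
    return discoveries


if __name__ == "__main__":
    traffic = ["plug", "bulb", "sensor", "button"] * 3
    print("low ram, 2 routes :", count_route_discoveries(traffic, 2))
    print("low ram, 4 routes :", count_route_discoveries(traffic, 4))
    print("high ram          :", count_route_discoveries(traffic, None))
```

With four devices polled in rotation and room for only two routes, every single send in this model needs a discovery, which is the kind of overhead described above.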

Ken_Fraleigh · January 17, 2020, 10:14pm

So, from this I take it that "concentratorType:None" routers would always have to send a discovery frame before sending a message? If a "low ram" device is constantly changing routes on the routing table as in this thread, will that negatively impact the coordinator, since it has to keep requesting routes? I only ask because others, as well as myself, have noticed this happening with the Peanut plugs, and since rejoining them to my "lights" hub I have seen a consistent increase in latency using Hub Watchdog. Maybe @mike.maxwell could take a stab at what is going on there.

scottgu3 · January 17, 2020, 10:19pm

Interesting. Glad @srwhite threw some sunlight on this particular bit of shady area in my Zigbee knowledge!

@Ken_Fraleigh that's an interesting question!

I wonder. I think I'm presently only using one Peanut, but it might be worth a remove to see what happens...

Scott

srwhite · January 17, 2020, 10:30pm

For one thing, in EmberZNet there is no such thing. A concentrator is either low ram or high ram. None isn't an option.

But low ram concentrators have to frequently send route discovery requests prior to sending a message. That's horrible for large networks because these are effectively broadcast messages which can start to degrade a network.

The concept of high/low ram only applies to concentrators. A concentrator (or coordinator) is the hub in the Hubitat world. Whether a coordinator is high ram (full route table) or low ram (partial route table), route discovery can still happen since Zigbee is a self-updating mesh protocol.

Hubs like ST and HE use MTORR (many-to-one-routing) where the only thing a device talks to is the hub, and vice versa... Devices do not exchange messages with other devices. This requires a high-ram concentrator in order to avoid the overhead of route discovery.

What's probably happening, and one of the reasons why many bulbs suck as routers, is that they do not have a lot of RAM and processing power. They are prone to buffer overruns, corrupted messages, and who knows, they could be blasting out a ton of route discovery messages (which routing devices can do), causing the network to remain in a constant state of flux.

I would remove the bulbs and see if the peanuts stabilize, then try the same in reverse by adding the bulbs back and taking the peanuts out. It won't isolate a bad device, but will potentially help narrow it down to a class of devices.

Ken_Fraleigh · January 18, 2020, 12:09am

I appreciate you taking the time to explain that so well, but I have one question; When it says

[quote="Ken_Fraleigh, post:103, topic:15240"]
concentratorType:None
[/quote] , what does that mean if it isn’t a thing. I see that on bulbs, in-wall dimmers, and some plugs.

I would love to see, but this isn’t going to happen. My wife and kids would not be okay with hanging out in the dark. 60 bulbs are on the zigbee coordinator which only leaves the bedrooms, kitchen, and outdoors which have Hue, Hue, and zigbee switches respectively. I will just pitch the Peanuts into the drawer until next Christmas.

srwhite · January 18, 2020, 12:21am

The difference is one is an API parameter, the other is a specification on a box... When you see concentrator type as being none on a spec sheet, it means that device is incapable of acting as a network coordinator (i.e. hub). It has nothing to do with the size and type of a routing table.

Hard to argue with that... Of course you can always give her your credit card and have her take the kids shopping!

Ken_Fraleigh · January 18, 2020, 12:45am

But seriously, I appreciate you taking the time to explain. I have wondered what the heck that meant since I discovered the getchildandrouteinfo page.
EDIT: Peanuts have gone to the drawer btw.

Ryan780 · January 18, 2020, 4:49am

But I have no bulbs on my mesh except for Sengleds, which are end devices. And I had one peanut plug that was jumping around from router to router like crazy. So, is that a problem or not?

Then why are the peanut plugs listed as low ram concentrators and why do they keep jumping from router to router?

Also, the reason this whole topic got brought up in the first place is that I have a repeater, an Iris plug, which I paired in place. It is showing up in the neighbor table of the hub with an LQI less than 100. Now, there are 8 other repeaters in my network, 3 of which sit between the hub and this plug. One has direct line-of-sight to this plug and that plug has an LQI of 253. So, my question is, why is this outlet still trying to connect directly to the hub rather than go through another repeater? An LQI of less than 100 is a very bad thing, right? So, why is it still trying to do that? Shouldn't it try to do a 2-hop route to the hub instead of trying to go directly to it? I mean, isn't that what a "mesh" network is supposed to be able to do?

srwhite · January 18, 2020, 6:57am

I should clarify, since I was speaking earlier in terms of the hub and its relation to the output of the getChildAndRoute report. In a general sense, a concentrator is a routing-capable device, which the peanut plugs are (so are bulbs). That means those devices keep a partial routing table, which they have to in order to advertise themselves as routers to other devices.

FWIW.. I have never seen any device other than the coordinator be a high ram (i.e. full routing table) concentrator.

Why your devices are link hopping is anyone's guess. The one thing I could suggest looking at is 2.4GHz interference from 802.11 devices. If you have an XBee available you can use it as a spectrum analyzer. Plug it into a laptop and move it around the house to see if there might be any RF hot spots.

JasonJoel · January 18, 2020, 1:30pm

XBee (Pro and non-Pro) are High Ram concentrators:

Those are the only devices I've seen that are High Ram concentrators, though.

srwhite · January 18, 2020, 2:51pm

That would make sense, since you can use an XBee as a coordinator.

Tony · January 18, 2020, 3:01pm

Just because your low-LQI Iris plug is appearing in the Neighbor Table, it does not mean that the hub is communicating with it directly (though the mere fact that it is in the Neighbor table means that it is, or has been, at least marginally capable of doing so). The Neighbor Table is not the same thing as the 'routing table' and does not show the route records (complete path of intervening routers) that the hub is actually using to communicate to this device (also note that the Route Table Entry listing is also not the complete routing table, but just the next hop recently used to reach a device). When many-to-one source routing is being used, the route records stored by the hub will contain a reverse source route (sent along with the actual outbound Zigbee message to each successive next hop)-- this does specify the actual path being used. This path will have been determined as the 'least cost'/best path via LQI and link cost exchanges that have occurred during prior route discovery in response to a route request network command sent by the hub. The hub will store these route records for each device in the network in the routing table. Again, the route records are not visible through the ChildandRouteInfo page. However, if the device is indeed going through an intervening router, its next hop will at some point show up in a Route Table Entry, shown reachable as a 'via' through the intermediate router. This would be the indication that your problematic router is reachable by a router with a lower-cost/better path to the hub.

For example in my network the Neighbor Table is full, with 16 routers listed. Two of them, Iris Plugs H3 and H8, both appear in the Neighbor Table, with LQI 255 and 254 respectively. However I can see that a Route Table entry for H8 appears as follows:

[Living Room Iris Smart Plug H8, 152F] via [Living Room Xbox Iris Smart Plug H3, DD56]

So I know that even though both appear in the Neighbor Table (and each with good LQI), messages from the hub to plug H8 are being routed using H3 as the next hop as part of a two-hop route. Why this less efficient two hop path is being used when the hub is evidently capable of communicating with H8 directly is another subject. I'll chalk it up to the dynamic nature of the network, imperfect optimizations of the link path cost, or whatever. It likely makes no perceptible difference in latency so I am not stressing about it (H3 is an Iris plug literally two feet from the hub; H8 is diagonally across the room about 25 feet further away).
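For anyone who wants to pull the next hop out of Route Table Entry lines like the one quoted above, a small Python sketch follows. The line format is assumed purely from that single example (device name, comma, 4 hex digit address, in brackets, joined by "via"); the function name and dictionary keys are made up, and the real page may format other entries differently.

```python
# Sketch only: parses a Route Table Entry line of the form
#   [Name, ADDR] via [Name, ADDR]
# The format is inferred from one example line, so treat it as an assumption.
import re

ROUTE_ENTRY_RE = re.compile(
    r"\[(?P<dest>.+?), (?P<dest_addr>[0-9A-Fa-f]{4})\]"
    r" via "
    r"\[(?P<hop>.+?), (?P<hop_addr>[0-9A-Fa-f]{4})\]"
)


def parse_route_entry(line):
    """Return the destination and next hop as a dict, or None if the line does not match."""
    match = ROUTE_ENTRY_RE.fullmatch(line.strip())
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    entry = "[Living Room Iris Smart Plug H8, 152F] via [Living Room Xbox Iris Smart Plug H3, DD56]"
    print(parse_route_entry(entry))
```

A destination that shows up here with a different device as its next hop is being reached over at least two hops, even if it also appears in the Neighbor Table.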