Problem with my Zigbee network with delay and disconnection

Hello friend, I hope you are well.
For the past two days my Zigbee network has been showing a rather strange delay. I have had my hub for 4 months without a failure, and now this problem has started. For example, I open the log and press one of my buttons, but nothing is transmitted; when I press it again 5 seconds later, it works.

Any ideas or pointers on where to look?

my routing table

You may want to check the thread below. Also, sharing the model of your hub would be helpful.

Though the getChildandRoute info won't pinpoint the cause (could be interference, lack of repeater coverage, or possibly a failing device), it's helpful in highlighting which devices are having problems.

It would be useful to look at the getChildandRoute info page from time to time to see how/if the neighbor table changes; the inCost and outCost numbers (which indicate reception and transmission quality on neighbor links) get generated several times a minute.

You will want to pay attention to which neighbors show high (>3) cost numbers; that usually indicates poor reception or transmission. If you see '0' for outCost, that means the hub didn't get a valid link status report within 6 status intervals (another indication of a poor RF link). The periodic link status messages containing the outCost figure tell the hub how its neighbor routers are receiving its transmissions (as opposed to LQI, which just indicates how the hub hears its neighbors).
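As a rough illustration of that rule of thumb, here is a small Python sketch that sorts neighbor entries into "ok", "poor", and "no link status". It assumes you have already copied the numbers out of the getChildandRoute page by hand; the `Neighbor` class, the `classify_link` function, and every device name, ID, and value in the sample list are made up for the example and are not taken from the screenshot.

```python
from dataclasses import dataclass


@dataclass
class Neighbor:
    """One neighbor table entry, typed in by hand (hypothetical structure)."""
    name: str
    short_id: str
    lqi: int
    in_cost: int
    out_cost: int


def classify_link(n: Neighbor, high_cost: int = 3) -> str:
    """Apply the rule of thumb: outCost 0 means no valid link status was
    received; any cost above `high_cost` suggests a poor link."""
    if n.out_cost == 0:
        return "no link status"
    if n.in_cost > high_cost or n.out_cost > high_cost:
        return "poor"
    return "ok"


# Made-up sample entries, not real readings from any hub.
neighbors = [
    Neighbor("Repeater A", "AAAA", 254, 1, 1),
    Neighbor("Repeater B", "BBBB", 180, 5, 0),
    Neighbor("Repeater C", "CCCC", 200, 3, 5),
]

for n in neighbors:
    print(f"{n.name} ({n.short_id}): {classify_link(n)}")
```

Running it against a few readings taken at different times of day makes it easier to spot a neighbor that is consistently in the "poor" or "no link status" group rather than just having one bad moment.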

Seeing a few neighbor routers with bad link quality may not be an issue as long as other neighbors have adequate links; the hub should automatically avoid using poor links for routing if better paths are available. But if you're always seeing outCost=0 and it is associated with a neighbor router that appears after the word 'via' in a Route Table entry, that's not good. The Route Table entries show which neighbor router the hub needs to use (as the first hop router in the path) to get to a given device. If it's using one with '0' outCost, that means it's not finding a better router available to use.
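To make that cross-check concrete, here is a minimal Python sketch that lists the route entries whose first hop ('via') router currently shows outCost=0. The function name `risky_routes`, the dictionary layout, and all of the IDs and values are assumptions invented for the example; you would fill them in by hand from your own table.

```python
def risky_routes(out_cost_by_neighbor, routes):
    """Return the (destination, via) pairs whose first hop router
    shows outCost 0 in the neighbor table."""
    return [
        (dest, via)
        for dest, via in routes
        if out_cost_by_neighbor.get(via) == 0
    ]


# Made-up neighbor table: router ID -> outCost seen by the hub.
out_costs = {"AAAA": 1, "BBBB": 0, "CCCC": 3}

# Made-up route table: (destination device, 'via' router ID).
route_table = [
    ("Motion Sensor", "BBBB"),
    ("Door Contact", "AAAA"),
    ("Button", "BBBB"),
]

for dest, via in risky_routes(out_costs, route_table):
    print(f"{dest} is reached via {via}, which shows outCost=0")
```

Any device that shows up in that output repeatedly is a candidate for the kind of delayed or dropped messages described in the first post.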

The screenshot you posted shows a few problem devices with this exact scenario... I've highlighted those routers showing '0' outCost which the hub is also using as 'first hop' routers; those routers aren't sending proper link status to the hub. So any device that the hub wants to reach 'via' those routers (I've highlighted those devices in magenta in the Route Table entries) may have communication issues as a result. Once you have eliminated RF interference or signal blockages that may be the cause, it might be necessary to add an additional repeater to solve these issues.

It's also possible you have a repeater that is failing and that could be the root cause... The neighbor table shows a device as (null, 89E6); this could be a device that left and rejoined the network because it is having issues-- its device ID will change (at least for earlier hubs; not for C-8) each time this happens; normally after a while the 'null' gets replaced by its device name as the database catches up with the change. If you're always seeing null and a different device ID, it could be you have a bad repeater (maybe its SoC is constantly rebooting, or it has other issues preventing it from staying connected). It could be a challenge to figure out what it is; maybe start by eliminating the repeaters that are shown in the table and see if you can find one that never shows up....

One more oddity I see is that you have one repeater (the 7106 device ID) that appears to be actively routing for several devices (showing as a 'via' several times in Route Table entries). Yet, in one of the Route table entries, the hub apparently is trying to reach that 7106 router via the A849 device (a problematic one that shows outCost=0). That's really unexpected; maybe happening because the mesh is very unstable. Those Route Table entries are transient and change whenever the hub initiates new route discoveries (which it does pretty much constantly). When things stabilize, you shouldn't be seeing that.

Dear Tonio, I am super grateful for this explanation; with it I now better understand the data that the routing table gives me. Additionally, I was able to see this message in the logs from a motion sensor that I have on the second floor.

By the way my Hub is a C-7.

The warning messages look like output from one of Markus's Aqara drivers; another indication that a device has lost contact with its parent repeater.

It might be related to the 'null' device restarting and rejoining; it's possible that finding that device and resolving that issue might fix this one as well.

Tonio, one last question and to clarify.
Does inCost indicate the quality with which this device receives from the hub, and does outCost indicate that this device has poor quality when repeating the packet?

Can you clarify those for me?

thank you

Now it's getting worse. I removed a device that I thought was causing the problem, and I now see several outCost values at 0.

The strange thing is that they are Sonoff ZBMINI devices with neutral, that is, they are routers, and one is attached to the other.

Both inCost and outCost are measures of signal quality that are derived from LQIs measured at opposite ends of the link. The hub and each neighbor router are always calculating LQI and transmitting it as a 'cost' figure to the device at the opposite end of the link during periodic status exchanges.

So the inCost you see in the table is derived from the LQI that the hub has computed based on its reception quality of the repeater at the remote end of the link. Basically a computed LQI number from 0 to 255 gets mapped into a 3 bit number from 1 to 7 (where lower numbers are better). The hub sends this cost figure during the link status exchange to the remote neighbor (where it becomes the neighbor's outCost figure for that path).
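Just to show the shape of that mapping, here is a Python sketch that turns an LQI value (0 to 255) into a cost from 1 to 7, with lower cost for higher LQI. The threshold values in `_THRESHOLDS` are purely hypothetical and chosen for illustration; the actual mapping is up to the radio stack vendor and is not given in this thread.

```python
import bisect

# HYPOTHETICAL band edges, ascending. Real stacks use their own mapping.
_THRESHOLDS = [40, 80, 120, 160, 200, 230]


def lqi_to_cost(lqi: int) -> int:
    """Map an LQI in 0..255 to a link cost in 1..7 (lower is better)."""
    if not 0 <= lqi <= 255:
        raise ValueError("LQI must be between 0 and 255")
    # Count how many thresholds the LQI meets or exceeds (0..6).
    bands = bisect.bisect_right(_THRESHOLDS, lqi)
    return 7 - bands


for sample in (255, 210, 130, 10):
    print(sample, "->", lqi_to_cost(sample))
```

The only point being made is that a wide range of LQI values collapses into seven coarse steps, so two links with quite different LQI readings can still report the same cost.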

Meanwhile, that remote neighbor is measuring an LQI figure based on its reception of the hub; it maps this into its inCost number (again from 1 to 7) and transmits it to the hub during a link status exchange. This number becomes the outCost figure on that link from the point of view of the hub. It gives the hub an indication of reception quality (really, LQI) as seen by the receiver of the router at the other end of the link.
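If it helps to see the two directions side by side, here is a toy Python model of that exchange. The `Node` class and `send_link_status` method are invented for this sketch and do not correspond to any real Zigbee stack API; the cost numbers are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy model of one end of a link (hypothetical, for illustration)."""
    name: str
    # Cost this node computed from its own LQI measurement of each neighbor.
    in_cost: dict = field(default_factory=dict)
    # Cost each neighbor reported back; 0 means nothing valid was received.
    out_cost: dict = field(default_factory=dict)

    def send_link_status(self, peer: "Node") -> None:
        """Tell `peer` how well this node hears it.
        The sender's inCost becomes the peer's outCost for this link."""
        peer.out_cost[self.name] = self.in_cost.get(peer.name, 0)


hub = Node("hub", in_cost={"repeater": 2})       # hub hears the repeater well
repeater = Node("repeater", in_cost={"hub": 5})  # repeater hears the hub poorly

hub.send_link_status(repeater)
repeater.send_link_status(hub)

print("hub sees inCost", hub.in_cost["repeater"],
      "and outCost", hub.out_cost["repeater"])
```

In this made-up case the hub would show inCost 2 and outCost 5 for the repeater: it hears the repeater fine, but the repeater is struggling to hear the hub.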

That way both hub and its neighbors can avoid links which aren't good from the point of view of reception and transmission quality.

The routing algorithm then tries to select paths (if there is more than one to a given destination) which have the lowest costs (they are additive along each hop of a route if more than one repeater is involved) to avoid paths likely to require retransmissions.
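The additive part is just arithmetic, which this short Python sketch spells out. It is only an illustration of summing hop costs and picking the smallest total, not the actual route discovery procedure; the path names and cost values are made up.

```python
def path_cost(hop_costs):
    """Total cost of a path is the sum of the cost of each hop."""
    return sum(hop_costs)


def best_path(candidates):
    """Pick the candidate path name with the lowest total cost."""
    return min(candidates, key=lambda name: path_cost(candidates[name]))


# Made-up candidate paths to the same destination: name -> per-hop costs.
candidates = {
    "direct": [7],                 # one hop over a very poor link
    "via repeater A": [1, 2],      # two good hops
    "via repeaters B and C": [1, 1, 3],
}

for name, hops in candidates.items():
    print(f"{name}: total cost {path_cost(hops)}")
print("preferred:", best_path(candidates))
```

Note how the two-hop path (total 3) beats the single poor hop (total 7): more hops over good links can be cheaper than one hop over a bad link.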