My take: The routing is designed to minimize latency and uses path cost as a criterion for doing so. Path cost in turn uses criteria like link error rate (retries) and signal quality. I assume (but do not know) that path costs can be additive (to make a multi-hop route less attractive than a single-hop route). So if a single-hop route with poorer LQI (than a two-hop route with higher LQIs on each segment) has lower latency, it seems a reasonable routing choice.
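To make the additive-cost idea concrete, here is a minimal sketch in Python. It assumes each link gets a cost from 1 (best) to 7 (worst) based on its LQI and that a route's cost is simply the sum of its link costs. The thresholds in lqi_to_link_cost are made up for illustration; the real mapping depends on the stack, and I don't know what the hub actually uses.

```python
def lqi_to_link_cost(lqi: int) -> int:
    """Map an LQI (0-255) to a link cost from 1 (best) to 7 (worst).

    The thresholds are hypothetical, for illustration only.
    """
    if lqi >= 200:
        return 1
    if lqi >= 150:
        return 3
    if lqi >= 100:
        return 5
    return 7


def path_cost(hop_lqis: list[int]) -> int:
    """Sum the link costs of every hop on a route (the additive assumption)."""
    return sum(lqi_to_link_cost(lqi) for lqi in hop_lqis)


# A weak direct link (LQI 90) loses to two strong hops (LQI 233 each)...
print(path_cost([90]), path_cost([233, 233]))    # 7 2
# ...but a decent direct link (LQI 200) beats the same two-hop route.
print(path_cost([200]), path_cost([233, 233]))   # 1 2
```

Under these assumed numbers, a direct link only wins when it is good enough that one hop costs less than the two hops combined.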
Now LQI of 90 is indeed very poor and I wouldn't expect that to be a viable routing choice; but in your original post the LQI was 233 for the device you asked about. Is it varying that much?
Regarding the notion of maximizing coverage area--the network devices have no idea of their physical placement (or coverage area).
Okay...so why are mine not taking a two-hop route, and instead trying to connect directly to the hub? The LQI is below 100. Does it have to have absolutely no signal at all to the hub to route through a repeater? That wouldn't make any sense at all. Does anyone else have a router going through another router? Or does this not work for anyone? And if you do, how did you get it to do that?
If someone were an expert in the Zigbee protocol, this seems like it would be a very fundamental, easy question to answer. I am no expert on it; that's why I'm asking. But it seems that no one actually knows what it takes to get a router to route through another router rather than go directly to the hub. Is it the repeater itself that could be bad? Is there something wrong in the hub firmware? I have no idea where to even start to debug something like this and would have thought that someone here would have at least pointed me to some things to investigate or try. Or, is what I am seeing normal and that's just the way it works? I could try and debug this all day and night and not get anywhere if it's working the way it's supposed to.
One way for this to happen (as I said in one of my previous posts): if you have 17 routers in close proximity to the hub, the one that is not in the hub's Neighbor Table will always be two hops away and will need to go through another router (one that is in the Neighbor Table). The hub can only talk directly to its child devices (by definition these are not routers) and the devices in its Neighbor Table.
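A rough sketch of that 17-router case in Python, in case it helps. The 16-slot table size and the "keep the best LQI" policy are assumptions for illustration only; how the hub's stack really decides who gets a slot is not something I know.

```python
HUB_NEIGHBOR_SLOTS = 16  # assumed table size; the real limit depends on the stack build


def split_routers(router_lqis: dict[str, int], slots: int = HUB_NEIGHBOR_SLOTS):
    """Return (direct, indirect) router names.

    Hypothetical policy: the hub keeps the routers with the best LQI, and
    everything that does not fit must be reached through another router.
    """
    ranked = sorted(router_lqis, key=router_lqis.get, reverse=True)
    return ranked[:slots], ranked[slots:]


# 17 routers near the hub, with LQIs 255, 250, ..., 175
routers = {f"router{i:02d}": 255 - 5 * i for i in range(17)}
direct, indirect = split_routers(routers)
print(len(direct), indirect)   # 16 ['router16']
```

With fewer routers than slots, the indirect list is simply empty, so this case alone never forces a two-hop route.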
I do not have 17. I have 9. So, you're saying that until the hub's neighbor table is saturated, repeaters will not route through another repeater, even if they have barely any signal at all to the hub?
The reason I bring this up is because I am trying to add devices close to this repeater but I am getting dropped messages because the LQI is so low for this repeater. So, what do I have to do? Buy more repeaters to saturate it? It doesn't seem like that would be a very good answer either.
I would assume that the device in question (with poor LQI on its link to the hub) would need to have within its own Neighbor Table (each router maintains its own Neighbor Table just like the hub does) a neighbor router with a better quality link to the hub (lower path cost)-- and that is the one that it would use to communicate with the hub. The hub's neighbor table would not need to be full for this to happen.
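Here is how I picture that choice, as a small Python sketch. The table layout, the names, and the cost numbers are all hypothetical; it only illustrates "pick the neighbor with the lowest total cost to the hub", not how any particular stack stores or updates its tables.

```python
def pick_next_hop(neighbors: dict[str, tuple[int, int]]) -> str:
    """Pick the neighbor with the lowest total cost to the hub.

    neighbors maps a neighbor name to (cost of our link to that neighbor,
    that neighbor's own cost to reach the hub). The hub itself has a
    remaining cost of 0.
    """
    return min(neighbors, key=lambda name: sum(neighbors[name]))


# Hypothetical neighbor table of the router with the poor link to the hub.
table = {
    "hub": (7, 0),        # direct but very weak link
    "repeaterA": (1, 1),  # strong link to A, and A has a strong link to the hub
    "repeaterB": (3, 1),
}
print(pick_next_hop(table))   # repeaterA
```

Note that nothing here depends on how full the hub's own table is; the decision is made from the router's side.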
So you are trying to join devices in place (near the poor LQI repeater) and they will not join through that repeater? But they will join closer to the hub? I'm not quite sure I understand your scenario.
Some are joining through that repeater. But because that repeater has such a terrible signal to the hub, messages are being dropped. Others are choosing to go through a repeater much further away (and closer to the hub) that has a stronger signal to the hub. However, because of their distance from it, they have a low LQI.
It appears from my own research that there are several different schemes that a Zigbee network can use to establish its routing table. So, does ZHA 1.2 impose a restriction on this? Or is it at the discretion of the coordinator? If the latter, I would like to know what routing scheme the hub is using.
When the Ember stack in the coordinator is provisioned, that determines the routing methods available (AODV, many-to-one). I assume many-to-one is being used since it is optimized for the scenario where most of the devices are talking to the same device (as opposed to each other).
I recall reading on an SiLabs forum about an issue (not HE related) where Zigbee joins failed if they were two hops or more away from the coordinator. I do not recall what the resolution was (but for some reason I think it was resolved with a software change on the coordinator's stack). I will see if I can find that discussion again.
Interesting. That is a way of addressing the inefficiency of the '17 routers in a 16 router bag' scenario. There are lots of tuning knobs. Would be interesting to know how HE has tweaked them.
I have seen a few similar issues reported here from time to time, where owners can't get a Zigbee device to join multiple hops from the hub (yet can get it to join easily when moved closer). Looks like yet another case where a deep dive into the configuration of the stack may be required.
Edit: now I know I am in trouble when I am literally talking to myself on the forum.... LOL
So, I finally got my replacement Xbee module in the mail (they were on back-order forever). I can confirm, this module that is FAR away from the hub, is in fact, not using the direct connection to communicate with the hub. That connection is marked as inactive in the XCTU network map. So, you were correct @Tony, they are listed in the Neighbor Table even when that connection is not used. I have to assume that if the slot was needed for a device that could speak directly to the hub that this device would fall off the neighbor table. But I don't have anywhere near that number of repeaters yet. My house is only 1200 sq feet, there's only so many devices I can cram in here! Thanks @Tony.
Glad you got it working. This discussion prompted me to dig out my Zigbee sniffer dongle (best $8 I ever spent); I was trying to figure out why my Zigbee door lock worked so quickly from an Echo voice command (it's an interior door lock) but showed more latency using a remote button. I found out from the sniffer trace that the button was going through three routers (the first of which was a required compatible parent for the Xiaomi button) and then on to the hub. My gut told me that this seemed unnecessary (given the physical location of the devices); I figured I'd test my hunch by unplugging the intermediate repeater. I then traced a button press event and was happy to see that it worked as I expected, eliminating one hop from the button to the hub.
It was interesting to see that there was no perceptible delay for the message from the button to begin using the new shorter route. I think I will eliminate that router permanently (it was childless so its loss had no effect on my network).
I'm always fascinated to see some of the routing choices my devices make. I know that some of it is just orientation of the antenna with the antenna of one repeater versus another. But some always shock me. And I'm not talking new devices either...I'm talking devices that have been around for a while.
I can also report that 4 days after removing my Peanut plug, I no longer see bunching in my network where a couple of repeaters were not used at all. For the longest time I had two repeaters that had zero devices connected to them. 4 days after pulling my Peanut plug, I have a completely even distribution of devices across all the repeaters now. And things are working much faster. Is that just coincidence or do the Peanuts have something to do with it? Well, the only way to know for sure would be to add them back in, but I've got enough evidence for me to conclude that they were the culprit. Anyways, they will no longer be on my recommended device list.
Interesting. In my case it was annoying that sometimes the button quickly triggered the lock with almost no perceptible delay; other times it seemed to take noticeably longer. It will be interesting to see if eliminating one possible routing choice fixes this problem (or will it somehow find another roundabout path). The routing algorithm should be able to choose the 'best' path by adding the costs of each link so multi-hop routes would not be the preference; hopefully it actually works that way in practice.
Another revelation (actually this should not have been a surprise; it was just interesting to see it in action from the sniffer trace) was the finite time it takes the Zigbee lock to poll its parent router for unlock/lock requests. There's no getting around that.
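Back-of-the-envelope version of that polling delay in Python. It assumes the command reaches the parent at a random moment between two polls and ignores radio and processing time; the 1.0 s interval is a made-up number, not the lock's actual setting.

```python
def poll_delay_bounds(poll_interval_s: float) -> tuple[float, float]:
    """Return (average, worst-case) extra wait caused by polling.

    Assumes the command reaches the parent at a uniformly random moment
    between two polls and ignores radio and processing time.
    """
    return poll_interval_s / 2, poll_interval_s


avg, worst = poll_delay_bounds(1.0)  # 1.0 s is a made-up poll interval
print(avg, worst)   # 0.5 1.0
```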
I've had good service from the half dozen or so that I use (I just have buttons and the cube). They have a nice appearance and satisfying click; pretty good range as well (double edged sword, that, if you aren't familiar with their special needs). In the year or so that I've used them, I have had to re-join them a few times, though not in the last several months now that I have a couple of compatible routers for them. Before that I had to make sure they would choose the hub as their parent when joining by powering down basically every repeater in my house. And that limited where I could use them... a pain for sure. I can see how they would frustrate anyone that wasn't aware of their special needs, parent-wise.
Exactly. Nowadays a lot of stuff has dropped in price. The Xiaomi is pretty much in the same ballpark.
Back in the day when they got really popular it was because of the price. That was a few years ago though.
I just tell people to get proper devices now and save the headache. I have about 80 of these Xiaomi devices running nicely. Took me a while though to figure it all out.