Repeater nightmare, a "single-point-of-failure" (Ikea TRADFRI)

Well, it finally happened. Despite having at least two repeaters in the proximity of a number of outdoor Zigbee devices, THE ONE that the majority of them seem to migrate to went on holiday. Don't get me wrong, I like these Ikea repeaters. The nightmare is that what I always feared would happen... happened.

Thanks to Device Activity Check App ([RELEASE] Device Activity Check - Get notifications for "inactive" devices)

I realized a whole buncha devices stopped talking to the hub. Assuming there was no way in hell all those batteries went out at the same time and that there was no EMP or Solar Flare event :rofl: I suspected either: a) failure of the repeater that seems to collect more devices than I want it to, or b) the rain.

So here's some interesting (to me) outcomes that some of you might already have experienced. I got a number of different devices reporting their "cumulative" state changes over the past days all at once. In particular some Visonic contact sensors giving me the activity that transpired over the incommunicado duration. Granted some of these incur like two events a day, one Open, and one Close. The temps were not "caught up" that I can see... But the Open/Closed all came as if they'd happened in that minute right after the repeater came back online.

I haven't finished interpreting what all has transpired here. I unplugged the repeater and plugged it back in to get it to "shake off" whatever malaise it was experiencing...and it did...which is when all the devices started reporting again. I suspect the repeater may be suffering from going into its third round of winter/summer extremes in a weather box exposed to whatever the outdoor temps go to.

I'm just sharing the experience for whatever that may be worth to folks. I remain befuddled over what I can do to get the Zigbee mesh to do what I want it to do in order to be more fail-safe. I would have suspected some of the devices to have re-routed...but they didn't. Some of them are really "out there" distance-wise and this one repeater (for whatever reason, including perhaps being backed by a metal wall) tends to "get them all".

Makes me wonder if I should have a backup repeater not far from the one that seems to be currently favored ...so there are TWO in that relative position to back each other up. But that never seems prudent when you're talking radios. Maybe in this case 20' away might be fine. But...would devices reroute/re-path upon the failure of one? Only the Zigbee God likely knows.

Thinking out loud. Thanks for your time.

P.S. Doesn't a DEVICES Forum make sense to have for the Community ?

Dang, I thought so too.
EDIT: Seeing note below about DEVICEs being in "Get Help", this post is probably best left in the Lounge I guess.

I could look at other repeaters....and maybe that is the case that I have (had) ONE really good/strong Ikea repeater and the others were wimps in comparison. But I gotta say, this one in the plastic box on that outbuilding has been a REAL trooper. I just didn't like that soooo many devices routed through it when they had other options that seemed much closer. Granted it IS the closest router to the hub (which is IN the metal outbuilding, yeah...imagine that working out...it does!).

There's a "Devices" under "Get Help"

I've had Ikea outlets and repeaters fail and any devices that were repeating through them stop dead waiting on that repeater..... Plugging in and out always fixed the issue.

Since the last Ikea firmware update I've only seen this once in the past six months - it was at least once a month prior to that...

So you have updated firmware on the Routers themselves ???

I really like how I can remote these with a simple USB extension cable of modest length (to an exterior box).

I've had this happen. And usually 90% of the devices eventually kick back in. It could take a day for that to occur though. Usually most of my Xiaomi devices would also reconnect, but there would always be a few stragglers that I would have to kick.

Now one thing to keep in mind too is how many devices your repeaters can handle. I know the Ikea devices were limited to 6 or 8 (I can't remember the exact number), and after that no more would connect to them. I used to map out my Zigbee mesh and could see this limit. My light strip could handle repeating 20+ devices. If one of the repeaters dies, hopefully the backup repeater will have enough resources to handle the devices jumping over from the dead repeater to keep things going.
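To make that "enough resources" question concrete, here is a rough back-of-the-envelope check in Python. It is only a sketch: the repeater names and child counts are made up for illustration, the limits of 8 and 20 are just the ballpark figures mentioned above, and it ignores whether the surviving repeaters are actually within radio range of the stranded devices.

```python
def can_absorb_failure(repeaters, failed):
    """Check whether the remaining repeaters have enough free child slots.

    repeaters: dict of name -> (child_limit, current_children)
    failed:    name of the repeater assumed to have died
    Returns (ok, orphans, spare_slots).
    """
    orphans = repeaters[failed][1]
    spare = sum(
        limit - used
        for name, (limit, used) in repeaters.items()
        if name != failed
    )
    return spare >= orphans, orphans, spare


# Hypothetical mesh: two Ikea repeaters (limit 8) and a light strip (limit 20).
mesh = {
    "ikea_outbuilding": (8, 8),
    "ikea_house": (8, 5),
    "light_strip": (20, 4),
}

ok, orphans, spare = can_absorb_failure(mesh, "ikea_outbuilding")
print(f"orphans={orphans} spare_slots={spare} ok={ok}")
```

If the answer comes back False, the backup repeater idea only helps once there are more free child slots than devices sitting on the favored repeater.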

Ooooooh, did not realize it could be as low as that.

Presuming this one repeater is in some favored sweet spot, having another might allow load sharing AND backup?

To update firmware on Ikea devices you need to have an Ikea hub.
I keep one around just for this purpose.

You need to pair each Ikea device to the hub and let it update - then re-pair to Hubitat.

Note that some of the Ikea repeaters have a much greater limit... repeaters I got bundled with Ikea blinds seem to repeat for at least 16 devices, whereas others that I bought on their own have, I think, an 8 device limit.

With redundancy-- meaning, the mesh has more than one in-range repeater with capacity to handle enough end devices should one fail-- a Zigbee mesh can recover transparently when a repeater fails... but it depends on the nature of the failure.

Zigbee end devices can't determine alternate paths to the hub; they have no routing smarts whatsoever. The only path a Zigbee end device ever uses (absent an orphan/rejoin scenario) is a single hop to the parent chosen when it joined/rejoined the mesh; that choice being based on what the in-range repeaters were advertising about the quality of their radio reception with respect to the end device and their ability to accept new child devices. In contrast, a sleepy Z-Wave device has no special affinity to a single Z-Wave repeater and can try in succession multiple pre-calculated routes sent to it during inclusion or repair, or resort to explorer frames in the event of failure.

Unless a Zigbee end device considers itself orphaned (when its check-in interval has expired with no response from the parent) or the parent has told the end device to leave the network, it won't look for another parent repeater. So if the Zigbee stack running in a repeater is alive at the MAC layer (dutifully acknowledging periodic check-ins with its child devices and buffering their incoming/outgoing messages) but effectively dead at the network layer (failing to actually route any messages), all its child devices will remain isolated from the network; redundant repeaters can't help with this scenario.

The good news is that (usually) when a redundant Zigbee router fails, the remaining repeaters will re-establish the mesh backbone very quickly, within a couple of 15 sec. link status intervals. End devices may take more time (depending on how quickly they detect their orphan status; it varies with how the check-in intervals are set) but recovery should be automatic.
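A toy model of the two failure modes described above may help. This is not Zigbee stack code; the `Parent` class and its two flags are hypothetical names that simply encode "acknowledges check-ins" versus "actually routes messages".

```python
from dataclasses import dataclass


@dataclass
class Parent:
    mac_alive: bool      # still acknowledges its children's periodic check-ins
    routing_alive: bool  # still forwards messages toward the hub


def child_outcome(parent: Parent) -> str:
    """What an end device ends up doing, per the description above."""
    if not parent.mac_alive:
        # Check-ins go unanswered, so the child eventually considers
        # itself orphaned and looks for a new parent.
        return "orphaned: rejoins through another repeater"
    if not parent.routing_alive:
        # Check-ins are still acknowledged, so the child never looks
        # elsewhere, yet nothing it sends reaches the hub.
        return "stuck: isolated until the parent is power cycled"
    return "ok"


print(child_outcome(Parent(mac_alive=False, routing_alive=False)))
print(child_outcome(Parent(mac_alive=True, routing_alive=False)))
```

The second case is the one where a redundant repeater does not help, which would be consistent with devices staying silent until the repeater was unplugged and plugged back in.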

This is really helpful to read.

I'm sure I've run across bits and pieces of this posted elsewhere, should be pulled together someplace.

But it only really becomes meaningful when you are into it with a particular scenario and you start asking the how/why/whens of it all.

So thanks.

Just to add to this, and other posts you've made on Zigbee & repeaters, I wanted to note a confirming experience that LQI ain't everything.

Just did some repositioning of an Ikea repeater that I thought would improve the path resilience for some distant outdoor devices.

While the repeater in the new location pulled an LQI consistently over 245, the end devices that are all within 50' of the repeater decided to make the trip ALL the way back, about 80' through more walls & obstructions, to reach a repeating outlet in another structure.

Here's the report for the repeater:
LQI:255, age:3, inCost:1, outCost:7

and for the chosen outlet (which happens to be in the same room as the hub):
LQI:255, age:3, inCost:1, outCost:1

Exactly the same, except for the outCost, which I assume has something to do with the metal roof on the building the Ikea repeater was repositioned to. While it did not seem to impact the inbound signal, it did impact the outbound. Pure presumption.

Moral of the story...ALL those numbers matter to the path selection. I always looked, pretty exclusively, at the LQI to "feel good" about placement. Wrong.

True; the problem with looking at just the LQI shown in getChildandRouteInfo is that it only gives you a measure of the hub's reception. But there are actually two LQI's that matter (generated at each end of a link). How each LQI is derived is implementation dependent (some devices measure just signal strength; some take into account link errors and retries) but the objective is the same-- to provide 'cost' figures for both inbound and outbound messages so that the best path can be selected when there is more than one available.

As you noted, the LQI shown in the neighbor table is a good indicator for only half of the link-- the 'hearing' part. It gets mapped into the 'inCost' number. Meanwhile, the node at the other end of the link generates its own LQI, maps it into a number in the 1-7 range, and transmits that to the hub periodically where it becomes the 'outCost' you see in the neighbor table. That's the 'being heard' part.

So ultimately you really need to look at inCost and outCost numbers (both derived from LQI's at opposite ends of the link) to get a more complete picture of how well the hub 'hears' and 'is heard by' one of its neighbors.
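For anyone who pastes these neighbor table lines into a script, here is a small Python sketch that pulls out the numbers and looks at both directions instead of LQI alone. The line format is copied from the reports in this thread; the "asymmetric" threshold of 3 is an arbitrary value picked for illustration, and treating a cost of 0 as "no usable link in that direction" is an assumption.

```python
import re


def parse_costs(line):
    """Extract the numeric key:value fields from one neighbor table line."""
    return {key: int(value) for key, value in re.findall(r"(\w+):(\d+)", line)}


def link_summary(line, asymmetry_threshold=3):
    """Return (inCost, outCost, verdict) for one neighbor table line."""
    fields = parse_costs(line)
    in_cost, out_cost = fields["inCost"], fields["outCost"]
    if in_cost == 0 or out_cost == 0:
        verdict = "no bidirectional link"  # assumption: 0 means unusable
    elif abs(in_cost - out_cost) >= asymmetry_threshold:
        verdict = "asymmetric"             # hears well, is heard poorly (or vice versa)
    else:
        verdict = "balanced"
    return in_cost, out_cost, verdict


for entry in (
    "LQI:255, age:3, inCost:1, outCost:7",
    "LQI:255, age:3, inCost:1, outCost:1",
):
    print(entry, "->", link_summary(entry))
```

Both sample lines show LQI:255, yet only the second has a low cost in both directions, which is the point being made above.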

So continued reporting on this story for those that follow.... A repeater's gotta repeat...both ways. Obvious to all but...let's really make it clear with a little Zigbee walk in the woods:

[Repeater A - On Metal Building, B189], LQI:255, age:3, inCost:1, outCost:1
[Repeater B - In far away Wood Building , 9814], LQI:221, age:7, inCost:5, outCost:0

status:Active, age:64, routeRecordState:0, concentratorType:None, [Valve - Corn on far side of Wood Building, 7ABC] via [Repeater A - On Metal Building, B189]

So above we have Repeater A, Repeater B, and a water Valve on the Corn.

The HE Hub is IN the Metal Building upon which Repeater A is mounted outside. Repeater B is about 100' off IN a wood structure, and on the far side of that is the Valve.

Here's a relative drawing: A ----------------- B -- V

So, to my surprise, the Valve's communication skips Repeater B and goes THROUGH the wood building all the way back to Repeater A. The inCost/outCost numbers explain this, but you couldn't get me to believe that putting Repeater B in the wood building "helping signal get through to the far side" wasn't a good idea until I tried it.

So the whole deal here is that Repeater A is close enough to the Hub that the signal blows through that metal wall and communications are maintained with all far flung devices.

Repeater B is just a "B"lip in comparison, it is seen by the Hub fairly clearly with an LQI of 221 (pretty consistently)....but it just doesn't get a quality signal back through that metal wall to the Hub that far away.

Now... I'd be lying if I didn't say I kinda expected Repeater B to talk through Repeater A when I first set this all up.

End of walk.

Another reason why the oft-repeated advice "You can't have too many repeaters" doesn't apply to Zigbee. Non-optimal repeaters won't get used; the protocol is smart enough to avoid the low quality paths they might provide. But they still add useless overhead, taking up airtime and compute cycles in the SoCs of the routers with link status reporting.

Edit: There's a chance that the valve is a child of B in this scenario, and B is in fact a vital part of the path.. only the first hop from the hub is the one shown in the hub's Route Entry table, so it would be showing 'via A' even though the actual route is hub > A > B > V. The outCost=0 for B is for the direct hub > B connection... this indicates to the hub it should use the path A > B > V since it can't establish a good bidirectional link directly with B. Since the link status between A and B isn't shown, there may be a good connection between A and B which is handling the communication from V.

Or... V may in fact be a child of A and B isn't needed. You can't tell from the available info.

Way to find out: temporarily unplug B and see what happens.
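As a reminder of how little a route entry tells you, here is a short Python sketch that parses a line in the format posted earlier in this thread. The regular expression is my own guess at the layout based on that one sample, so treat it as an assumption rather than a documented format.

```python
import re

# Matches "[destination name, ADDR] via [next hop name, ADDR]".
ROUTE_RE = re.compile(r"\[(?P<dest>[^\]]+)\]\s+via\s+\[(?P<hop>[^\]]+)\]")


def first_hop(route_line):
    """Return (destination_address, first_hop_address), or None if no match.

    Only the first hop from the hub is recorded, so any further hops
    (for example A > B > V) are invisible in this entry.
    """
    match = ROUTE_RE.search(route_line)
    if match is None:
        return None
    dest_addr = match.group("dest").rsplit(",", 1)[1].strip()
    hop_addr = match.group("hop").rsplit(",", 1)[1].strip()
    return dest_addr, hop_addr


line = (
    "status:Active, age:64, routeRecordState:0, concentratorType:None, "
    "[Valve - Corn on far side of Wood Building, 7ABC] "
    "via [Repeater A - On Metal Building, B189]"
)
print(first_hop(line))
```

Getting B189 back here is consistent with either hub > A > V or hub > A > B > V, which is why unplugging B is the practical test.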

Granted all this might be in the interest of power preservation/conservation... but in what networking world would it seem "good enough" to not be able to trace every relay point to understand the usefulness of the mesh nodes you are deploying? I guess one where you expect the environments to have numerous nodes such that "it doesn't matter that you know the path".

I guess the Xbee tool(s) collects more insight in this respect ?

The hub's getChildandRouteInfo provides one view of the mesh (the hub centric part), so it's not a substitute for a mesh mapping tool like Xbee/XCTU, which pretty much aggregates that kind of info for every router in the entire network.

Granted it's not enough to give you a complete picture of the mesh but it's still useful, and apparently easy enough to provide without adding much of a burden to the hub software.

On the topic of the Ikea Tradfri repeater and positioning it for the best range to devices further afield...

I thought I would offer this PSA -

The "business end" of these (the module w/ the Zigbee radio and all the smarts) ...is as "remote-able" as-you-are-handy in getting the full 5VDC (1Amp, 5W) to it AND placing it in a non-metallic waterproof enclosure. That includes considering the adequate gauge of wire for the distance. Pushing 5VDC is a totally different requirement than having to push the USB standard (with data) to a good location.
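On the wire gauge point, a quick Ohm's law estimate in Python shows why it matters at 5VDC. The resistance figures are approximate values for solid copper at room temperature, and using the full 1 A rating as the load current is a worst-case assumption on my part; the repeater's real draw may well be lower.

```python
# Approximate resistance of copper wire in ohms per 1000 ft, by AWG size.
OHMS_PER_1000_FT = {18: 6.385, 20: 10.15, 22: 16.14, 24: 25.67}


def voltage_at_load(length_ft, awg, amps=1.0, supply_v=5.0):
    """Estimate the voltage left at the far end of a two-conductor run.

    Current flows out and back, so the loop length is twice the run length.
    """
    loop_ohms = 2 * length_ft * OHMS_PER_1000_FT[awg] / 1000.0
    return supply_v - amps * loop_ohms


for gauge in (24, 22, 20, 18):
    print(f"20 ft of {gauge} AWG: {voltage_at_load(20, gauge):.2f} V at the repeater")
```

Under those assumptions a 20' run of thin 24 AWG wire loses about a volt, while 18 AWG loses about a quarter of a volt, so heavier wire is cheap insurance on longer runs.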

In one case, all I needed to do was get from a UPS out through an outbuilding wall to a protected box. When I first did this I messed with a USB extension cable, having to cut various holes to accommodate its fat connector head. Then it dawned on me that all I have to do is get that 5VDC power there.

...and then I remembered I had bought some of these:

And of course you can power the "business end" of the Ikea repeater from either side so the gender of this little converter cable isn't an issue.

Disclaimer: I have not used the above cable yet, I've only bench tested it. IF you try anything like this, it's YOUR experiment so consider the potential loss. Furthermore, let me state the obvious ...that if you go and put one of these repeaters 20' up a pole...you are adding incremental risk over a 5' run to a position on the exterior of a building. :cloud_with_lightning: :zap: :zap: