Odd Zigbee Behavior on the C-8

I'm observing strange behavior with Zigbee on the C-8; I may not have accurately nailed down all the details, but I'll outline them here in case anyone can corroborate.

TL;DR: What I appear to be seeing:

  • shortID's shown in the hub's child table don't match those shown in the Zigbee details page. They never change, even when a shortID of the child does change on rejoin;
  • child table devices show NULL instead of name, and "type:UNKNOWN"
  • The C-8 should be assigning randomized shortID's on device reset rejoins... that isn't happening for end devices, they stay the same, and devices aren't working after the reset/rejoin
  • The C-8 does issue a randomized shortID on a router rejoin as expected.
  • Rejoins of end devices that happen via a router do receive a new randomized shortID, and rejoins of end devices through a router appear to work

Background: Yesterday on receiving my C-8 I migrated (successfully) from C-7 a small setup of 6 Zigbee devices: 1 GE plug-in dimmer and 5 battery devices. No automations or Z-Wave.
Migration was apparently successful; all devices worked as expected (via web interface, or by activating sensors/buttons). Also noted that child devices properly rejoined another parent (either C-8, or GE plug) when their original parent was out of range. C-7 has remained powered off since.

However I noticed something strange; the short ID's of all end devices were not correctly listed in the child table of the C-8 (from the getChildandRoute info page). For example, Zigbee details page showed end devices with shortID's 30D8,7E76,AD16,C697,1FDF. These devices were direct connected to the C-8; the getChildandRoute info page showed 065D,54A6,2D46,01FD. Zigbee sniffer was running and could see the proper shortID's (those listed on the details page) were actually on the network. Devices still functioned normally, however.
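For anyone who wants to repeat this comparison on their own hub, here is a minimal sketch of the check. It assumes you copy the shortID's by hand from the two pages into the two sets; the function name is made up for illustration and nothing here talks to the hub.

```python
# Short IDs copied by hand from the two hub pages (values taken from the post above).
details_page = {"30D8", "7E76", "AD16", "C697", "1FDF"}
child_table = {"065D", "54A6", "2D46", "01FD"}


def compare_short_ids(details, child):
    """Return (IDs on both pages, IDs only on details page, IDs only in child table)."""
    details = {s.upper() for s in details}
    child = {s.upper() for s in child}
    return details & child, details - child, child - details


both, only_details, only_child = compare_short_ids(details_page, child_table)
print("on both pages:       ", sorted(both))
print("details page only:   ", sorted(only_details))
print("child table only:    ", sorted(only_child))
```

With the values above, no shortID appears on both pages, which is the anomaly being described.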

If I had stopped here (and not compared the hub's child table to the sniffer trace) everything would appear to be operating perfectly, aside from the child table anomaly.

Broken behavior began when I tried to reset and rejoin a previously joined device; I tried rejoining an Iris V2 motion sensor via factory reset (held button push while inserting battery) and initiated 'add device' on the C-8.

Expected behavior was a device rejoin (indicated as 'previously found') along with a new (randomized) 4 digit shortID assignment. Instead there was no 'previously found' message (though the device did show the normal blinking blue followed by green flashes), and device details page showed the same exact shortID as before, and the device wasn't functional-- even though sniffer showed the (unchanged) shortID actively communicating with the hub (it was direct connected), its status on the details page didn't update. The bogus child table shortID's also remained unchanged.

Same thing happened with reset/rejoin of another battery device (Iris V2 contact)-- it received the exact same shortID after reset but didn't operate correctly.

I then tried rejoining the GE plug; it did receive a new randomized shortID as expected, was indicated as 'previously found', and operated correctly.

Those previous end-device rejoins were attempted close to the hub (hence they appeared in the C-8's child table). To see what would happen when the join took place through a router, I rejoined a battery device close to the GE plug; expected behavior was that the router (GE plug) would generate a randomized shortID for the end device-- and this did in fact happen (along with a 'previously found' message); the rejoined device worked properly.

I unplugged the GE dimmer, took the same battery device to the C-8 and repeated a reset/rejoin. It again rejoined with the same shortID that the GE router had issued it (it should have received a randomized shortID from the C-8); no 'previously found' message, and it didn't function. Strangely the device as shown in C-8's getChildandRouteinfo child table continued to display the bogus shortID that it displayed from the initial migration. Evidently it continued to be matched through its IEEE address to the original bogus shortID.

I haven't tried rebooting the Zigbee radio, or shutting down the C-8 for an extended time (though it had been shutdown/relocated prior to these experiments) or anything else yet to see if normalcy returns...

12 Likes

thanks for the detailed analysis.
That should give us something to dig into.

8 Likes

I do see the same behavior in the hub's child table after reset/rejoin of a few devices (after migration from c7 to c8). But so far I haven't experienced any malfunction.

So after updating to the latest C-8 firmware (2.3.5.104) I experimented a bit more trying to come up with repeatable behaviors.

I haven't (yet) reset the Zigbee radio; again what I'm describing so far is in the aftermath of migrating (apparently successfully) a small set of Zigbee devices from C-7 to C-8.

What I can conclude with reasonable certainty is:

  • Zigbee router (GE dimmer plug, in this case) rejoins/reset+rejoins with the expected behavior (getting new shortID on reset) and shows correct shortID in neighbor table of the getChildandRouteInfo page.

  • A migrated end device (in this case, a Third Reality motion sensor) which has never been reset joins/rejoins normally, even after battery pull (w/o reset). ShortID from migration has never changed; does not match short ID shown in Child table. The device will rejoin transparently (after a battery pull/reinsert) either via the C-8 or the GE plug router.

  • Migrated end devices (in this case, Iris V2 motion & LoraTap Zigbee 3.0 4-button devices) consistently fail to work normally after factory reset/rejoin. Unless rejoin is through a router, they'll (incorrectly) get the same shortID they had previously and subsequently will not pair properly (no 'previously found' message or any indication pairing has completed), yet sniffer shows old shortID's are active on the network. Devices subsequently won't function. Even when rejoined via router (which properly issues them a new shortID and produces 'previously found' message on joining) the devices won't function in their 'old slot'.

  • A migrated end device can be coaxed to work normally on the network by 'Removing' it via its device details page followed by factory reset and 'Add Device'. If joined directly via the C-8, it (incorrectly) again gets its old shortID, however pairing completes normally. It will rejoin and work after a battery pull/insert if it rejoins via the C-8. If (after the battery pull/insert) it rejoins via the router, it appears to have rejoined correctly (V2 motion shows normal flashing green light) but it doesn't function, and sniffer trace shows hub continuously repeating route discovery for it. Once brought back into range of the hub (out of range of the router), it will transparently rejoin and begin working normally again.

  • The child table always shows the same bogus shortID's of end devices (the same ones from the initial migration, along with UNKNOWN instead of the expected 'SLEEPY END DEVICE') that do not correspond to the shortID's actually in use on the network. They come and go depending on whether the end device is on the network, but never change to reflect the current actual shortID.

I haven't (yet) tried adding a 'virgin' device to this setup to see how it behaves. Will probably also try resetting the Zigbee radio to see if things begin to function normally.

While it's probably risky to extrapolate the behavior of this tiny test setup to a larger network, it does demonstrate the broken rejoin behavior (there may be other effects that only come into play with multiple routers). But it seems probable that you could transform a perfectly functioning migrated Zigbee network to a dysfunctional mess with a few (normally benign) device factory resets and rejoins.

7 Likes

Thank you for documenting these findings so meticulously, Tony!

This is all very much in line with ZigBee issues I ran into starting early last week (on my C7) when I attempted to rejoin a couple directly-paired Hue bulbs that I'd pulled off to do a f/w update via the Hue bridge.

A few other posts in the community here around that same time (all pre-C8) hinted at similar ZigBee gremlin experiences.

I was finally able to join those 2 bulbs on my C8 Saturday, but it was pretty touch-n-go and took several attempts to wrestle through it - it was not confidence-inspiring.

I'm just glad you've been able to provide some genuinely helpful data points here for Mike to chew on -- it's a heckuva lot better than my "uh, something seems wacky here..." :slight_smile:

3 Likes

Thanks, though it never occurred to me that the c-7 might exhibit some of this behavior. I can't say since I've never really done much with it aside from creating the small setup to see how migration worked. Fortunately my c-3 continues to soldier on, seemingly immune to Z-Wave and Zigbee excitement.

2 Likes

It manifested for me on my C7 last week, so I suspect it's a firmware thing... Either an issue with the hub f/w, or some custom driver I started using lately, or an unlucky combination of all the above.

But I have no idea when it actually started... My ZigBee performance has always been fine overall, and I only discovered this when I tried to rejoin those 2 bulbs. Otherwise, it's been a long while since I joined something (new or old).

But I suspect it started relatively recently - maybe around the time Mike was doing some ZigBee work in the Betas ~2/3 weeks ago (or whenever those last Betas were recently).

But I'm admittedly grasping at straws.

2 Likes

These symptoms are consistent with my experiences. I'm glad you have the skills to do the technical analysis. I eventually reset the radio twice. Everything is now working.

3 Likes

I was wondering what the odd Child devices were. I thought something got borked in the Zigbee migration, but haven’t been able to find any problem devices. I’m very appreciative of your detailed analysis and sharing of your observations. I will not be trying to reset any devices until this is sorted, although I might not be as affected having a full neighbor table of zigbee routers. I have added a few new devices without issue. I migrated around 80 Zigbee devices from my C-5 and I have been amazed at how smooth the transition was. It makes me want to upgrade my other C-5 (as soon as my wife lets me).

2 Likes

Seeing any pattern in how things were behaving would have been much more difficult if I wasn't working with a toy-sized mesh.

For one thing if I had been able to actually migrate my 'production' mesh, it most likely would have gone fine and I wouldn't have been tempted to play with it (though I can see the potential for problems down the road... changes to Zigbee end-device shortID's--by rejoins through other routers-- aren't an especially rare occurrence, and the coordinator needs to be able to deal with it).

Also it's much easier to isolate the variables when you're dealing with a single router (it's easy to force an end device join via the coordinator or a router, when there is only one of each).

2 Likes

Some good news is that I played a bit with my C-7 and so far haven't seen anything peculiar.

In this case I powered up the C-7 (with C-8 powered off) just to confirm that it still had the exact same extended PAN ID as pre-migration (which would conflict with C-8 had that not been cut off from power). I also satisfied my curiosity that a never-reset device currently paired to the C-8 would not continue to work on the C-7, even though it shared the same 64-bit PAN ID the C-7 was still using, still using the same device ID that the C-7 had for it on its device page (evidently its network key had changed post migration, as it would be expected after some time when running joined to the C-8).

After updating it to the current f/w I then reset C-7's Zigbee radio (wiping the 64-bit PAN ID it donated to the C-8) and soft reset it. The few devices I joined (GE dimmer plug, V2 motion, and LoraTap 4-button) paired without issues. Their shortID's appear correctly in the C-7's child table; I went through a factory reset of the button device a couple of times and it correctly was recognized as a rejoined device, and the C-7 gave it a new shortID. Child table reflected the changes properly.

I can see the potential for hijinks to ensue if the C-7 gets powered up on the same channel post-migration with a just migrated C-8 not fully unplugged from power....
Zigbee's designed with a 16-bit PAN ID conflict resolution scheme (but this only works when the coordinators have unique 64-bit extended PAN ID's). I don't know what would happen if two coordinators with identical short and extended PAN ID's were both 'live' (but I bet it would be ugly).
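To make the distinction concrete, here is a toy sketch of the three cases. It is only an illustration of the reasoning above, not how any Zigbee stack implements conflict detection; the class, function, and all ID values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coordinator:
    name: str
    pan_id: int      # 16-bit short PAN ID
    ext_pan_id: int  # 64-bit extended PAN ID


def classify_overlap(a, b):
    """Classify how two coordinators on the same channel would look to each other."""
    if a.pan_id != b.pan_id:
        return "distinct networks"
    if a.ext_pan_id != b.ext_pan_id:
        # Same short PAN ID, different extended PAN ID: the case the
        # conflict resolution scheme is meant to handle.
        return "PAN ID conflict (resolvable)"
    # Both IDs identical: nothing distinguishes the two networks.
    return "indistinguishable"


# Hypothetical values; a migrated C-8 inherits both IDs from the C-7.
c7 = Coordinator("C-7", pan_id=0x1A2B, ext_pan_id=0x0011223344556677)
c8 = Coordinator("C-8", pan_id=0x1A2B, ext_pan_id=0x0011223344556677)
neighbor = Coordinator("neighbor", pan_id=0x1A2B, ext_pan_id=0x8899AABBCCDDEEFF)

print(classify_overlap(c7, neighbor))
print(classify_overlap(c7, c8))
```

The second case is the post-migration scenario described above, where both hubs are powered up at once.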

1 Like

My Zigbee devices have largely been fine. I did have to reboot my SonOff USB dongle repeaters to get them working normally/picking up devices, and today discovered that an Iris v2 motion sensor had gone quiet/no events since two days ago and wasn't reporting any motion/battery/temps any more. Power cycle resolved that.

A couple hours after The Great Migration, my Zigbee radio rebooted itself. I don’t know why, but it’s been fine since. I haven’t had to touch anything, including the Sonoff dongles.

1 Like

It's known as self-actualization. You have a hippie radio. :wink:

3 Likes

Yeah, on the plus side, I'm set for the foreseeable future... Now that I finally got those 2 bulbs back online (and I'm not having any other overall ZB issues), I don't have any plans to add or rejoin any devices.

I think you've given Mike some great info here -- if there is something amiss, I'm confident that will help him get to the bottom of it.

Erring on the side of caution, I did remove a couple community zigbee drivers in favor of stock drivers -- it's just hard to know what may have tipped all of this over the edge!

2 Likes

True that, one of the best trouble-shooting/investigative summaries I've ever seen here. Five stars...

4 Likes

Finding more Zigbee devices, all motion sensors so far, that stopped reporting; two of four motion sensors in my kitchen have been silent since the 6th. Need to go look at my leak sensors...

I just found one Iris V2 motion sensor that hadn’t reported since 3/6. It wouldn’t pair to the hub until I removed it from everything and deleted it from the hub. Then it paired without any problem.
It initially showed up as an existing device that had been found, but would not report motion. I tried several more times (with a new battery) and the hub would not find it at all. That’s when I removed it from my rules and deleted it. Once it was deleted, it paired normally.

1 Like

That's consistent with what I have been seeing; if devices are 'removed' they'll appear to pair normally (though you need to modify the automations that used them prior to the removal).

As I noted above though, these 'new' pairings will show issues if they happen to subsequently rejoin (not reset) via a router. I tested this by removing a battery from one of my 'revived' V2's and reinserting the battery when it was near the repeater. It appeared to rejoin normally (the flashing green indicated a child/parent connection) but for some reason (ultimately I suspect a mismatch somehow resurfaces when the hub performs the MAC-shortID correspondence) the hub failed to communicate with it (continuously issuing route discovery broadcasts for the V2's new shortID). Communication with the hub was restored when I moved it within direct range and it became a child device of the hub (out of range of the router it had just joined).

Because of range issues with my sniffer, I couldn't trace its communication when the V2 was direct connected to the GE plug; that might have revealed something interesting.

I suspect that the bogus shortID's that appear in the hub's child table after migration (and never change, even after a device is 'removed' via the details page and rejoined) yet still continue to be associated with specific MAC addresses might be at the root of the issues.
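Here is a toy model of that suspicion, purely to illustrate the idea. It is not the hub's actual code: the class, the `update_on_rejoin` flag, and the MAC address are invented, and pairing 065D with 30D8 is arbitrary, since I don't know which child table entry belongs to which device.

```python
class AddressMap:
    """Toy coordinator address map keyed by IEEE (MAC) address."""

    def __init__(self, update_on_rejoin=True):
        self.update_on_rejoin = update_on_rejoin
        self.short_by_ieee = {}

    def join(self, ieee, short_id):
        # A first-time join is always recorded; later joins only overwrite
        # the entry if the map is allowed to update on rejoin.
        if ieee not in self.short_by_ieee or self.update_on_rejoin:
            self.short_by_ieee[ieee] = short_id

    def lookup(self, ieee):
        return self.short_by_ieee.get(ieee)


IEEE = "00:11:22:33:44:55:66:77"  # made-up MAC address

healthy = AddressMap(update_on_rejoin=True)
stale = AddressMap(update_on_rejoin=False)
for table in (healthy, stale):
    table.join(IEEE, "065D")  # entry left over from migration (arbitrary pairing)
    table.join(IEEE, "30D8")  # shortID the device actually uses on the network

print("healthy map resolves MAC to:", healthy.lookup(IEEE))
print("stale map resolves MAC to:  ", stale.lookup(IEEE))
```

If the hub's MAC-to-shortID lookup behaved like the stale map, it would keep addressing a shortID that the device is no longer using, which would fit the symptoms.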

4 Likes

From my toddler-equivalent understanding of what's under this whole hood, I think you are spot on!

Thanks again for these inputs -- hopefully all your analyses in this thread will move the needle toward a fix.

2 Likes