C8 Zigbee Radio Turning Off/On Multiple Times a Day

I've been running one of my C8s solidly for 3 weeks, up until I installed the latest update a day or so ago.
All Z-Wave and all Zigbee are stable: no radio dropouts, and all devices are reporting in.
I had a load of problems before .125, but everything is rock solid now.
I have been patient, adding one device at a time and letting the meshes settle overnight before adding another batch.
I had an issue with a Zigbee Sonoff push button not switching off some dimmer modules, but when I checked, the button was registering.
I had to power cycle the two Z-Wave modules doing the dimming; it had nothing to do with Zigbee.
Not sure why, but my system is rock solid at the moment.

Fingers crossed, but it’s now been over a week since I installed the .130 release and there have been no radio resets. I had been getting one to five per day before that.

I don’t think it has to do with the C8. I am still on the C7, and since early March I have been getting the Zigbee radio going offline (and not coming back until I reboot the hub). I think it is something they changed in the code for the C8, but either way it is a problem in the code. Fortunately, it has only happened to me 3 times in a month, but each time it took me many hours to realize that my rules were broken, which is annoying at best and risky at worst.

I have now set up a rule that automatically reboots my hub when the Zigbee radio goes offline.

How do you test that rule?
Turning off the Zigbee radio didn't work for testing my rule.

Wait until the next Zigbee radio crash.

Here's hoping it'll be a while!

Also, my rule doesn't kick off with the 8-second blips.

I don’t have 8-second blips. My Zigbee goes off and stays gone.

I think I'd rather have the 8-second blips and a rule that doesn't trigger.
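For anyone wondering how a reboot rule could tell a short C8-style blip from a radio that stays down, here is a minimal logic sketch. It is not Hubitat Rule Machine syntax and does not use any Hubitat API: it assumes only that the hub reports "offline" and "online" radio events with timestamps in seconds. The class name `RadioWatchdog`, its methods, and the 30-second grace period are all hypothetical. Because the events are fed in by hand, the logic can be exercised with synthetic events instead of waiting for a real crash.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RadioWatchdog:
    """Hypothetical watchdog deciding whether a Zigbee outage warrants a reboot.

    Assumes 'offline'/'online' events arrive with timestamps in seconds.
    Outages shorter than grace_s (e.g. a radio that restarts itself within
    a few seconds) are ignored; outages that persist past grace_s trigger
    a reboot decision.
    """
    grace_s: float = 30.0  # hypothetical threshold; tune to taste
    offline_since: Optional[float] = None

    def on_event(self, state: str, t: float) -> None:
        if state == "offline" and self.offline_since is None:
            self.offline_since = t  # start of a new outage
        elif state == "online":
            self.offline_since = None  # radio recovered on its own

    def should_reboot(self, now: float) -> bool:
        # Reboot only if the radio has stayed offline for at least grace_s.
        return (self.offline_since is not None
                and now - self.offline_since >= self.grace_s)


# Simulated blip: offline at t=0, back online at t=8.
wd = RadioWatchdog()
wd.on_event("offline", 0.0)
wd.on_event("online", 8.0)
print(wd.should_reboot(60.0))   # False

# Simulated "off and gone": offline at t=100 and never comes back.
wd.on_event("offline", 100.0)
print(wd.should_reboot(140.0))  # True
```

In practice `should_reboot` would be polled periodically; the grace period is the design knob that trades reboot latency against rebooting on outages that would have cleared by themselves.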

Me too

Maybe send @gopher.ny a PM with your hub's UID, and he can check at some point whether there is anything in your engineering logs that might explain the issue. He's neck-deep in C8 Zigbee radio stuff right now for obvious reasons, but hopefully he can find time to look at this in the near future.

Wild speculation follows:

Silicon Labs' Ember Zigbee stack documents many status codes; however, not a single one includes the words 'online' or 'offline' in its definition (at least none that I could find in "Status Codes - v6.3 - Zigbee em35x API Documentation" from Silicon Labs).

Only a handful of return codes even include 'radio' in their description (among them EMBER_PHY_TX_BUSY, EMBER_PHY_OSCILLATOR_CHECK_FAILED, EMBER_MAC_RADIO_NETWORK_SWITCH_FAILED). They seem to describe broken hardware rather than a transient status.

Given that the C8's firmware would likely use one of the documented return codes to surface a Zigbee 'online/offline' status, it seems probable that 'Zigbee radio is offline' would be related to one of the following: EMBER_NETWORK_BUSY, EMBER_PHY_TX_BUSY, EMBER_NETWORK_DOWN (there's also an EMBER_NETWORK_UP to fit nicely with 'Zigbee radio is online').

That said, the (roughly) 8-second 'offline' period seems to be a consistent interval being logged by setups afflicted with this issue; I wonder if it is somehow related to a scenario described (a long time ago) in a Silicon Labs Community post that I stumbled upon:

(Quoted post follows:)

In my experimentation in the use of multicast (i.e., to a group of commissioned [router] devices) I am experiencing EMBER_NETWORK_BUSY. A read of the Knowledge Base reveals that I'm likely bumping up against the ZigBee limitation of a maximum of 8 broadcasts per 9 seconds.

The poster goes on to quote a Silicon Labs document:

From "How are different tables managed in EmberZNet?"

The broadcast table is used because the only specific limit that ZigBee (and therefore our stack) places on sending of messages is in regards to broadcasts. The ZigBee Networking specification requires that in-progress broadcasts (at the Network layer) be tracked by each router (via a broadcast table) to prevent repeating duplicates of an already-circulating broadcast. Since the space in the table is constrained (by RAM), only a limited number of entries exist; and since the duration of each broadcast must be considered in the worst-case scenario (large networks where broadcasts may take many seconds to circulate through the network), limits are placed on how many broadcasts can be sent in a given timeframe. For ZigBee Pro, this limit is effectively 8 broadcasts over any 9-second period.

So if the application tries to queue an APS Broadcast or an APS Multicast (which relies on the broadcast mechanism at the NWK layer) after this broadcast table is full (due to other NWK layer broadcast activity created by APS Broadcasts, APS Multicasts, Route Discoveries, Address Discoveries, Device Announcements, etc.), the stack will refuse to queue the message and instead will return EMBER_NETWORK_BUSY status. The application must then wait until the broadcast table has room to track a new broadcast. (How long to wait depends on how quickly the table filled up relative to the 9-second table entry timeout.)
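To make the quoted mechanism concrete, here is a toy model of a broadcast table with 8 slots and a 9-second entry timeout. It is a sketch under stated assumptions, not EmberZNet code: the class and method names are hypothetical, the status strings only mirror the documented EMBER_* names (the real stack uses numeric enum values), and a real stack tracks per-broadcast state rather than bare send times.

```python
from collections import deque

# String stand-ins for the documented status names (assumption, not real enums).
EMBER_SUCCESS = "EMBER_SUCCESS"
EMBER_NETWORK_BUSY = "EMBER_NETWORK_BUSY"


class BroadcastTable:
    """Toy model of the broadcast-table limit described in the quote.

    At most `capacity` broadcasts may be tracked at once; each entry
    expires `timeout_s` seconds after it was added. Defaults follow the
    quoted 'effectively 8 broadcasts over any 9-second period'.
    """

    def __init__(self, capacity: int = 8, timeout_s: float = 9.0) -> None:
        self.capacity = capacity
        self.timeout_s = timeout_s
        self._entries = deque()  # send times, oldest first

    def _expire(self, now: float) -> None:
        # Drop entries whose timeout has elapsed.
        while self._entries and now - self._entries[0] >= self.timeout_s:
            self._entries.popleft()

    def send(self, now: float) -> str:
        self._expire(now)
        if len(self._entries) >= self.capacity:
            return EMBER_NETWORK_BUSY  # table full: caller must wait and retry
        self._entries.append(now)
        return EMBER_SUCCESS

    def wait_time(self, now: float) -> float:
        """Seconds until a slot frees up (0.0 if one is free now)."""
        self._expire(now)
        if len(self._entries) < self.capacity:
            return 0.0
        return self._entries[0] + self.timeout_s - now


# A burst of 9 broadcasts at t=0: the 9th is refused.
table = BroadcastTable()
results = [table.send(now=0.0) for _ in range(9)]
print(results.count(EMBER_SUCCESS), results[-1])  # 8 EMBER_NETWORK_BUSY
print(table.wait_time(now=2.0))                   # 7.0
print(table.send(now=9.0))                        # EMBER_SUCCESS
```

As the quote says, how long the caller must wait depends on how quickly the table filled: a burst at one instant blocks for nearly the full timeout, while broadcasts spread over the window free slots one at a time.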

Granted, the broadcast window limitation discussed in that post is 9 seconds (vs. the commonly seen 8-second 'Zigbee radio is offline' periods), but I wonder if some other stack limitation is at play here...

[Pretending that he understood the above...] 🙂

Does any of the above also lead you to a potential "aha!" about why this issue is so intermittent, with some (most?) folks never seeing it and having stable Zigbee while others have severe issues?

Mine were exactly 10 seconds off and on when I was having them.

We get ASH error 6, which has nothing to do with broadcast messaging. 8 seconds is the average time it takes us to detect zigbee is down and then get it restarted.

Thanks... it makes sense that a restart time would produce the consistent intervals being logged. Now on to more googling LOL
"Why does my NCP often return ASH Reset: Assert error code?" (silabs.com)

Is there a reason why the Zigbee radio doesn’t restart itself on the C7?

C8 has a newer Zigbee chip and the ability to hard reset that chip from the software. C7 does not.

So, the "ASH error 6" is what actually causes the short reboots? And, sorry if this has already been discussed, what is the nature of this error?

NOS... not otherwise specified.

🤣🤣🤣