[2.3.5.152-2.3.6.145] [C7] Help! Hub seems to be failing with repeat zigbee radio online/offline

Oy. I bought 12 of those on your recommendation. :slight_smile:
On the other hand, a while ago I bought a bunch of the Centralites on someone else's recommendation.

Thanks. So the radio may be taken down to preserve resources, but it doesn't sound like it's the first thing to go, at least not anymore.

It could be the first thing to go if you only see red in your logs with errors occurring every few ms.

The manual said cumulative power readings were available. I assume this means kWh. It doesn't seem the generic driver supports energy. Not that I have power turned on, but it would be nice.

As an aside, I just looked at the manual for reset procedure for one of the 12 that had acted up before and was acting up again just now.
I found that pushing the button while plugging it in helped.

It supports watts

Zigbee radio reboots started again a few days ago, two weeks after the last reboot, just like before.
So far they are not yet repeating frequently as they did before. Memory has been holding steady at around 185-190 MB the past few days. Nothing special about CPU or other indicators I know of.

However the interesting thing is that I can currently generate a zigbee radio reboot "at will" simply by triggering a cloud backup (manual backup doesn't appear to trigger it reliably). The last two I just triggered as I was writing this reply. Only one of the reboots in this list seems unrelated to backups, at 13:54 (I happened to be using the hub's web interface via remote admin in the mobile app at that time).
image

Here are the (annotated) backup times for comparison:
image

Somehow backups are triggering radio reboots, and cloud backups trigger them more reliably. There may be other underlying conditions I suppose, like a bad device, but I have yet to find any evidence of that.
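For anyone wanting to check the same correlation on their own hub, here is a minimal Python sketch that splits radio reboot times into those that follow a backup within a short window and those that don't. The function name, the 5-minute window, and all timestamps are assumptions made up for illustration; they are not copied from the screenshots above.

```python
from datetime import datetime, timedelta

def reboots_near_backups(reboots, backups, window=timedelta(minutes=5)):
    """Split reboot times into those within `window` after a backup and the rest."""
    matched, unmatched = [], []
    for reboot in reboots:
        # A reboot counts as backup-related if some backup started
        # at most `window` before it.
        if any(timedelta(0) <= reboot - backup <= window for backup in backups):
            matched.append(reboot)
        else:
            unmatched.append(reboot)
    return matched, unmatched

# Hypothetical timestamps, for illustration only.
fmt = "%Y-%m-%d %H:%M:%S"
backups = [datetime.strptime(t, fmt)
           for t in ("2023-09-01 03:01:00", "2023-09-01 15:10:00")]
reboots = [datetime.strptime(t, fmt)
           for t in ("2023-09-01 03:03:10", "2023-09-01 13:54:00", "2023-09-01 15:12:05")]

matched, unmatched = reboots_near_backups(reboots, backups)
print(len(matched), "backup-related;", len(unmatched), "unrelated")
```

A reboot that lands in the "unrelated" bucket is worth a closer look at whatever else was happening on the hub at that time.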

Hope that helps @bobbyD @gopher.ny

The manual seems to indicate kWh readings are also possible, at least with the Sengled hub:

image

Can you create a warranty case and also send me a private message with your hub id?

Since my last post:

  • I removed a bunch of custom apps, completely reviewed the polling intervals for Kasa devices, double-checked all other cloud/LAN integrations, and started moving a few devices to the second hub.
  • Hub ran 2.3.5.135 for 7 days then had two radio reboots in one day. I removed the two Aqara in-wall switches, updated to 2.3.6, and it then ran for 14 days before I saw a zigbee offline message.

The zigbee radio restarted yesterday but this time it was different - the radio stayed offline for 2 minutes and 20 seconds. @bobbyD is this indicative of a different problem?

The timing seems random (backups are scheduled for 3:01am). There is absolutely nothing in the logs in terms of errors. Just after the radio recovered :

Is the IKEA Tradfri driver "not supported" or known to cause problems by any chance? (@kkossev do you know?) It is using the system IKEA Tradfri Control Outlet driver. I will move it to the other hub just to be on the safe side.

image

See this post :

I know one buggy Zigbee device that sends bursts of 20-30 duplicated temperature reports at once, separated by just a few milliseconds. It's been sold with this firmware bug for years without being fixed by the manufacturer.

Hubitat hubs (and all other Zigbee coordinators) handle this misbehavior with ease. No 'Zigbee radio is offline' issue, even when such a bombardment of Zigbee reports lasts several seconds.

Thanks. I should have worded my question better. What I meant to ask: should this 30W driver be considered part of the compatible device list? Or is it a totally different device from the IKEA Zigbee Control Outlet this system driver is for?

I was told Hubitat will not consider investigating this problem as either a hardware or platform software issue unless I first remove any "not on the compatible list" Zigbee devices. I am trying to comply with that request while not totally breaking my home and FAF by moving any such devices, one by one, to an alternate hub I bought. I have started with wired devices as it seems improbable that sleepy end devices could take a coordinator down...

Makes sense to me, but unfortunately Hubitat support doesn't seem to agree.

IKEA is in the 'Brands that work with Hubitat' list:

... and it is highly unlikely that the IKEA Tradfri LED Driver will be listed as compatible with 6 other Home Automation systems but will have problems with Hubitat.

I've just had another longer zigbee offline period, this time nearly three minutes.

Can't help but notice that, this time as well, the first logs after the radio recovers are route info logging coming from the Tradfri 30w driver device. No such route info appears in the device's logs in between the two zigbee radio offline events.

@mike.maxwell any insight into whether turning off "Enable route and LQI logging" on the Tradfri Outlet system driver might have an impact ?

In the meantime I have removed this device from the ailing hub and meshed it in from the (new) secondary hub instead.

I have Ikea Tradfri outlets on my C8 and C7, and do not have route and LQI logging enabled, and haven't seen any issues on either hub with them.

Still tracking this issue on 2.3.6.145. Hub has been up for 8 days. First zigbee radio reboot since last boot. Timing lines up with scheduled daily cloud backup (@gopher.ny would it be possible to get a locationEvent or optional logging for backup begin/end?)

Cloud backups are scheduled for 3:01 every day (overkill to do cloud every day, I know):
image

The zigbee reboot occurred at 03:03:23.

I got this message:
image

Usually the cloud backup shows up with a timestamp 1 second after the corresponding local backup, so it would have been something like 03:03:21, within a couple of seconds of the recorded zigbee radio reboot :

Nothing seems worthy of attention in the zigbee logs (the message rate/pace seems normal), event logs, or app/device logs around that time, except for this strange entry in the zigbee logs at 03:03:30, the first message past that zigbee reboot window... which might be from the radio coming back from reboot ?

{"name":"0000","id":0,"profileId":0,"clusterId":32818,"sourceEndpoint":0,"destinationEndpoint":0,"groupId":0,"sequence":217,"lastHopLqi":255,"lastHopRssi":0,"time":"2023-11-03 03:03:30.667","type":"zigbeeRx","deviceId":null}
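As a rough aid for reading that entry, here is a small Python sketch that parses the log line and prints the cluster ID in hex, plus the gap between the recorded reboot (03:03:23) and the entry. The lookup table is a hand-written assumption covering only two ZDO management responses; in the Zigbee Device Profile (profile 0), 0x8032 is the routing table response (Mgmt_Rtg_rsp). Reading this entry as a routine routing-table reply rather than an error is my interpretation, not something confirmed in this thread.

```python
import json
from datetime import datetime

# The zigbee log entry quoted above, verbatim.
raw = ('{"name":"0000","id":0,"profileId":0,"clusterId":32818,'
       '"sourceEndpoint":0,"destinationEndpoint":0,"groupId":0,'
       '"sequence":217,"lastHopLqi":255,"lastHopRssi":0,'
       '"time":"2023-11-03 03:03:30.667","type":"zigbeeRx","deviceId":null}')
entry = json.loads(raw)

# Partial, hand-written table of ZDO management responses (profileId 0).
ZDO_CLUSTERS = {0x8031: "Mgmt_Lqi_rsp", 0x8032: "Mgmt_Rtg_rsp"}

cluster_hex = f"0x{entry['clusterId']:04X}"
cluster_name = ZDO_CLUSTERS.get(entry["clusterId"], "unknown")

# Reboot time as reported in this post (03:03:23 on the same day).
reboot = datetime(2023, 11, 3, 3, 3, 23)
seen = datetime.strptime(entry["time"], "%Y-%m-%d %H:%M:%S.%f")
delta = (seen - reboot).total_seconds()

print(cluster_hex, cluster_name, f"{delta:.3f}s after the reboot")
```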

Nothing attracts my attention in device stats (1.9% of total) or app stats (5.3% of total), and all automations appear functional. Free OS memory is perhaps a bit low (last backup dropped it from 190 to 170) but I've seen this hub survive for days below 100 MB so not really worried about that.

(EDIT: I moved the Tradfri driver off this hub a couple of weeks ago)

Thought I'd add one last update to this thread in case it can help anyone still experiencing "Zigbee reboot" issues and looking for clues.

In my particular case at least, the issue turned out to be unrelated to Zigbee. Instead, the radio reboots appear to be a symptom. Of what ? My best guess remains that the platform becomes somehow overloaded and unable to service the NCP in a timely manner, leading to an NCP error state that requires a reboot to recover from (a theory never confirmed by HE, so take it with a grain of salt).

The very same C7 which had zigbee issues after just two weeks of uptime, starting with or shortly before 2.3.5.152, now has none. Most recently, it has been up on 2.3.7.144 for over six weeks and still going - no zigbee reboots. I consider the issue resolved.

Things that changed in the weeks leading up to the issues I had last summer:

  • Became a subscriber (which enabled cloud backups)
  • Enabled hub security
  • Relocated the hub 12ft across the room
  • Upgraded the wifi router (unintentional wifi channel change)

Things tried which did not seem to improve the situation:

  • Removed zigbee devices that appeared to work fine but were not on the compatibility list. All are back except for two Aqara mains-powered switches, which were relocated to another building and are not causing problems there, on a different C7. In any case zigbee reboots continued after their removal from the original C7.
  • Removed a few community drivers that used significantly more resources than the built-in ones
  • Soft reset
  • Removed so-called "chatty" power reporting devices, such as the TR plugs (there are six now on this hub - no problem)

Things that did appear to improve the situation:

  • fixed custom apps that enabling hub security had broken (silently - no errors in the logs)
  • fixed/reconfigured/removed custom apps that made frequent http calls, in particular to localhost:8080, e.g. for file manager tasks
  • fixed/reconfigured/removed custom apps that in certain circumstances could generate events in large-ish bursts (without triggering any alerts)
  • carefully reviewed poll intervals to Kasa devices

Nevertheless, I believe that the memory and performance fixes that came in release 2.3.7.x made a difference.
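To illustrate the kind of fix meant by taming event bursts and frequent calls in the list above, here is a minimal throttle sketch. It is written in Python purely for illustration: Hubitat apps are written in Groovy, and the class name, the 1-second interval, and the simulated burst are all assumptions, not code from any of the apps I changed.

```python
import time

class Throttle:
    """Allow at most one action per `min_interval` seconds; drop the rest."""

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock      # injectable so the example can use a fake clock
        self._last = None       # time of the last allowed action

    def allow(self):
        now = self.clock()
        if self._last is None or now - self._last >= self.min_interval:
            self._last = now
            return True
        return False

# Simulated burst: 30 events 5 ms apart, driven by a fake clock so the
# example is deterministic and does not actually sleep.
fake_now = [0.0]
throttle = Throttle(min_interval=1.0, clock=lambda: fake_now[0])

sent = 0
for _ in range(30):
    if throttle.allow():
        sent += 1               # only this event would be forwarded
    fake_now[0] += 0.005

print(sent, "of 30 events forwarded")
```

Dropping events outright is the bluntest option; an app that needs the latest value would instead remember the most recent event and forward it once the interval has passed.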

Did you solve the issue? I am having the same problem.

Yes.

You should start a new topic for your specific issue (happy to try and help you there). Also read through this post to see what information would be useful to provide, to start :

(@moderators can we close this topic please)

This topic has been marked solved by the community and subsequently closed. For further discussion of related issues, feel free to create a new topic under an appropriate category.