C8 Zigbee Radio Turning Off/On Multiple Times a Day

Are the other two C8s running the same make/model of devices? Any variation means the other two hubs aren't a like-for-like comparison.

I've gone through quite a long process with mine. I don't want to hijack the post, but I will share my experience in case it helps evolve the thinking on this, and perhaps people can offer some suggestions.

I started with a C7. It was my first experience with Hubitat, so I made some mistakes and did things in ways that, with hindsight, were bad decisions. Eventually I ended up with:

  • A large number of Sonoff battery powered devices for motion, door and temp
  • Some Sonoff Minis for controlling devices
  • Innr Smart Plugs
  • MOES Power Monitoring Smart Plugs (now not used)
  • Philips Hue motion/temp/lux
  • Google SDM API connection for Doorbell and Security camera
  • Some Candeo and Zemismart light switches
  • Google Home integration
  • Chromecast integrations through the house
  • Sonos integration (for more reliable announcements)
  • An Aqara water sensor for rain detection
  • Hive Central Heating integration
  • Radiators controlled by underfloor heating wired valves (activated by smart plugs)
  • Samsung TV Remote
  • Denon AVR Integration
  • Mobile Phone App/Device
  • Fibaro Smoke alarm (Z-Wave)

I became unhappy with the reliability and realised I was getting frequent Zigbee reboots. Around the same time I learned that some Zigbee devices can be unfriendly and spam the radio, and that the channel in use could be a problem. I also had a garden office constructed and needed to extend my setup to the far end of the garden. So I bought a new C8, started to build up the devices in the house on it, moved the original C7 to the office, and meshed the devices.

I then started to build back all the devices in the house on the new C8 but soon noticed it was having problems with the Zigbee off/on. I messed about with it for ages and at this point removed the MOES power monitoring plugs. I was getting quite frustrated by then and decided that, to truly get to the bottom of it, I would get another C8, run it alongside, and then more carefully pass devices/functionality back and forth until I could determine the root cause.

The third C8 runs perfectly (as does the C7 that is now in the garden office) but the C8 still resets the zigbee. I am still in the process of working out what is going on but the only devices it currently uses are Sonoff motion sensors and Mini ZBLs and the Sonoff Door Sensors, all using Hubitat drivers.

My wife has been fairly patient but I need to get to the bottom of this and would love to have some more tips (or functionality in Hubitat) to enable a quicker process for diagnosing it.

I'd like to believe that without this Zigbee reset issue I could use just one hub along with extenders (which did work) and not need the other two. But without another reference point it's impossible to diagnose without leaving your home disabled, with large parts of functionality offline for periods of time.

So the MOES power monitoring plugs are not the reason for the C-8 Zigbee on/off issues, since the problem continues even after you removed the plugs from this hub?

Correct - the issue existed before and has continued since they were all removed (both from the network and from power)

This confirms my theory: the only thing in common between the C-8 problem of the Zigbee radio periodically switching off for 7 seconds, the Zigbee devices (no matter which make/model), and the Zigbee drivers (system or custom) is the word 'Zigbee'.

It seems that when the hub is overloaded for a short period of time, for any reason, the first thing that suffers is the Zigbee radio.

I would look at the RM5 rules and the LAN integrations running on the problematic hub.

Similarly not wanting to hijack this thread, but this behaviour's cropped up elsewhere including my own ongoing thread covering off/ons on a C-7.

From that I'm testing whether eliminating unplugged devices and/or custom drivers helps (so far, the former's been discounted), but if @kkossev's theory does prove to be the underlying cause I'm not sure the off/on's can be reliably prevented. On my system (for example) none of the RM5 rules are triggered when the house is quiet, the only current LAN integration is a single Kasa WiFi outlet, so the overload's being caused by...multiple Zigbee devices sending in status reports at once? If so tweaking reporting settings or having fewer devices would at best reduce, rather than eliminate, the chances of a clash.

Thank you for your interest and input

Certainly reducing the number of devices overall has reduced the frequency of the issue - I was getting resets every 30 mins at one point, now they are several times a day with no pattern.
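
For anyone wanting to put numbers on "how often" and "for how long", here is a rough Python sketch that pairs radio offline/online events and works out the outage durations and the gaps between restarts. It assumes you have already pulled the timestamps out of the hub logs yourself; the `(timestamp, state)` tuple format, the `restart_stats` name, and the sample data are made up for illustration and are not a Hubitat log format or API.

```python
from datetime import datetime, timedelta


def restart_stats(events):
    """Summarise radio restarts.

    events: iterable of (timestamp, state) tuples, where state is the
    string "offline" or "online" (a hypothetical format, not Hubitat's).
    Returns (outages, gaps): how long each outage lasted, and the time
    between the start of one outage and the start of the next.
    """
    outages, gaps = [], []
    last_offline = None   # start of the most recent outage
    pending = None        # outage start still waiting for its "online"
    for ts, state in sorted(events):
        if state == "offline":
            if last_offline is not None:
                gaps.append(ts - last_offline)
            last_offline = ts
            pending = ts
        elif state == "online" and pending is not None:
            outages.append(ts - pending)
            pending = None
    return outages, gaps


# Invented sample data: three restarts in one morning.
sample = [
    (datetime(2023, 9, 1, 2, 0, 0), "offline"),
    (datetime(2023, 9, 1, 2, 0, 8), "online"),
    (datetime(2023, 9, 1, 2, 30, 0), "offline"),
    (datetime(2023, 9, 1, 2, 30, 9), "online"),
    (datetime(2023, 9, 1, 9, 15, 0), "offline"),
    (datetime(2023, 9, 1, 9, 15, 8), "online"),
]
outages, gaps = restart_stats(sample)
print("outage lengths:", [str(o) for o in outages])
print("gaps between restarts:", [str(g) for g in gaps])
```

Looking at the spread of the gaps (regular, clustered, or random) is one cheap way to check the "no pattern" impression against the actual log history.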

There are a number of Rules running on that hub - moving them over is on my to do list.

Throughout this issue I've kept a close eye on the load on the hub from both Apps and Devices (Hubitat doesn't make this easy), and it's not possible (that I know of) to see which apps were running at the moment a reset occurs.

If I look at the App stats on the hub in question and filter by % of busy, Hubitat Package Manager is top of the list at 0.573%. Every other process that's running is lower than this; the next is Bathroom Lights in Room Lighting at 0.082%.

I might be reading it wrong, but given the readings above it would indicate the Hub is not under any load. I cannot account for a spontaneous event occurring but I'm not sure what would create that given what I have the Hub doing.

I've removed or transferred any internet based activity relating to devices or apps other than Google Home (which shows as 0.001%) and this is a very heavily used component within the platform which I assumed would work with an acceptable reliability or it would have been flagged by others.

My next step is to move more devices (Sonoff) to the other hub and then if required move over Room Lighting functionality.

I am with you: so far there is no evidence that Zigbee devices are directly involved. However, when this happens none of the available performance metrics or logs show any overload, and there are no I/O metrics to look at.

I have been using HE for over 5 years and only recently (this year) have I had Zigbee radio issues. Not sure if the logging of the radio restarts is new to releases of 2023 but I don’t recall ever having issues before and I have never had devices randomly drop prior to this year.

I own a C8 but have delayed deployment due to issues mentioned in this community and I am still using C7 hubs. I have 4 production hubs broken up by protocol (LAN, Zigbee, Zwave, and coordinator with all devices via hub mesh). My radio hubs have zero apps/rules running other than Hub Mesh to share the devices with my coordinator hub.

I log hub stats to InfluxDB and there are no spikes in memory or CPU at the time of the Zigbee restarts. Bobby looked at my engineering logs and found nothing either, and suggested it is likely device related. I set up Node-RED to listen to the Zigbee logs web socket and log everything to a MariaDB database in the hope of finding something at the time of the radio restarts, and again nothing. Analysis of these logs showed that my 2 Zigbee environmental sensors sent more frequent updates than most other devices, so I removed them and replaced them with different sensors. Zigbee radio restarts still occurred after that.
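
The kind of analysis described above can be sketched in a few lines of Python, for anyone who wants to try the same on their own captured logs. This is only an illustration under assumptions: the `(timestamp, device_name)` record shape, the function names, and the sample rows are hypothetical, and it says nothing about what the actual web socket payload or database schema looks like.

```python
from collections import Counter
from datetime import datetime, timedelta


def chattiest_devices(messages, top=3):
    """Return the `top` devices by message count.

    messages: iterable of (timestamp, device_name) tuples, e.g. rows
    already exported from whatever database the logs were stored in.
    """
    return Counter(name for _, name in messages).most_common(top)


def messages_before(messages, restart_time, window=timedelta(seconds=30)):
    """Return the messages seen in the `window` leading up to a restart."""
    start = restart_time - window
    return [(ts, name) for ts, name in messages if start <= ts < restart_time]


# Invented sample data around one radio restart at 02:00:00.
restart = datetime(2023, 9, 1, 2, 0, 0)
messages = [
    (restart - timedelta(seconds=50), "Env sensor A"),
    (restart - timedelta(seconds=40), "Env sensor A"),
    (restart - timedelta(seconds=20), "Env sensor A"),
    (restart - timedelta(seconds=10), "Motion hall"),
    (restart + timedelta(seconds=5), "Door front"),
]
print(chattiest_devices(messages, top=1))
print([name for _, name in messages_before(messages, restart)])
```

If the same device keeps turning up in the window before several different restarts, that is a candidate worth moving or removing; if nothing stands out, that is consistent with the result reported above.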

I haven’t had as many restarts in recent weeks, since the latest firmware updates and the hub restarts required to deploy them, so that may have something to do with it. My only theory: could Hub Mesh be causing the “load” and the problem? Again, this is the only app running on my Zigbee hub. In Jan of this year @gopher.ny removed UDP as an option for Hub Mesh and it is now TCP only. Could that change be the cause?

I am curious if others experiencing Zigbee radio restarts also have multiple hubs utilizing hub mesh.

As I've said before, I have no problems when there are no powered Zigbee devices in the mesh. This is with only one LAN integration, Envisalink, and the rest Z-Wave.

I blamed the reboots on Centralite plugs. Plus, they occasionally lost the ability to be controlled, where they had to have power interrupted to them to get working again. Nothing else would work.

As another experiment, today I'll plug in a bunch of the latest Sengled plugs around the house with no loads and no rules; let them run and see what happens. Power reporting will be disabled as well. This is with one C-8 hub running latest platform version.

And this time, I won't look at the chart, lol.

A little provisional, but similar here - I've had my C-7 for a bit over 3 years, and it's only this year that I noticed these dropouts happening (in my case it was the occasional lack of response that first caught my attention, which led to checking the logs and finding the radio restarts). Very likely would have noticed them if they'd been happening previously.

FWIW I only have the 1 hub - no mesh.

Although I use Hub Mesh now, my issues began on a single C7 initially

Might be useful to summarize what's been discussed earlier in this now almost six month old thread (it started out as C-8 specific, and what follows is also but might pertain to C-7 as well).

What we know:

  • Logging of "Zigbee radio offline/online" for 8 to 9 seconds corresponds to time required for the EFR32MG21 NCP (C-8's SiLabs network coprocessor) to reboot and reinitialize "ASH"

  • "ASH" refers to SiLabs' proprietary UART (serial) protocol used (per SiLabs) to "reliably carry commands and responses between a host processor and a network co-processor (NCP)"; ultimately it's how the hub's firmware interacts with the stack running on the EFR32MG21, which runs the Zigbee mesh.

  • Per Mike's earlier post in this same thread (see C8 Zigbee Radio Turning Off/On Multiple Times a Day - #114 by mike.maxwell), the NCP reboots happen when the hub detects 'ASH error 6'. Per a post in the Silicon Labs Community, error 0x06 indicates "Reset: Assert": the NCP detected a failed assertion (internal consistency check), which caused an abnormal internal reset of the NCP. The NCP is then in a 'failed state' due to the internal reset.

  • Per SiLabs (see table 6.1 in the previously linked Silicon Labs Community post), the NCP will respond to all subsequent communication from the C-8's host processor with an "ERROR" frame until the host resets it to reinitialize the ASH protocol.
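
To make the reset codes above a little more concrete, here is a small Python lookup of ASH reset/error codes. Only 0x06 ("Reset: Assert") and the existence of a 0x80 code are confirmed by the posts referenced in this thread; the remaining entries are as I recall them from SiLabs' UART gateway protocol documentation and should be checked against the current document before relying on them. The `describe_ash_code` helper is hypothetical and is not part of any SiLabs or Hubitat API.

```python
# ASH reset/error codes. Only 0x06 and 0x80 are mentioned in this thread;
# the other entries are recalled from SiLabs documentation and should be
# treated as assumptions until verified.
ASH_RESET_CODES = {
    0x00: "Reset: Unknown reason",
    0x01: "Reset: External",
    0x02: "Reset: Power-on",
    0x03: "Reset: Watchdog",
    0x06: "Reset: Assert",
    0x09: "Reset: Boot loader",
    0x0B: "Reset: Software",
    0x51: "Error: Exceeded maximum ACK timeout count",
    0x80: "Chip-specific error reset code",
}


def describe_ash_code(code):
    """Return a human-readable description for an ASH reset/error code."""
    return ASH_RESET_CODES.get(code, f"Unknown code 0x{code:02X}")


# The code the hub reportedly sees before it reboots the NCP.
print(describe_ash_code(0x06))
```

The point of the table is simply that "error 6" is one of several distinct reset causes, so knowing which one the hub sees (and whether any chip-specific detail accompanies it) would narrow things down.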

What we don't know:

  • whether this has anything to do with how busy the C-8 host processor is (running LAN integrations, processing rules or pistons or whatever), or even how busy the NCP is processing messages generated by mesh devices. Per that same Silicon Labs Community post, the 'Reset: Assert' error code 6 "can have many reasons, and most probably requires a deep investigation".

@velvetfoot's experiment-- which if I understand correctly involved merely adding/removing redundant routers to his mesh (without changing automations or end devices) and thereby produced/eliminated the logged Zigbee reboots-- seems to support the notion that this phenomenon isn't caused by processing automations on the hub, or even the presence of Zigbee end device traffic.

Rather it would seem to correlate with whatever additional host-NCP traffic (ASH protocol traffic) results from the presence of additional routers in the mesh. Even when the hub isn't issuing any Zigbee messages (from automations which would directly address those added routers), their mere presence in the mesh requires that they constantly exchange link status frames (not only among themselves, but also with the C-8 if they are in its neighbor table). Frankly I don't see how, but maybe the presence of that low level link status traffic might correlate with the frequency of the abnormal internal NCP resets that 'ASH error 6' is complaining about.

Hopefully there is still work underway on Hubitat's end to get to the bottom of this; SiLabs' response to a user facing this error suggests opening a support ticket with them. Evidently (referring to the embedded text in that post) the NCP is capable of returning additional info about numerous reset fault causes (appended with code 0x80) in addition to just 'error 6'.

My C7 got into a state where I could 100% reproduce the Zigbee reboot by triggering a backup manually. What could explain it?

Interesting datapoint; I might have missed it but I don't think I've seen that mentioned elsewhere... and a good question which then leads to another: does the C-8/C-7's host processor interact with the NCP during a database backup? Pretty sure that the C-8 is capable of reading the NCP's current network key (I don't think earlier hubs can do this).

Not sure what other interactions there might be though it seems likely that both host and NCP would maintain tables of MAC addresses and their corresponding short addresses that would need to be synchronized in a backup.

I speculated in another thread as to whether the host processor might be starving the NCP in a way that leads to the stack going into an invalid state, HE rebooting it to recover it. The database backup might be a red herring (it may kick off a GC pass for example). This is of course wild speculation on my part, I am completely unfamiliar with HE's system design and have not had the time to look at SI's reference designs.

[EDIT] A cloud backup on the C7 does "back up the zigbee radio" as per documentation. Would that involve a host-NCP interaction? Incidentally, I do not recall having zigbee reboots prior to becoming a subscriber.

Ok, I just sprinkled 9 Sengled outlets around the house. The clock is ticking, lol.

Yes, you're correct; I misspoke. Backup would have to be able to read the C-7's network key and include it in the backup, or migration to a C-8 wouldn't be possible.

Great summary, @Tony, thanks for all the effort on that to bring the data together.

I have been having Zigbee issues on the C8 from a month or two after it was released (I purchased on the first day it was available). I'm on my third C8.

  • I have been using Hub Mesh the entire time on my C8 and C7, only sharing from C7 to C8, not vice versa. When my Zigbee radio reboots started I was only using Hub Mesh for Z-Wave devices shared from C7 to C8; no Zigbee devices were shared, and nothing on the C8 was shared to the C7.
  • I had a successful migration from C7 to C8 when I received my C8, with no errors or other issues. Aside from commonly reported issues w/Hue motion sensors, which I also encountered, things w/Zigbee were relatively calm for maybe the first month or so w/the C8. However, over time I started having devices dropping off regularly and repeatedly.
  • As others have reported, it has been common for my Zigbee radio reboots to go unnoticed unless I was using an automation w/a Zigbee device at the time of the reboot. There was no consistent pattern of reboots and Zigbee devices falling off my hub. They could occur together or separately.
  • I have only had Zigbee issues on the C8. I've never seen a reboot on the C7 or had any issues w/devices there (caveat: I'm not watching the C7 as closely as the C8), and have never had a device fall off a C7 (either my current C7, which was my "secondary" hub, or my original C7, which was my primary hub that I migrated to the C8).
  • C8 is recently (like in the last couple weeks) down to 96 devices from between 115 and 120. Reduction due to 1) Moving devices that fall off the C8, to the C7, and 2) Removing some "test" devices that I had on my C8 but wasn't using.
  • C7 now has 35 Zigbee devices, and about 2/3 of them are shared to the C8 and used in automations that run on the C8.
  • I am no longer sharing any Z-Wave devices from the C7 to the C8, I've moved them all from the C7 to the C8 as the C8 is faster/more reliable w/Z-wave than the C7 was.
  • I've had three C8 hubs, original and two replacements. First two behaved similarly, third has been better but (at least up until very recently) not perfect (like my C7 used to be w/Zigbee).
  • I have a fair number of repeaters, which I find more interesting after reading your comments about "additional routers on the mesh." I currently have various types on the C8: Three SonOff USB Dongles, nine Iris/Centralite plugs, Four Ikea plugs, Four Sengled plugs, and two Innr plugs.

My overall symptoms have been:

  • Periodic Zigbee reboots with and w/out devices falling off the hub. However it's been quite a while, probably a month or more, since I've had a Zigbee radio reboot.
  • Devices falling off my mesh: interestingly to me, predominantly formerly rock-solid Samsung/Centralite (leak sensors), Iris motion, and Visonic contact devices that I had never experienced a single issue with in prior years of use.
  • Devices not being able to be re-paired to the C8 (but easily pair to the C7). Again, particularly Iris motion and Visonic contact devices. The leak sensors typically would re-pair again. An "initial" pairing of these same devices (e.g., getting an old one out of the box) would work reliably on the C8.

Things are on an upswing since I started moving devices that fall off from my C8 to my C7, and things have been stable for almost two weeks now. I did have one incident where I lost several Aqara devices repeatedly, but that appears to have been caused by a mmWave motion sensor...once I moved it to my C7 I stabilized again.

I had a C5 for 4 or 5 years before I picked up a C8 one month after release. I was excited about the external antennas, and wanted to support Hubitat’s growth.

My C5 was rock solid for months before I migrated to the C8. But if I knew then what I know now I wouldn’t have made the switch. I would have rather just donated the cash for a C8 to Hubitat, because I want them to stick around, and then stayed on my C5.

Because of a busy work schedule I don’t have the time to manually migrate back to the C5. Occasionally the lights or switches in our house don’t respond and WAF has taken a pretty permanent hit. We’ve gotten used to having to manually turn on/off switches when things either drop off the mesh or the radio goes offline.

My logs show zigbee radio offline at least once every few days. I’m still on 2.3.5.152. I’ll update to the latest firmware in the coming days if I have a break in work. I want to be available to troubleshoot if things get worse.

I’ve tried all sorts of remedies since last winter. Changing power levels from 4-20 and waiting 24-48 hours between changes. Shutdowns with power cord being pulled for 1-2 hours to see if the mesh needs to heal. Different channels. With channel 20 and power 12 I seem to get the ‘best’ performance but still not as good as my trusty C5 was.
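
For anyone comparing channels, here is a rough Python sketch of one common reason the choice can matter: where a Zigbee channel sits relative to nearby 2.4 GHz Wi-Fi channels. Wi-Fi is not something measured in this thread, so treat this purely as background. The centre-frequency formulas are the standard ones for 802.15.4 channels 11 to 26 and Wi-Fi channels 1 to 13; the 20 MHz Wi-Fi width and 2 MHz Zigbee width are simplifying assumptions (real Wi-Fi signals spill further, and 40 MHz channels exist), and the function names are just for illustration.

```python
def zigbee_center_mhz(channel):
    """Centre frequency of a 2.4 GHz Zigbee (802.15.4) channel, 11 to 26."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels run from 11 to 26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel):
    """Centre frequency of a 2.4 GHz Wi-Fi channel, 1 to 13."""
    if not 1 <= channel <= 13:
        raise ValueError("This sketch only covers Wi-Fi channels 1 to 13")
    return 2407 + 5 * channel


def overlaps(zigbee_channel, wifi_channel,
             wifi_width_mhz=20.0, zigbee_width_mhz=2.0):
    """True if the two channels overlap under the assumed nominal widths."""
    gap = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return gap < (wifi_width_mhz + zigbee_width_mhz) / 2


# Zigbee channel 20 against the usual non-overlapping Wi-Fi channels.
for wifi in (1, 6, 11):
    print(f"Zigbee 20 vs Wi-Fi {wifi}: overlap = {overlaps(20, wifi)}")
```

Under those assumptions Zigbee channel 20 (2450 MHz) falls in the gap between Wi-Fi channels 6 and 11, which may or may not be related to the "best" result reported above; that depends on the Wi-Fi environment, which this sketch knows nothing about.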

I hope that eventually a C9 comes out that resolves all the zigbee issues that I can migrate to someday. Still love Hubitat but the C8 helped me see how spoiled I was with the C5.
