[2.3.5.152-2.3.6.145] [C7] Help! Hub seems to be failing with repeat zigbee radio online/offline

That would most certainly work.

It would likely take a few weeks at most. But have you checked the logs and event logs to see what is going on around the times when Zigbee goes out?

It takes a few weeks to confirm whether the error is gone, and going at random, I would need to repeat the process a few dozen times with core devices (thermostats, water heater, etc.). I can’t go a few weeks without those things, let alone repeat that a few dozen times.

I did check the logs. There is nothing informative that would identify a device, only non-specific "queue full" messages, with nothing particular going on before the issue that I can see.

If I can automate a shutdown, I could do that. If the hub is shut down programmatically, does it automatically come back on after a power cycle? I could flip a switch exposed to HomeKit, then trigger a shutdown, and in HomeKit program a power cycle a minute or so later (does a shutdown take more than 1 minute?).
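
To make the idea concrete, here is a minimal sketch of that sequence: ask for a clean shutdown, wait, cut power, restore power. Everything in it is a placeholder and an assumption: `request_shutdown` and `set_plug` are hypothetical callables standing in for whatever actually triggers the hub shutdown and toggles the smart plug, and the 60-second and 10-second delays are guesses, since how long a shutdown takes is exactly the open question above.

```python
import time
from typing import Callable, List


def shutdown_then_power_cycle(
    request_shutdown: Callable[[], None],   # hypothetical: triggers a clean hub shutdown
    set_plug: Callable[[bool], None],       # hypothetical: True = plug on, False = plug off
    shutdown_wait_s: float = 60.0,          # assumed time for the hub to finish shutting down
    off_time_s: float = 10.0,               # assumed time to leave the power off
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run the shutdown / power-cycle sequence and return the steps taken, in order."""
    steps: List[str] = []

    request_shutdown()
    steps.append("shutdown requested")

    # Give the hub time to shut down cleanly before cutting power.
    sleep(shutdown_wait_s)

    set_plug(False)
    steps.append("power off")

    sleep(off_time_s)

    set_plug(True)
    steps.append("power on")
    return steps
```

Passing `sleep` in as a parameter is only there so the sequence can be dry-run without actually waiting.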

You could probably pinpoint it with an XBee stick then: simply scan until it crashes. Also, what was your memory at the time of the crash? That could be the cause. Low memory or LAN integrations can overload the hub, causing the Zigbee radio to shut down to save the hub.

This was for the latest crash. I only started collecting hub metrics recently to try to diagnose this. If something caused high memory usage, it was very sudden; there was no progressive increase.

Oh, I do have the previous crash too.

Acknowledged. Looks like the only devices I am using which are not on that list are the Aqara switches and sensors, the ZemiSmart (actually Hiladuo) blinds and a couple of (fairly expensive) Stelpro heaters that use a Ki thermostat. If there is a way for you to let me know if any of these (beyond the Aqara) are known to be an issue, I would be grateful.

What about drivers? Is it possible for a community driver to cause problems with a compatible device without any signs of it visible in the hub logs/device stats or engineering logs?

I have automated the graceful shutdown and power cycling of my C-8 with my spare C-7. If Z-Wave crashes or a Zigbee outage lasts longer than, I'm not sure, 20 seconds, it does its thing, sending notifications and the rest of it. A Zen16 is in the mix as well. It hasn't fired off yet.
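
Roughly, the "outage lasts longer than about 20 seconds" part works like the sketch below. This is only an illustration of the timing logic, not the actual rule: the `OutageWatchdog` class, its method names, and the 20-second default are made up for the example, and the caller is assumed to feed it the radio state and a timestamp in seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutageWatchdog:
    """Fires once when the radio has been offline for longer than threshold_s."""
    threshold_s: float = 20.0               # assumed hold time before acting
    offline_since: Optional[float] = None   # timestamp of the first offline report
    fired: bool = False                     # True once recovery was triggered for this outage

    def update(self, online: bool, now: float) -> bool:
        """Feed the current radio state; return True when recovery should start."""
        if online:
            # Radio is back: forget the outage so short blips never trigger anything.
            self.offline_since = None
            self.fired = False
            return False
        if self.offline_since is None:
            self.offline_since = now
        if not self.fired and now - self.offline_since > self.threshold_s:
            self.fired = True
            return True
        return False
```

The point of the hold time is that a radio that drops and comes back within a few seconds never triggers the shutdown and power cycle.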

I just looked through RM a bit, and I can’t find a way to initiate a clean hub shutdown from a rule. Is that possible?

Yes, if we had any suspicion after reviewing your logs that you might be dealing with a hardware malfunction, this topic wouldn't exist. But in the absence of any clues in logs, the next best thing is to eliminate any devices that could possibly cause the problems. If that doesn't resolve the problem, then we know it is the hub.

That would be a bad idea. Having Zigbee going offline is indicative of a critical issue with the radio. It could be a hardware malfunction or it could be a device. It shouldn't take 2 years to diagnose. If you removed the non tested devices, and the issue persists, then it's likely the hub, and you'd need a replacement.

Anything is possible, but driver issues usually appear in the Logs, so you could identify those problems more easily. The issue is with devices that communicate with the radio while their traffic is not captured by the driver. Neither the Logs nor the engineering log would show that traffic.

I understand where you are coming from as a company: you can’t vet community stuff. But as an end user, Hubitat gets at least 50% of its value from community apps and drivers. Removing all “non-tested” devices means ripping out the heart of my smart home. It is all of my thermostats, all of the major power switches, and about half of the apps for LAN or cloud-based integrations, all of which had been online and working fine until the C8 firmware releases started being pushed.

Even if I do do that and it “fixes” the issue, it won’t tell me which one of the devices is “misbehaving”. It clears you as a company, but it doesn’t help me as a user because I am left without a functioning smart home, in a situation which is much worse than shutting down and power cycling the hub every few weeks.

We are not really talking about community here. We are talking about devices that couldn't make it onto the compatible list because we felt other users would have the same poor experience. The community drivers and integrations are what make Hubitat a great platform. It is the manufacturers that don't follow the standard.

As for LAN integrations, those rarely affect Zigbee, but when they do, it shows in the Logs, so users can decide whether they value one integration more than a functional radio.

That is useful. If community drivers and apps are not the issue, that shortens the list a lot. For example, I have quite a few devices that are on the list of compatible devices (Sinope) but for which I use community drivers.

Can you publish a list of such problematic devices? You obviously can’t test everything, so what is not on the compatible list is either “untested” or “problematic”. A list of “problematic” devices would at least give a starting point. Or, if you can’t publish a list of problematic devices for legal reasons, maybe you can publish a list of tested devices and one of compatible devices (implicitly, those tested but not compatible are problematic).

Could an Aqara temperature sensor that dropped from the network, and that I removed from the hub, still be trashing the hub?
I have one other Aqara device, a motion sensor, which works fine, but is not on the list.

Besides that, I only have Stelpro thermostats, which I think you have not tested (so they fall in the “untested” category, not necessarily “problematic”), but which have been working fine for 6 months. I guess I could disconnect them temporarily before the winter.

Reading between the lines, unless it's a hardware malfunction, it sounds like even HE doesn't have visibility as to what or which device is causing the zigbee stack to crash.

Could a compatible but defective device also be at play?

Thank you for all the helpful clarifications !

And that is why we don't discourage users from trying devices. Aqara would be at the top of the list, yet so many enjoy their products with Hubitat Elevation and have no problems. Building a mesh is so heavily influenced by environmental factors that what works for me may not work for you. It is really up to our users how much time and effort they want to put into maintenance. Sticking to reputable manufacturers saves you time and effort. Venturing outside the compatible list increases one's responsibility to make sure the device doesn't create havoc in the existing system.

Sure, anything that has a chip in it can go haywire. The same process of elimination applies to discovering the root cause. Now, being on the compatible list means that we have at least one device in our testing environment, which could help us replicate the problem. Also, our engineers are known to ask for the particular misbehaving device to be shipped to them, so they can learn what went wrong in order to prevent the same problem in the future.
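
One way to keep a process of elimination from taking dozens of rounds is to remove suspect devices in halves rather than one at a time. The sketch below is only an illustration under strong assumptions: exactly one device is the culprit, and each round (which in practice means waiting a few weeks) gives a reliable yes/no answer. The names `find_culprit`, `rounds_needed`, and `crashes_with` are hypothetical.

```python
from typing import Callable, List, Sequence


def rounds_needed(n_suspects: int) -> int:
    """Worst-case number of test rounds when halving the suspect list each time."""
    return (n_suspects - 1).bit_length() if n_suspects > 1 else 0


def find_culprit(
    suspects: Sequence[str],
    crashes_with: Callable[[List[str]], bool],  # hypothetical: True if the radio still crashes
                                                # with only these suspects left paired
) -> str:
    """Narrow the suspect list down to one device, assuming a single culprit."""
    candidates = list(suspects)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if crashes_with(half):
            candidates = half                    # culprit is in the half that was kept
        else:
            candidates = candidates[len(half):]  # culprit is in the half that was removed
    return candidates[0]
```

With 24 suspect devices, for instance, `rounds_needed(24)` is 5 rounds instead of up to 24. The catch is that each round still means living without half of the remaining suspects for the whole test period.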

I want to thank everyone who took the time to chip in to help, especially @bobbyd. My next steps are clear and I will report back in a couple of weeks.

Based on the information shared in this topic, my current understanding is:

  • Radio reboots are caused by either a hub hardware malfunction or a Zigbee device malfunction.
  • In case of a radio reboot issue, devices not on the compatibility list should be removed to confirm they are not causing the issue.
  • Apps are unlikely to be a factor, but when they are, there should be traces of an issue in logs, stats, etc. (drivers are just a special kind of app).

Some speculation/educated guesses:

  • reboots are due to the hub failing to get timely responses from the zigbee stack (crashed, caught in a loop, whatever) and rebooting it is the hub's attempt to restore communication with the zigbee stack.
  • the hub can't expose more details about what is causing the zigbee stack to fail (e.g. those details may not even be exposed by the zigbee stack to the hub).

I want to answer this question:

It dawned on me when the Iris V1 devices had to be dropped from the C8's compatible device list. As I have stated elsewhere in this forum, the HE hub is really good value. It is in fact dirt cheap - I've paid almost as much money for a Homepod Mini or even a thermostat. I have had this C7 successfully control devices not on the compatibility list for over two years. Those devices together would cost 8-10x the cost of the hub to replace (to illustrate, among them, two convectors with built-in thermostats - a stupid mistake in retrospect - that cost $400 each).

I can use the extra C7 to rule out the hub as a cause of my issue. Regardless, I can live with a C7 running 2.3.4 or even an earlier release forever to control just these few devices, if that's what it takes to save me the cost and hassle of replacing them before their end of life. Hopefully I can continue to mesh that C7 with a newer hub or a hub running newer releases (or use MakerAPI if hub mesh backward compatibility somehow breaks) to keep automations alive over the long run.
