Zigbee Radio - Intermittent Brief "Off"s

I'm being bold here - but when I struggled (and it was a mighty struggle, with fingers pointing everywhere) I ultimately started tearing down - disabling devices and apps until I reached stability. In my case (I don't recall exactly anymore) it ended up being an app that caused it. There were no errors in the logs or anything; it was disabling some of the apps and waiting 2-3 days to see if it worked that ultimately got me past this horrid time. (I was a mess for nearly 2 months when the C-8 came out and I was certain HE was at fault. They weren't, or if they were, it was covered by some/all of the fixes they were putting in at that time - timing issues, I think? Some other stuff? I'd have to go back and read the threads.)

I can't recall proof of even a single case where a Zigbee or Z-Wave driver was the reason for Zigbee radio-off issues. All these drivers are event-driven: they process the incoming messages quickly and exit.

Likely it was something LAN-based. At least that's what I've found when that happens.

Is that LAN activity from the hub itself, or general LAN activity impacting on the hub?

FWIW I've got very little LAN-based on the hub; the Evohome app and devices mentioned above, and a few Kasa outlets using the built-in app and drivers. Both have been in place since last autumn.

"Just in case", early this afternoon I tried completely uninstalling the Evohome setup (we can get by without central heating management in August :grinning:).

After that, and later in the day (it's now late evening local time) I've checked the logs, and unhelpfully there've been no Zigbee offages at all since midnight yesterday so it's hard to be sure whether removing the Evohome's made a difference.

I'll continue to keep an eye on things.

LAN integrations. For instance, in my case it was caused by my IoTaWatt being off: the integration kept trying to reach it, with no errors in the logs. Once my IoTaWatt came back online all was good. The Zigbee radio is the first thing to shut down on hub overload.
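
One way to spot this kind of silent failure is to check, from another machine on the LAN, whether each integrated device is actually answering. A minimal sketch, assuming the device accepts plain TCP connections; the function names are made up for illustration and none of this is how the hub or any integration does it internally:

```python
import socket


def is_reachable(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and unroutable hosts.
        return False


def unreachable(devices, timeout_s=2.0):
    """devices maps a label to (host, port); returns the labels that did not answer."""
    return [
        name
        for name, (host, port) in devices.items()
        if not is_reachable(host, port, timeout_s)
    ]
```

It could be called as `unreachable({"energy monitor": ("192.0.2.10", 80)})`, where the address and port are placeholders to swap for whatever your own devices use. Anything that comes back in the list is a candidate for an integration that is retrying in the background.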

(Final?) Update on this - after a few weeks of experimentation (including a soft reset when we were away from the house for a few days and DB restoration on return, to try to verify it wasn't a hardware issue) the cause of the Zigbee off/ons has provisionally been pegged as this app/device driver for a Google Nest thermostat. Something I'd forgotten was in place[1], although FWIW these did seem to be working as they should in themselves.

Not 100.0% sure, as it wasn't practical to wait a few days to see whether new off/on log entries appeared after every individual change, but after uninstalling a set of "definitely not in use" apps and the Google one there've been no recurrences for 3 days.

[1] Realised it was a thing after seeing "Received cloud request for App xxx that does not exist" after doing the soft reset, then checking which app xxx was after the restore.

Just out of curiosity - I certainly won't try to claim that the Nest integration is fault-free - but a few questions and tips that may help if you still want/need to connect your thermostat and/or cameras ..

Were you running the latest code? There was a nasty bug a little while back which could bring the hub to a crawl, because of a rogue log statement which tried to log an mp4 clip :upside_down_face:

If on the latest version, I'd make the following suggestions if you want to try re-adding it:

  • Leave debug logs off for the App (they can get verbose, and maybe there's another odd one hiding)
  • Turn off device settings for image captures - perhaps the download of this data was slowing things just enough?
  • Likewise turn off Google Drive in App settings - the re-upload to Google again may have incurred some slowdown?

Then you could re-enable these portions 1-by-1, if desired, to see if it re-introduces the problem.

Interesting. Whenever I do a soft reset, I immediately do a restore. I've never checked the logs.

I think so - I had 1.0.8, which according to the importURL is the latest.

As it's turned out Nest handling is no longer needed for our Hubitat setup - it was very useful (and thanks for making it available! :1st_place_medal:) in our previous home, but in our current one we've switched to Evohome. It was continuing to monitor a second Nest used by a family member (so didn't have any null/stale device entries), but that was more an oversight than a requirement and once I'd realised it was still active the most practical option was to remove it completely rather than attempt more detailed troubleshooting.

Just when you thought it was safe to go back in the Hub Events list...

This is more for people's info than anything else, and to apologise to @dkilgore90 for unjustly suspecting his driver.

After several days of no Zigbee radio issues I put the Evohome app/driver combo back on...and the off/on's returned. So I removed the Evohome items...and the off/on's continued. :confused:

So, there's a lot of speculation in this, but possibly my hub was teetering on the edge of being overloaded and near enough anything extra added will start it misbehaving? Still no idea of why it might have a high load - near enough all my apps have triggers from specific devices (mainly motion sensors, some contact ones) rather than running periodically in the background, and some of the off/on's happen in the early hours of the morning (3-4am) when there'll be next to no triggers anyway, but the logs are what they are. :man_shrugging:

I'll continue to scratch my head over this, run what tests occur to me, and keep people updated on any progress.

Well, an update, but not a very useful one - after trying/checking various things:

  • The off/on's happening don't seem to be correlated with general hub activity levels. There can be stretches of several days of regular house activity when none occur, and also half a dozen off/on's over an hour at a time when there's nothing to speak of going on (early hours of the morning, house empty in the day, etc...).

  • Looking at logs there's no correlation between off/on's and any other event occurring (for a device or device type) that I can see - the off/on's can happen well clear of any other event being listed.

  • Just to reiterate something covered above, the only internet-related app I have running at the moment is the stock Kasa one, controlling 3 outlets (only 1 of which is plugged in right now), and power monitoring is turned off for everything that supports it.
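
The "is anything else happening around the off/ons?" check in the bullets above can be roughed out in a script. This is only a sketch: it assumes the hub events and device logs have been copied out by hand into (timestamp, description) pairs, and the sample entries, dates and 60-second window are all made up for illustration rather than taken from a real Hubitat export.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def parse(ts):
    """Parse a timestamp string in the assumed FMT layout."""
    return datetime.strptime(ts, FMT)


def events_near(off_times, other_events, window_s=60):
    """Map each radio-off time to the other events within window_s seconds of it."""
    window = timedelta(seconds=window_s)
    return {
        off: [desc for ts, desc in other_events if abs(ts - off) <= window]
        for off in off_times
    }


# Hypothetical sample data, not real log entries.
off_times = [parse("2024-01-01 03:14:05"), parse("2024-01-01 15:30:00")]
other_events = [
    (parse("2024-01-01 03:13:40"), "Hall motion active"),
    (parse("2024-01-01 09:00:00"), "Front door opened"),
]

nearby = events_near(off_times, other_events)
for off, descs in nearby.items():
    print(off, "->", descs if descs else "nothing nearby")
```

If most off/ons come back with nothing nearby, as described above, that is at least consistent with the cause being something that never shows up in the logs.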

Off's can happen while normal activity's taking place (determined by checking logs after the event), and the occasional failure of motion sensors, voice commands, etc... to control lights is impacting on the WAF.

After a bit of searching I've seen people with C-7s and C-8s reporting this in other threads, and while there's user speculation on possible (probable?) causes AFAICS there's no recognised solution or official position on it.

With the WAF in particular in mind, can I ask @bobbyD where things stand with this behaviour at Hubitat? Are there any recommended fixes, or failing that any further tests I could run to try to identify the cause? Very happy to supply log details/DB backups/anything else if that would help.

Did you ever remove the Zemismart 3-way wall switch?

What polling interval are you using for each device? Are the plugs that are not plugged in still referenced by the integration?

The problem is with what you don't see. Zigbee going offline is likely caused by the traffic between one or more devices (likely simultaneous events) and the radio. This traffic is not captured by the logs. Your best approach to resolve this problem is to identify the problematic devices. Generally, you are safer if you use both built-in drivers and devices that are on the Compatible List. For custom drivers or devices not tested by our engineers, you can use the process of elimination to narrow down the problematic devices, or a Zigbee sniffer. I personally don't go the sniffing route, and I don't recommend it, as that is too time-consuming. Instead, I carefully weed out potential bad players from the mesh until I am able to stabilize it.
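
The process of elimination doesn't have to go one device at a time. If exactly one device (or app) is to blame, disabling half of the remaining suspects each round cuts down the number of multi-day waits. A minimal sketch under that single-culprit assumption; `is_stable_without` is a hypothetical stand-in for "disable these, wait a few days, check Hub Events for off/ons", and the device names are made up:

```python
def find_culprit(suspects, is_stable_without):
    """Narrow down a single culprit by halving the suspect list each round.

    is_stable_without(disabled) should return True if the mesh stays stable
    while the given suspects are disabled and everything else is left running.
    Returns (culprit, number_of_rounds).
    """
    candidates = list(suspects)
    rounds = 0
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        rounds += 1
        if is_stable_without(half):
            # Problem went away, so the culprit is among the disabled half.
            candidates = half
        else:
            # Problem persists, so the culprit is still running.
            candidates = candidates[len(half):]
    return candidates[0], rounds


# Simulated run: a dozen suspects, with "device_7" secretly the bad one.
suspects = [f"device_{i}" for i in range(12)]
culprit, rounds = find_culprit(suspects, lambda disabled: "device_7" in disabled)
print(culprit, rounds)
```

With a dozen suspects this takes at most four rounds instead of up to twelve. If more than one device is involved, the single-culprit assumption breaks and the result can mislead.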

30 minutes (the default) for all 3 devices - if I've understood the question correctly, all 3 are still registered/listed in the Kasa app. The unplugged outlets are referenced in a few rules, but those rules are disabled (they're for Christmasy things).

I have to admit I didn't - from other responses at the time I got the impression it was unlikely to be a driver/single device issue, and that plus the inconvenience of removing automation for all lights in the kitchen (see also WAF) meant I shifted focus to other areas.

My better half might be away for a few days in the near future, giving me more latitude than usual to mess with systems - will give the removal a go if that happens.

Thanks for the advice - I've got a dozen or so Sonoff devices (mainly buttons and motion sensors) I'm currently using custom drivers for, so in the first instance will try switching all of them to built-in ones.

For others only custom drivers are available - if this is an answerable question, for test purposes would setting those to device type "Device" serve to take them off the mesh and prevent them potentially messing up the radio? Or would they need to be Removed from the Devices list? (Or would something more extreme be needed? Put the battery-powered ones in a metal box, then bury the box?)

If you actually want to de-power, is there a tab you can pull on the 3-way? It's called an air gap. That way you don't have to de-power the circuit.

Maybe you could "take out" the 3-way, without actually taking it out, and temporarily change rules as a workaround?

From memory, and after revisiting the product page on Amazon (see below), I don't think there is an air gap. To be sure I'll check on the 3-way itself if I do start poking about, but I'm pretty sure that the wiring terminals are the only things available to work with besides the front buttons themselves.

If the worst comes to the worst I'll just disconnect the live wire, disable rules that might try to control the 3-way, and see what difference that makes.

[Image: product photo of the 3-way switch from Amazon]

I wonder if the driver will continue to try to poll those every 30 min (admittedly not super frequent), leading to (HTTP?) call failures. Been chasing those down after @rlithgow1 hinted at it. Might be grasping at straws in this case tho :man_shrugging:t2:
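
For a rough sense of scale, the numbers can be worked out. This sketch uses the figures mentioned in the thread (2 unplugged outlets, 30-minute polling) plus an assumed 10-second timeout per failed call. The timeout is a guess, not something read from the Kasa driver, and whether the driver polls unplugged outlets at all is still the open question.

```python
def failed_poll_load(unreachable_devices, poll_interval_min, timeout_s):
    """Return (failed polls per hour, seconds per hour spent waiting on them)."""
    polls_per_hour = unreachable_devices * (60 / poll_interval_min)
    wait_s_per_hour = polls_per_hour * timeout_s
    return polls_per_hour, wait_s_per_hour


# 2 unplugged outlets, polled every 30 minutes, assumed 10 s timeout.
polls, wait_s = failed_poll_load(2, 30, 10)
print(polls, wait_s)
```

Under those assumptions that is 4 failed calls and 40 seconds of waiting per hour, which on its own looks small.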
