At 9:30a and 5p I run a 'diagnostic' on my environment. I have a group called 'Rare devices', which contains items that often don't get touched for days on end. I have a rule that simply turns them all on, then turns them all off, and then Device Watchdog is triggered for a report. This is all automated.
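As a rough illustration of what that rule does (this is not the actual rule on the hub, just a sketch of the sequence), the logic looks something like the Python below. The function name, the callbacks, and the device names are all hypothetical stand-ins, not anything from Hubitat.

```python
def run_diagnostic(devices, send_command, request_report):
    """Turn every device on, then every device off, then ask for a report.

    devices: names of the rarely used devices (hypothetical examples below).
    send_command: callback taking (device, command); stands in for the hub.
    request_report: callback standing in for triggering the watchdog report.
    """
    for device in devices:
        send_command(device, "on")
    for device in devices:
        send_command(device, "off")
    request_report()


# Record what would be sent instead of talking to a real hub.
log = []
run_diagnostic(
    ["porch light", "attic outlet"],
    send_command=lambda device, command: log.append((device, command)),
    request_report=lambda: log.append(("watchdog", "report")),
)
print(log)
```

The point is only that every device gets two commands in quick succession, so the batch grows with the size of the group.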
I also have a nearly weekly manual process: a manual backup, then a shutdown, then I pull the power from the hub for 20 mins. Subjectively, I notice faster device responses to Alexa commands, and faster group responses as well. I do this every 4-5 days. This is a legacy process - there was a time when the community felt it was necessary, I got in the habit, and I've kept doing it. I have been... told it was unnecessary - in no uncertain terms. I don't really care. After all, it is my hub, regardless of your opinion on my personal process.
Having detailed all of that for you - now the actual issue.
In the last 2-3 weeks I haven't done this process as usual. During these last 3 weeks I've experienced 3 'Zigbee offline' situations. The first two, I came home and most devices were turned on. The gf surmised 'we must have had a power failure'. I shrugged, used my Android dashboard and set everything back to normal and went my merry way.
Tonight I caught the failure. I have a notification sent to all my Alexas if my Zigbee goes offline.
First, I heard the notification that it was 5pm. Then I saw lights going on all over - I had the hubitat Android Dashboard open fullscreen so it was clear what was happening.
Normally, having sat through this process 100 times, the lights all go on, then 3-5 seconds later all the lights go off.
Today, the lights went on, but I could see on the dashboard that not all the expected lights/outlets completed. Then about 10 seconds later all my Echos around the house started saying 'Zigbee network offline'. The dashboard, which gets Zigbee status from Hub Info, said Zigbee was online, but inside the Hubitat it clearly was not (obviously a secondary, unrelated issue).
Indeed - no device was communicating, the Hub interface showed the HE zigbee network offline. But the logs don't show WHY it went offline.
I rebooted the hub. No command issued to the hub to 'enable' the Zigbee would work. There were NO entries in the logs and basically the hub went stupid.
The reboot fixed it.
I run an XCTU device, and it showed the Zigbee network was NOT down. I could open the device and it showed the C device was gone, but all the other devices were still communicating. It was only the hub that went south.
Based on what I saw, I believe there is some problem under load, or extended load. It takes nearly 30 seconds to turn off all my devices and turn them all back on - I use groups, and ALL groups have delays built in. I think the timers that HE uses while waiting for a long command to execute cause the HE to decide the Zigbee went down when it didn't, and the hub shuts it off.
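To make that hypothesis concrete, here is a minimal Python sketch of the kind of failure being described. It is purely illustrative: the HE source is not available here, so the function name, the timeout values, and the per-command delay are all made-up assumptions. Only the roughly 30 second batch length comes from the observation above.

```python
def radio_declared_offline(command_delays_s, watchdog_timeout_s):
    """Return True if a naive watchdog would give up on the radio.

    command_delays_s: seconds each queued command takes to finish, in order.
    watchdog_timeout_s: hypothetical limit on how long the batch may run.
    """
    elapsed = 0.0
    for delay in command_delays_s:
        elapsed += delay
        if elapsed > watchdog_timeout_s:
            # The radio is still working through the queue, but the
            # watchdog only sees that the batch has not finished in time.
            return True
    return False


# 60 commands at 0.5 s each make a 30 s batch (assumed numbers).
batch = [0.5] * 60
print(radio_declared_offline(batch, watchdog_timeout_s=20))  # True
print(radio_declared_offline(batch, watchdog_timeout_s=45))  # False
```

If something like this is happening, the Zigbee mesh itself would stay healthy while the hub marks it offline, which would match what XCTU showed.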
I recognize this could be any number of problems, but given that there are numerous similar posts, I hope that continued reporting may trigger someone with the source to take a look - maybe my post will help someone find a weakness overlooked in the past.