Bizarre Hubitat C4 crash seems to cause other devices to die

Here's a bizarre problem I've been fighting with over the past three weeks or so.

One of my Hubitat C4's keeps locking up (unresponsive to ping; it has a DHCP reservation). I already carried out a soft-reset and restore, and it lasts another 2 or 3 days then does it again. Nothing unusual in the logs, temperatures appear normal, memory use normal. Power LED still on and network LEDs still showing signs of activity. This is a ZigBee-only hub, and I know ZigBee is "dead" at this point because I have some sockets that start flashing a panic orange LED when they lose ZigBee.

I need to get round to migrating this C4 over to a C5 or C7 .... I acknowledge that but have been putting it off for various reasons (most mentioned elsewhere in the forum). The C4 crashing isn't the part I'm flummoxed by.

So here's the bizarre part .... after the C4 goes offline (I use Nagios monitoring, it's pretty fast, usually pings me within 30 seconds or so of a host or service problem) various other unrelated devices "die".

Without fail, this happens every time within two or three minutes of the C4 going down.

6 x Foscam cameras - these are all hard-wired, on various different UPS, spread over 3 different network switches. It's not a network problem. These all drop simultaneously and I notice it immediately as they're on my desktop PC in a live view matrix. All 6 disappear at the same time, then shortly after the Nagios alerts come in. When looking physically at the cameras the power LEDs are "off". Some are PTZ doing patrols and they stop moving. There is no life. Simply pulling the power barrel out and reinserting it, forcing a power cycle, brings them back online.

Hubitat C7 - has about a dozen Z-Wave devices on it, no ZigBee. Negligible memory use and normal temperatures. Nagios alerts me it's offline and it is. Unresponsive to ping. Static IP, still showing power LED and still showing network activity. On an entirely different network switch and UPS to the C4 that goes down first. Have to power cycle to bring it back.

HD HomeRun - a DVB-T streaming device. Nagios alerts that it's down. Again on a different network switch to the other equipment. Power cycle brings it back.

As I said, this is repeatable and has been doing it for a month or so.

I've no idea how this can be happening, for example as far as I'm aware the Foscam cameras don't have a "shutdown" command that could be used to turn them off.

I also have another C4 (about 70 ZigBee devices) and that one remains OK throughout.

So far it's just the devices mentioned above that go down.

Appreciate any thoughts on why it might be happening!

My best guess is the C4 crashing must be sending some spurious "cr4p" over the network that's causing certain other devices to crash. But only some. I have over 150 IP connected devices mixed over WiFi and Ethernet.

As an experiment, can you boot the hub and disconnect it from the network? It looks like the hub has only Zigbee devices, and those can function without the network. You can optionally use the /hub/advanced/disableCloudController endpoint to stop it from trying to ping the cloud every now and then. That is reversible with /hub/advanced/enableCloudController, by the way.
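If it helps, here is a minimal Python sketch for flipping that setting from a script. The two endpoint paths are the ones mentioned above; the hub address is a made-up placeholder, and it assumes the hub answers a plain HTTP GET on those paths without a login. Treat it as untested against a real C4.

```python
from urllib.request import urlopen

# Hypothetical address: replace with your own hub's IP or hostname.
HUB_ADDRESS = "192.168.1.50"

# Endpoint paths as quoted in the post above.
ENDPOINTS = {
    False: "/hub/advanced/disableCloudController",
    True: "/hub/advanced/enableCloudController",
}


def cloud_controller_url(hub_address, enable):
    """Build the URL that turns the cloud controller on or off."""
    return f"http://{hub_address}{ENDPOINTS[enable]}"


def set_cloud_controller(hub_address, enable, timeout=5.0):
    """Send the request and return the HTTP status code.

    Assumption: a simple unauthenticated GET is enough.
    """
    with urlopen(cloud_controller_url(hub_address, enable), timeout=timeout) as resp:
        return resp.status
```

Calling `set_cloud_controller(HUB_ADDRESS, False)` before pulling the network cable, and `set_cloud_controller(HUB_ADDRESS, True)` afterwards, would cover both halves of the experiment.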

My best guess is that one of the devices on the network broadcasts a malformed packet that causes some IOT devices to crash, and hub going down is an early symptom.

My C4 is rock solid - hasn't gone down at all to my knowledge, but it has a lighter workload than yours

Not really, all my hubs are just used for radio gateways as I do all the logic in Node Red using Maker API. So if there's no network there's no automation.

It could be something else, although everything else appears to be working fine. It just seems to me that the pattern is the C4 goes first, then the other devices (cameras, Hubitat C7, HD HomeRun) several minutes later. Always that order and always only those devices. The other Hubitat C4 is never impacted.
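One way to check that ordering rather than rely on memory is to take the "down" timestamps from the monitoring alerts and work out each device's delay relative to the C4. A minimal Python sketch follows; the host names and times in the usage note are invented for illustration, not taken from real alerts.

```python
from datetime import datetime


def delays_after_first(down_times, first_host):
    """Return seconds between first_host going down and every other host.

    down_times maps host name -> datetime of its "down" alert.
    A negative value means that host actually went down before first_host.
    """
    start = down_times[first_host]
    return {
        host: (when - start).total_seconds()
        for host, when in down_times.items()
        if host != first_host
    }
```

For example, with made-up alerts of 03:10:00 for the C4 and 03:12:30 for a camera, `delays_after_first({"c4": datetime(2024, 1, 1, 3, 10), "cam1": datetime(2024, 1, 1, 3, 12, 30)}, "c4")` gives `{"cam1": 150.0}`. If any device ever shows a negative delay, the C4 is not always the first to go.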

Since there were some other issues (around the soft-reset) with that particular C4 I suspected it's the cause rather than another victim.

I guess I really do need to migrate away from the C4's. If I never see the problem again after that, then it's a reasonable conclusion that this C4 was the cause.

Maybe something in the interaction between HE and Node-RED is causing the issue?

I mean maybe Node-RED starts triggering things when the hub gets wonky.


My best guess would also be that the C4 is sending spurious packets when it fails. Low memory and limited cpu IoT type devices often aren't very resilient to odd network traffic. I'm a little surprised the C7 dies, as far as I can tell, Hubitat is Linux based and has decent resources. It may not be worth the effort, but would it be possible to isolate the C4 behind a firewall and only allow expected traffic through? You also could run a packet capture, but who knows how long it would be until the next failure.
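On the "who knows how long until the next failure" problem: the usual trick is a rolling capture that only keeps the last few minutes of traffic, so it can run for days and you only save it when the alert fires. Below is a minimal Python sketch of just the buffering part. Where the frames come from (a mirrored switch port, a capture tool, etc.) is left open, and the class name is made up for illustration.

```python
from collections import deque


class RollingCapture:
    """Keep only the frames seen in the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._frames = deque()  # (timestamp, raw_bytes), oldest first

    def add(self, timestamp, frame):
        """Store a frame and drop anything older than the window."""
        self._frames.append((timestamp, frame))
        cutoff = timestamp - self.window_seconds
        while self._frames and self._frames[0][0] < cutoff:
            self._frames.popleft()

    def snapshot(self):
        """Return a copy of what is currently buffered, e.g. to save on an alert."""
        return list(self._frames)
```

Filtering to frames from the C4's MAC address before calling `add()` would keep the buffer small, and calling `snapshot()` when the monitoring alert arrives would preserve whatever the hub sent just before it went quiet.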

Yeah, there's definitely only so much troubleshooting that's worthwhile at this stage. I used to have some Zywall inline firewall / IDS devices that were perfect for this sort of troubleshooting ..... sold the last one a while back as I've been slowly de-hoarding and de-cluttering!

As mentioned in other threads I've really been holding off on migrating away from my C4's for various reasons as living with the occasional pain is less hassle than migrating (for me, appreciate that other people have different views on that).

But if this continues then I'll probably have to bite the bullet and migrate .... fortunately the recent issues have largely gone unnoticed by the rest of the household so I've been getting away with it to some extent!

Oofff you are lucky - been having some trouble with some always powered ZW+ ilumin bulbs in our bedroom sconces. The bulbs have become somewhat erratic lately - fine for a bit after a "refresh" then unresponsive - usually at night right before bed or in the morning. WAF is definitely at a low point there.. will probably be swapping them out for zigbee bulbs but restore state is also important..

Well, the one good thing about migrating a Zigbee-only hub to a newer hub model, is that the Zigbee devices will pop right back into their current devices once they are re-paired to the new hub (after the new hub has had the old hub's backup file restored to it.) This means all automations and other integrations that are using these devices will be unaffected by the hub change. I would make sure the new hub uses the same IP address as the old hub, to allow existing LAN devices to still connect to the new hub. (Obviously, the old hub will need a new IP address to avoid an IP address conflict, if both old and new hubs are to be online at the same time during the transition.)

I realize it can be a pain to have to reset and re-pair every Zigbee device. I assume this is why you're looking for a C-5 hub, so you can reuse the Nortek USB stick? That is definitely the quickest method. Good luck with whatever you decide!


Yeah there's a few other threads around where I've explained why doing the manual migration isn't very appealing .... as I said it might be OK for most but it really is a pain point for me!


I completely understand. In fact, I have my house outfitted with all Lutron Caseta switches, dimmers, fan controllers, and pico remotes - all paired with the Lutron SmartBridge Pro. This allows me to use these devices in whatever home automation system I want now, and in the future, without ever having to worry about re-pairing them. I use a Philips Hue bridge for all of my Zigbee bulbs as well, for the exact same reason. These two systems offer excellent flexibility and integration options. I just wish my Zigbee sensors afforded me the same level of flexibility...
