Hub hangs / locks up

Platform version 2.4.1.170
Hardware version C-8
Diagnostics tool version 1.1.131
Network: wired ethernet, known good

I have a C-8 hub that I started from scratch last July. At some point, after adding apps and drivers, it became unstable, randomly locking up (usually no longer responding to ICMP ping, but not always). I limped along with a scheduled reboot (up to 3x/day!) until I had a chance to selectively disable apps and devices.

I've now disabled /all apps and devices/, and it's still locking up. Is this a warranty issue? What further steps can I take to troubleshoot, with everything off? Do I need to try safe mode?


Have you tried a soft reset? Also, are any devices on your network using jumbo frames?

Unlikely, but if you purchased an extended warranty, you could certainly submit a ticket at Warranty - Hubitat Support.

Thanks for responding, @aaiyar. I have done a soft reset, but not since I disabled all devices. Do you think it's worth trying again?

Re jumbo frames, I do not have Layer 3 MTUs configured for anything greater than 1500 on that particular network segment. At Layer 2, some devices could be using jumbo frames, but I'm not explicitly demanding it. Is there evidence that receiving jumbo frames causes Hubitat's networking stack to /crash/? I can test this by putting it behind a filtered bridge, but this seems bizarre.

Yes. Search the forum.

OK, continuing the thread. I disabled jumbo frames (yes, they were enabled). Somewhat improved, but still locking up. I ran a number of cycles of disabling all apps and devices / re-enabling devices using a binary tree, and am still getting what appears to be nondeterministic behavior.

I managed to catch it in the act: a loss of ICMP on the wired port, with the LED flashing red/green, which apparently means a network issue. I set a static IP address on the wired interface (wireless is also enabled).

But today both went down again, wired and wireless, simultaneously, with solid green LED.

Does anyone have /any/ other ideas? This is getting really, really old. It's totally unusable as an automation platform.

I have another Hubitat installation that has zero stability issues.

You shouldn't have both wired and wireless connections active - you need to pick one or the other.

Unfortunately, the wireless option cannot serve as a seamless backup to wired, so if wired is OK, disable the wireless connection.

Oddly there is no disconnect option. I've reset network settings and will try from scratch (static Ethernet only).

Do all devices have jumbo frames turned off or just the switch?

What do your logs say at the time of lockup?

No devices are configured for jumbo frames; the local switch is configured not to pass jumbo frames.
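One way to double-check that nothing larger than a standard frame crosses this segment is a don't-fragment ping from a Linux host on the same switch. The sketch below only builds the command; it assumes the iputils `ping` (`-M do` sets the don't-fragment behaviour) and an IPv4 header without options. The function names and the address in the usage note are hypothetical.

```python
IPV4_HEADER_BYTES = 20  # assumes no IP options
ICMP_HEADER_BYTES = 8


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one IPv4 packet of size `mtu`."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def df_ping_command(host: str, mtu: int) -> list:
    """Build a Linux (iputils) ping command that forbids fragmentation."""
    payload = icmp_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-c", "1", "-s", str(payload), host]
```

For example, `df_ping_command("192.0.2.10", 9000)` yields a ping with an 8972-byte payload. If that probe is rejected locally or gets no reply while the 1500-byte equivalent (1472-byte payload) succeeds, jumbo frames are probably not making it across that path.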

Previously, there were no useful logs at the time of the lockup. I can check again if it recurs. I wasn't checking the logs daily, and daily was the most common frequency of the hangs.

OK, network dropped again, this time with no wireless enabled. The LED did /not/ switch into a blinking state. I unplugged the cable for a few seconds, and plugged it back in, and the network came back up.

No log entries in any logging section.

This is a known-good switch. I can try disabling auto-negotiation or try a different switch, but I'm dead in the water without being able to see why the IP stack drops. I have a dual-stack network and I can do packet dumps, but I don't have kernel access on the Hubitat. Even as a network engineer, my hands are somewhat tied.

I would run Wireshark, mirror the port the hub is plugged into, and dump the logs when it locks up. The fact that you can unplug the network cable, plug it back in, and have communication re-established suggests it's likely not the hub itself. (If the network stack on the hub had crashed, that wouldn't work.)

Is static set on the hub itself or as a DHCP reservation?
If you have static set on the hub I would try switching it to DHCP as a test.

You could also switch to wireless only as a test, unplug ethernet.

Just trying to eliminate possible variables.

No logs at all while it was offline? Or just nothing useful?

I'd had (since last July) a static DHCP lease, 12h+ duration. I switched it to static.

Switching to wireless is probably a regression, because it will definitely be less reliable than Ethernet. I have wired wherever I can: a server cluster with a 10G clustered storage backplane, and satellite switches where required, with VLAN isolation. Wired is quite reliable and my work machine is on the same switch; if there were switching, DHCP, or basic gateway reachability issues I'd know.

There were no new log entries while it was offline.

So far, having disabled wireless (reset network settings in fact), it's stayed up! Fingers and toes crossed.

No drops since disabling wireless. No ARP issues on the network switch. If it drops I can definitely mirror, but when it's down, it's down even with static ARP entries (if I recall correctly from earlier testing.)

I never said to permanently switch to Wi-Fi; it's a test to rule things out as being the problem.

OK, well, the test may not be needed then. You said above it went offline again even with wireless disconnected, so I thought you were still having issues.

Re wireless, I should have been clearer: if I recall correctly, it had gone offline with wireless disconnected. It has not gone offline since I completely reset the network settings and did not re-add an SSID and credentials. (There's no Disconnect button in my version, but at some point wireless had disconnected.)

Today (4 days running) ICMP went offline. No red and green flashing light, so I bounced the Ethernet port after a couple of offline minutes. ICMP returned, but a bit sporadically, with high jitter, between 100 and 2000 ms. (No widespread loss or outages on the actual Ethernet LAN segment, however.) After another couple of minutes, ICMP response times returned to sub-10 ms. I checked Hub events: no events. No unusually high apps in App stats; no unusually high device busy state in Device stats. Past logs show only that apps and drivers that hit public APIs or network devices started timing out (temporary failure in name resolution, naturally). Once the network was back up, the logged failures ended.

So something is causing Ethernet to go offline or be flaky, and it's not clear why. I haven't been paying attention to the switch-side port's link status, but maybe I should. What's even odder is the high jitter. ICMP is super low level, and very little should be happening at the CPU/network stack level that should be causing load-based ICMP loss, at least not without some kind of load-based evidence.

This is still mysterious. I can't sit by and bounce the port every time it goes offline. If necessary, I can automate it, but that feels like a really bad hack.
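Short of bouncing the port automatically, the episodes can at least be timestamped and classified so they can be lined up against switch-side link events. Here is a minimal sketch, assuming a Linux host with the iputils `ping` (`-c 1 -W 1` sends one probe with a one-second timeout); the function names, the 100 ms "slow" threshold, and the definition of jitter (mean absolute difference between consecutive replies) are my own choices, not anything Hubitat provides.

```python
import re
import statistics
import subprocess
from typing import List, Optional

# Matches the RTT field in typical ping output, e.g. "time=0.412 ms".
_RTT_RE = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_rtt_ms(text: str) -> Optional[float]:
    """Return the round-trip time in ms found in ping output, or None."""
    match = _RTT_RE.search(text)
    return float(match.group(1)) if match else None


def probe(host: str) -> Optional[float]:
    """Send one echo request (Linux iputils flags assumed); None means no reply."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        capture_output=True, text=True,
    )
    return parse_rtt_ms(result.stdout) if result.returncode == 0 else None


def summarize(samples: List[Optional[float]], slow_ms: float = 100.0) -> dict:
    """Classify a window of probe results as ok, degraded, down, or no-data."""
    if not samples:
        return {"loss": 0.0, "jitter_ms": 0.0, "state": "no-data"}
    replies = [s for s in samples if s is not None]
    loss = 1.0 - len(replies) / len(samples)
    # Jitter: mean absolute difference between consecutive replies.
    diffs = [abs(b - a) for a, b in zip(replies, replies[1:])]
    jitter = statistics.mean(diffs) if diffs else 0.0
    if not replies:
        state = "down"
    elif loss > 0 or max(replies) >= slow_ms:
        state = "degraded"
    else:
        state = "ok"
    return {"loss": loss, "jitter_ms": jitter, "state": state}
```

Calling `probe()` once a second, keeping the last 30 or so results, and logging `summarize()` with a timestamp whenever the state changes would give a record of each episode without anyone having to watch it happen.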

Is there any chance the hub is periodically having to deal with high levels of broadcast traffic? Just a hypothesis... :thinking:

Some devices don't handle broadcast traffic very well, and can get overwhelmed having to process that traffic, even if not applicable to them.

Again, just a hypothesis. Hope you get this sorted out.

It wouldn't be that difficult for me to capture all traffic to and from it.
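Given such a capture, the broadcast hypothesis could be checked by counting group-addressed frames per second and comparing the counts around a dropout with a quiet period. A rough sketch follows, assuming the capture has already been exported as (timestamp, destination MAC) pairs; the export step and the function names are hypothetical. The only protocol fact relied on is that the least significant bit of the first MAC octet marks a group (multicast or broadcast) address.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def is_group_address(mac: str) -> bool:
    """True for multicast or broadcast MACs (group bit set in the first octet)."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x01)


def group_frames_per_second(frames: Iterable[Tuple[float, str]]) -> Dict[int, int]:
    """Count group-addressed frames in each whole second of the capture."""
    counts = Counter()
    for timestamp, dst_mac in frames:
        if is_group_address(dst_mac):
            counts[int(timestamp)] += 1
    return dict(counts)
```

A sudden jump in the per-second count just before ICMP latency rises would support the hypothesis; a flat count would point elsewhere.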

Here's an incident I happened to catch: normal latency, rising / loss, then recovery. (Nothing happening to the network itself.)

Something does not look quite right with the ping results you posted above. Are you running the PING command on a computer connected via Ethernet to the same network as the Hubitat hub? The reason I ask is that I always get much better ping performance than what is shown above. I always get <1ms ping times. You seem to average around 20ms. Something seems not quite right... :thinking:
