Hubitat Hubs Unreliable - Diagnosis Advice?

Hopefully I'm posting this in the right place...

Background:
I have a pair of Hubitat Elevation hubs (both are hardware revision C-7, running platform version 2.3.4.123). My motivation for running two hubs, which share several devices over Hub Mesh, was to segregate several demanding apps & drivers (we'll get back to that) and to provide a more reliable connection to a couple of Zigbee devices.

The Problem:
Both hubs have been unreliable, going offline at seemingly random intervals, typically at the same time (so it seems the primary issue is either caused by, or propagating over, the Hub Mesh). When they lock up, the hub indicator LEDs remain green, but the network connections go down and the hubs no longer respond to ping. They can only be restored by manually disconnecting and reconnecting the power.

Having gotten tired of running around my house yanking out and re-inserting fiddly micro-USB cables, I took advantage of the fact that both hubs are connected to UniFi PoE Ethernet switches and bought a couple of PoE-to-USB splitters to power the hubs. This lets me control power to the hubs remotely, so I wrote a script that pings both hubs every 5 minutes and automatically bounces power to any hub that's offline. The script logs reset events. Here's the log from the last few days so you can see how often this is happening (the two hubs are named "Main" and "Office"):

( Sat Dec 24 00:09:22 CST 2022 ) Main Down, Bouncing Power
( Sat Dec 24 00:09:24 CST 2022 ) Office Down, Bouncing Power
( Sat Dec 24 20:33:15 CST 2022 ) Main Down, Bouncing Power
( Sat Dec 24 20:33:16 CST 2022 ) Office Down, Bouncing Power
( Sun Dec 25 06:27:48 CST 2022 ) Main Down, Bouncing Power
( Sun Dec 25 22:35:16 CST 2022 ) Main Down, Bouncing Power
( Sun Dec 25 22:35:19 CST 2022 ) Office Down, Bouncing Power
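
To check the claim that the two hubs fail together rather than by coincidence, the reset log can be grouped into incidents. Below is a minimal Python sketch, assuming the log format shown above. The 60-second grouping window is an arbitrary choice, and the timezone abbreviation is simply ignored when parsing.

```python
import re
from datetime import datetime, timedelta

# Matches lines like "( Sat Dec 24 00:09:22 CST 2022 ) Main Down, Bouncing Power".
# Groups: date and time, year, hub name. The timezone abbreviation is skipped.
LINE_RE = re.compile(
    r"\(\s*(\w{3} \w{3} \d{1,2} \d{2}:\d{2}:\d{2}) \w+ (\d{4})\s*\)\s*(\w+) Down"
)

def parse_events(lines):
    """Return (timestamp, hub_name) tuples for every 'Down' line in the log."""
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            ts = datetime.strptime(f"{m.group(1)} {m.group(2)}", "%a %b %d %H:%M:%S %Y")
            events.append((ts, m.group(3)))
    return events

def group_incidents(events, window=timedelta(seconds=60)):
    """Cluster events that occur within `window` of the previous event."""
    incidents = []
    for ts, hub in sorted(events):
        if incidents and ts - incidents[-1][-1][0] <= window:
            incidents[-1].append((ts, hub))
        else:
            incidents.append([(ts, hub)])
    return incidents

# The reset log from the post.
LOG = [
    "( Sat Dec 24 00:09:22 CST 2022 ) Main Down, Bouncing Power",
    "( Sat Dec 24 00:09:24 CST 2022 ) Office Down, Bouncing Power",
    "( Sat Dec 24 20:33:15 CST 2022 ) Main Down, Bouncing Power",
    "( Sat Dec 24 20:33:16 CST 2022 ) Office Down, Bouncing Power",
    "( Sun Dec 25 06:27:48 CST 2022 ) Main Down, Bouncing Power",
    "( Sun Dec 25 22:35:16 CST 2022 ) Main Down, Bouncing Power",
    "( Sun Dec 25 22:35:19 CST 2022 ) Office Down, Bouncing Power",
]

for incident in group_incidents(parse_events(LOG)):
    hubs = sorted({hub for _, hub in incident})
    print(incident[0][0], hubs)
```

On these seven entries, the sketch yields four incidents. Three of them involve both Main and Office within a few seconds of each other, which fits a shared cause better than two independent faults.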

This is a reasonable workaround for now, but I'd really like to get to the bottom of the reliability issue. The problem is that the hub logs are unhelpful. There are no telltale log entries when the hubs go offline - they just silently stop working, as far as I can tell. I'm wondering if there's something else I can do to track down the source of the problem....

I suspect this issue is being caused by a 3rd party app or driver. A UniFi presence driver and the Ecobee Thermostat Suite Manager apps that I run on the secondary hub are the prime suspects, since they are relatively complex and create significant hub load. However, both integrations (UniFi for presence detection and Ecobee for my thermostats) are pretty core to my home automation experience, so I really want to debug the issue (and inform the driver/app authors) rather than just giving up on them. Also, without detailed system logging leading up to the failures, I can't be sure these are the true culprits. Does anyone have any advice on how to get to the bottom of this?

Editorializing here, and I know this may be a controversial statement: I don't expect Hubitat to warrant their devices against 3rd party plugins, apps, or drivers. That said, I'm not inclined to let Hubitat completely off the hook: a misbehaving app or driver should NOT be able to take the entire hub down. This seems like a fundamental isolation issue in the base platform or OS, with a lack of defensive controls around 3rd party logic and resource use.

In any event, thanks for wading through a long post. If anyone out there happens to also run their hubs off of UniFi PoE switches and is interested in my little auto-reset script, let me know and I'll post it.
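
For anyone wanting to build something similar, here is a minimal Python sketch of a ping-and-bounce watchdog. It is not the script described above. The hub addresses are hypothetical, `ping` assumes Linux iputils flags, and `bounce_power` is a hypothetical placeholder, since how you power-cycle a PoE port depends on your switch and controller setup.

```python
import subprocess
import time
from typing import Callable, Dict, List

# Hypothetical hub names and addresses; substitute your own.
HUBS: Dict[str, str] = {"Main": "192.168.1.10", "Office": "192.168.1.11"}

def ping(host: str) -> bool:
    """Send one ICMP echo request. Flags assume Linux iputils (-W is in seconds)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def bounce_power(name: str) -> None:
    """Hypothetical placeholder: power-cycle the PoE port that feeds `name`.
    The mechanism depends on your switch or controller and is not shown here."""
    raise NotImplementedError(f"power-cycle for {name} not implemented")

def check_once(
    hubs: Dict[str, str],
    is_up: Callable[[str], bool],
    bounce: Callable[[str], None],
    log: Callable[[str], None],
) -> List[str]:
    """Check every hub once; log and bounce any that do not answer."""
    bounced = []
    for name, host in hubs.items():
        if not is_up(host):
            stamp = time.strftime("%a %b %d %H:%M:%S %Z %Y")
            log(f"( {stamp} ) {name} Down, Bouncing Power")
            bounce(name)
            bounced.append(name)
    return bounced

def run_forever(interval_s: int = 300) -> None:
    """Poll every `interval_s` seconds (300 s matches the 5 minutes in the post)."""
    while True:
        check_once(HUBS, ping, bounce_power, print)
        time.sleep(interval_s)
```

Call `run_forever()` to start it. Passing the ping, bounce, and log functions into `check_once` keeps the decision logic testable without touching the network. Keep in mind that cutting power works around the lockup but does nothing to diagnose it.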

The way you are resetting your hubs when they lock up is adding to the instability you are experiencing.

Btw, are jumbo frames enabled on your network?

This is a good way to start troubleshooting. If it was one hub but not both, it might have been an issue with that particular hub, but if both exhibit the same symptoms, chances that there is something wrong with both hubs are extremely low. If you'd like, you can send me a private message and we could look at your hubs' engineering logs to see if the problem is internal to the hubs.

Do you have a lot of Z-wave devices?

I would actually move everything over to a single hub. I would use the Hubitat-built Ecobee integration and set it to 5-minute intervals. Drop the UniFi presence detection first and see if that is the cause; then, if it's an option, increase the time between checks.

You might be a little too aggressive in pinging your devices from the hub.

Another point:
Since your hubs have been going up and down, I would strongly consider doing a soft reset - there is a significant chance that your database has been corrupted because it may have gone down at a time when it was reorganizing.

Yes, I have jumbo frames enabled. How else should I be resetting the hubs? Both web interfaces (normal and diagnostic) are unresponsive/unreachable when this happens. The network interface seems to be totally down (hence the lack of a ping response).

@aaiyar I'll try moving the hubs to a VLAN without jumbo frames and see if that solves the issue. Thanks for the tip!

Disable jumbo frames for the (V)LAN the hubs are on. This is a known cause for random hub network interface crashes.

I suggest searching the Hubitat community for "jumbo frames" - there are a very large number of reports describing this issue.

Thanks jtmpush. Already tried that previously. I suppose I could do it again...

Not bad advice, but the built-in Hubitat Ecobee integration lacks some features that I consider essential for managing the HVAC systems at my house efficiently. I have already made some tweaks to try to reduce resource usage by both Ecobee and Unifi devices. The odd thing is that both of these are running on the same hub, but both hubs tend to fail in sync. This seems to be a network interface issue that propagates over the Hub Mesh...

It is. Put your hubs on a network segment that doesn't have jumbo frames enabled.

Hello,

Have you been able to solve this issue? I have a C-5 and a C-7 hub connected via Hub Mesh, and I have the same issue: both disconnect at the same time.

Are you using jumbo frames on your network? If not, it likely isn't the same issue, and you may want to start a new thread describing yours. If you are using jumbo frames, you should follow the suggested remedies above, as jumbo frames are known to cause these problems.

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.