Hubitat (C3) hung after losing network

@support_team,

I have noticed that any time my network switch loses power, my C3 hub freezes up.

In the hub events, I can see that the system CPU is overloaded. In the logs, I can see that all the LAN integrations are failing to connect, as expected.

The high CPU utilization appears to come from the HTTP methods and the new ping method. While the network is down, these calls seem to put a heavy load on the system (or at least on the Hubitat software) as they wait for a response.

After the network is restored, the hub does not come back online and appears to stay hung. I currently fix this by power cycling the hub, but it would be cleaner if the hub could recover on its own.

The biggest offenders appear to be web calls made without timeouts. Perhaps, while the network is offline, the hub could force a timeout by interrupting and killing the thread that handles each web call, until a local IP address is available again?
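A minimal sketch of the idea, written in Python only for illustration (Hubitat apps and drivers are Groovy, and this is not how the platform works internally). It assumes two hypothetical helpers: `has_local_ip()` checks whether the OS currently has a route and a non-loopback source address, and `guarded_call()` fails fast while that check is false and otherwise waits at most `timeout_s` seconds. One caveat: in many runtimes a thread blocked on socket I/O cannot be safely killed, so instead of killing the worker this sketch stops waiting for it and lets its own socket timeout clean it up later.

```python
import concurrent.futures
import socket

# Shared worker pool for outbound calls (hypothetical; size chosen arbitrarily).
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def has_local_ip(probe_host: str = "192.0.2.1") -> bool:
    """Return True if the OS can pick a non-loopback source address.

    Connecting a UDP socket sends no packets; the OS only selects a route.
    192.0.2.1 is a TEST-NET-1 documentation address, used here as a probe.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((probe_host, 80))
        return not s.getsockname()[0].startswith("127.")
    except OSError:
        # "Network is unreachable" and similar: no usable local address.
        return False
    finally:
        s.close()


def guarded_call(fn, timeout_s: float = 5.0, network_up=has_local_ip):
    """Run fn() with a hard deadline, skipping it entirely while offline."""
    if not network_up():
        raise ConnectionError("no local IP address; skipping call")
    future = _POOL.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # cancel() only succeeds if fn has not started; a running call is
        # abandoned (the caller stops waiting) rather than killed.
        future.cancel()
        raise TimeoutError(f"call exceeded {timeout_s}s; abandoned") from None
```

The design choice is to check connectivity before each call, so while the switch is down no new requests are queued behind ones that can never complete.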

You experienced this on the C3 hub, right?
Maybe some app/driver-level HTTP call throttling can help. I'll have to think about this.
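One possible shape for app/driver-level throttling, sketched in Python under the assumption that each app or driver gets its own hypothetical `HttpThrottle` with a small cap on in-flight calls. This illustrates a policy, not Hubitat's implementation. Calls beyond the cap are rejected immediately instead of queued, so a dead network cannot pile up blocked threads behind a single integration.

```python
import threading


class HttpThrottle:
    """Cap how many calls one app/driver may have outstanding at once."""

    def __init__(self, max_in_flight: int = 2):
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._lock = threading.Lock()
        self.rejected = 0  # count of calls refused because the cap was hit

    def call(self, fn):
        # Non-blocking acquire: if every slot is busy, refuse right away
        # instead of making the caller wait behind stuck requests.
        if not self._slots.acquire(blocking=False):
            with self._lock:
                self.rejected += 1
            raise RuntimeError("throttled: too many calls in flight")
        try:
            return fn()
        finally:
            self._slots.release()
```

Rejecting rather than queueing is a judgment call: a queue with its own timeout would preserve more requests, but it also keeps more work pending while the network is down.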


That is correct. I put all my LAN integrations on my older C3 hub and kept only Z-Wave and Zigbee devices on my C7, along with the SharpTools and Alexa integrations (I don't want the Hub Mesh names showing up in SharpTools/Alexa for my devices). Everything else is on my C3: Bond, Hue, Kevo, Roku, Hubigraph, etc.

I did this because I noticed that any web call, while waiting for data, seems to affect the % of total, % of busy, and total/avg ms values in the runtime statistics. The split also keeps my Z-Wave and Zigbee performance from being affected by the performance of the LAN integrations.

What I did notice is that my custom Hue and Roku drivers are not as badly impacted, because they use fairly aggressive timeouts of 3 to 10 seconds. None of the other third-party apps/drivers define HTTP timeouts, so I assume they fall back to the maximum of 300 seconds by default. (What is the default timeout if none is defined?)
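The "always set an explicit, short timeout" approach can be sketched in Python with the standard library. The names `fetch`, `clamp_timeout`, `MIN_TIMEOUT_S`, and `MAX_TIMEOUT_S` are hypothetical, and the 3 to 10 second range is taken from the post above. Note that urllib's `timeout` applies to each blocking socket operation (connect, each read), not to the request as a whole, so a device that trickles bytes slowly could still hold a call somewhat longer.

```python
import urllib.request

MIN_TIMEOUT_S = 3.0   # lower bound, from the 3-10 s range in the post
MAX_TIMEOUT_S = 10.0  # upper bound

# LAN devices are reached directly, so ignore any proxy environment variables.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def clamp_timeout(timeout_s: float) -> float:
    """Force a caller-supplied timeout into the allowed range."""
    return min(max(timeout_s, MIN_TIMEOUT_S), MAX_TIMEOUT_S)


def fetch(url: str, timeout_s: float = 5.0) -> bytes:
    """GET url with an explicit timeout; never relies on a library default."""
    with _OPENER.open(url, timeout=clamp_timeout(timeout_s)) as resp:
        return resp.read()
```

For example, `fetch("http://192.168.1.50/status", timeout_s=3)` (a hypothetical device address) raises an `OSError` subclass within about 3 seconds per socket operation if the device does not answer, instead of waiting on whatever default would otherwise apply.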
