Hub locks up regularly

After a few days of operation, the hub becomes unresponsive in the web UI and sluggish in rules, and eventually stops pushing data to my InfluxDB instance.

systemStart System startup with build: 2.1.9.117 2.1.9.117 2020-04-21 06:18:54.117 AM AKDT
systemStart System startup with build: 2.1.9.117 2.1.9.117 2020-04-13 06:18:54.469 AM AKDT
systemStart System startup with build: 2.1.9.117 2.1.9.117 2020-04-05 02:18:53.545 PM AKDT
systemStart System startup with build: 2.1.9.117 2.1.9.117 2020-03-29 10:23:47.044 PM AKDT
systemStart System startup with build: 2.1.9.117 2.1.9.117 2020-03-27 05:51:40.358 PM AKDT
systemStart System startup with build: 2.1.9.117 2.1.9.117 2020-03-22 08:50:53.827 AM AKDT
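
To put numbers on the restart cadence, the gaps between the startup entries above can be computed directly. This is a minimal Python sketch, assuming the log lines are pasted in as text and that every timestamp is in the same zone (AKDT here), so the zone suffix can be ignored. The name `restart_intervals` is made up for illustration.

```python
import re
from datetime import datetime

# The six startup entries from the post, pasted verbatim.
LOG = """\
systemStart System startup with build: 2.1.9.117 2.1.9.117 2020-04-21 06:18:54.117 AM AKDT
systemStart System startup with build: 2.1.9.117 2.1.9.117 2020-04-13 06:18:54.469 AM AKDT
systemStart System startup with build: 2.1.9.117 2.1.9.117 2020-04-05 02:18:53.545 PM AKDT
systemStart System startup with build: 2.1.9.117 2.1.9.117 2020-03-29 10:23:47.044 PM AKDT
systemStart System startup with build: 2.1.9.117 2.1.9.117 2020-03-27 05:51:40.358 PM AKDT
systemStart System startup with build: 2.1.9.117 2.1.9.117 2020-03-22 08:50:53.827 AM AKDT
"""

# Matches the date, 12-hour time with milliseconds, and AM/PM marker.
STAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [AP]M")


def restart_intervals(log_text):
    """Return the days between consecutive startups, oldest gap first."""
    starts = sorted(
        datetime.strptime(m.group(0), "%Y-%m-%d %I:%M:%S.%f %p")
        for m in STAMP.finditer(log_text)
    )
    return [(b - a).total_seconds() / 86400 for a, b in zip(starts, starts[1:])]


intervals = restart_intervals(LOG)
print([round(d, 1) for d in intervals])
```

For the six entries above this prints gaps of 5.4, 2.2, 6.7, 7.7 and 8.0 days, oldest first.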

I'm running Rule Machine, Notifications, Nest Integration, Mode Manager, Lock Code Manager, Life360 Connector, InfluxDB/Grafana, Simple Lighting, Safety Monitor, Homebridge, Groups and Scenes, Google Home, and Amazon Echo Skill.

Is there any way to tell which app(s) are causing this gradual resource starvation? It doesn't appear to be something that verbose logging can expose. It's quite frustrating.

Thank you.

If you are unable to determine the issue, you could always add the below as a daily routine. It is a good band-aid, not a solution.

One thing to try is to disable (don't remove!) each app for a day or two. If you put your apps in list view instead of grid view, you can X out each app to disable it. These options are in the upper corner of the apps page.

If you suspect a device, you can disable those in a similar fashion.

I would start with the Google and Echo apps; they seem to get more complaints than the others on your list for causing slowdowns.

Thanks neonturbo. One day isn't going to be enough -- I'm typically looking at 7-10 days between restarts. I'm familiar with the procedure of walking a binary tree as a process of elimination.
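
For a sense of how long that elimination takes, here is a rough Python sketch of the halving approach applied to the 13 apps listed earlier. It assumes a single culprit, that any half of the apps can be disabled without breaking the rest, and that each round has to run a full 7-10 days to be conclusive. The helper names `split` and `rounds_needed` are made up for illustration.

```python
# The 13 apps from the original post.
APPS = [
    "Rule Machine", "Notifications", "Nest Integration", "Mode Manager",
    "Lock Code Manager", "Life360 Connector", "InfluxDB/Grafana",
    "Simple Lighting", "Safety Monitor", "Homebridge", "Groups and Scenes",
    "Google Home", "Amazon Echo Skill",
]


def split(suspects):
    """Divide the suspects into two halves; disable one half per round."""
    mid = len(suspects) // 2
    return suspects[:mid], suspects[mid:]


def rounds_needed(n_suspects):
    """Worst-case halving rounds to isolate one culprit: ceil(log2(n))."""
    return (n_suspects - 1).bit_length() if n_suspects > 0 else 0


rounds = rounds_needed(len(APPS))
# Each round must outlast the usual 7-10 day gap between lockups.
print(f"{rounds} rounds, {rounds * 7}-{rounds * 10} days")
```

With 13 apps that is 4 rounds, so roughly 28-40 days in the worst case, which is why per-app resource figures would be so much faster.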

What I'm hoping for is statistical analysis rather than empirical testing.
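
Absent per-app metrics, one external proxy is to time requests to the hub's web UI and watch the trend between restarts. The sketch below is only that, a sketch: `HUB_URL` is a hypothetical address, `probe` and `slope` are illustrative names, and it measures overall responsiveness rather than attributing load to any particular app.

```python
import time
import urllib.request

HUB_URL = "http://192.168.1.50/"  # hypothetical address; substitute your hub's


def probe(url=HUB_URL, timeout=10.0):
    """Return the seconds taken to fetch url, or None if the request failed."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except OSError:  # covers URLError and socket timeouts
        return None
    return time.monotonic() - start


def slope(samples):
    """Least-squares slope of (x, y) pairs; needs two or more distinct x values.

    With x as days since the last restart and y as latency in seconds,
    a steadily positive slope suggests the hub is degrading over time.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    den = sum((x - mean_x) ** 2 for x, _ in samples)
    return num / den
```

Calling `probe()` every few minutes from a scheduler and feeding the successful samples to `slope` gives a rough degradation rate. Comparing that rate with different halves of the apps disabled could shorten each test round, since a change in slope may show up before a full lockup does.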

How would devices themselves cause resource starvation / lockups on the hub?

Thanks dj, I'm hoping to avoid this band-aid -- appreciate the link though.


@bcopeland:
Bryan: Do you really believe that this latest release has ANYTHING to do with solving the "resource leak" issue?
Is that based on any inside information?

Upgrade or don't... No sweat off my back.


I did not mean to sound accusatory....

What I simply expressed was my frustration at an underlying issue that has been in place now for over a year.

Please do not misunderstand me. I very much want Hubitat to succeed, and I think that they have come a very long way in a short amount of time. However, the technical advantage that they currently enjoy can't last forever - as competitors are (I'm sure) trying to "catch up".

Yes... within a few hours of the new release being made available, I upgraded.

There will always be a certain group of individuals who are never really satisfied with a particular technical solution. Perhaps it just doesn't match their initial expectations. Perhaps others (especially in this predominantly DIY group) feel they can "do better somewhere else". I don't view them as a glaring weakness. It's when a significant number of individuals feel that other commercially available products are "better" (in some significant way) that Hubitat should be concerned. I don't think it's at that point yet.

After fairly exhaustively disabling apps, I eventually caved and installed the rebooter. Even simple resource usage figures (RAM, CPU, etc.) for installed apps and device drivers would make a big difference in narrowing down the diagnostic scope.

Did you try the same for devices using user drivers? Both my HEs are rock stable now; they only get rebooted when there's a platform update. But that wasn't always the case. There were three things that caused slowdowns that I observed:

  1. A badly written driver (I wrote it).
  2. Leaving stranded Z-Wave devices (my fault for force-removing them without exclusion).
  3. Someone else's port of a SmartThings (ST) app.

Of these three, 1 & 2 were of far greater consequence than 3.

I have only three custom drivers -- Aeon HEM Gen5 and Logitech Harmony Hub (two drivers). There's no easy way to temporarily disable drivers, so I didn't try that.

By 'stranded', would you also include devices that are truly offline?

There is an easy way to temporarily disable devices that use those drivers (which is effectively temporarily disabling the drivers) ......

Yes (so stranded and ghost devices).


The number of stranded and ghost devices probably matters. I had ~5-6 stranded devices and maybe 3-4 ghost devices. The ghost devices are supposed to be automatically cleaned out during overnight maintenance, but I never saw that happen.