HE Rebooting Due To Low Memory, yet free memory is not low?

Hello All,

I have been trying to track down a long-standing issue of my hub becoming slow and then requiring a reboot every 4 days. I have actually confirmed it's always 4 days after the last reboot.

So after trying everything to diagnose the issue, I ended up contacting support and had the C8 hub replaced under warranty. I actually ended up paying the difference for a C8 Pro hub as well.

So now I have a C8 Pro which has double the memory, yet the exact same issue is happening, and it's still happening on the 4th day of uptime. Everything works great up until day 4, and then I right away start noticing lighting automations becoming slow and the web UI becoming unresponsive. And sure enough the hub reboots itself due to "critically low memory," yet the memory is not actually low.

The other day after the hub rebooted itself on day 4 of uptime due to claiming critically low memory, the actual free memory before it rebooted was still 1.2GB. Also, if this was a memory issue I would have expected that after upgrading from the C8 to C8 Pro hub, this would either have mitigated the issue or at least extended the uptime, which has not been the case.

Despite the hub "claiming critically low memory," I don't think the hub is actually running out of memory. It seems like some kind of race condition in an app/rule, but nothing in the logs shows any issues with an app/rule. So aside from re-creating all apps/rules I am running out of ideas.

So I am at a loss here as to why this is happening. At this point I am about to just set an auto reboot every 3 days and just give up, but I would prefer to not go the bandaid route.

Some of what I have tried:

  • I have gone through the logs, nothing obvious that I can find.
  • Checked devices/app stats, but nothing in particular shows high load.
  • Checked Zigbee logging, and confirmed there were no Zigbee devices spamming.
  • State History/Event Size set to 12/30, which is fairly low.
  • Performed a soft reset many times.
  • Powered Down Hub Completely
  • Tried different PSU.
  • Confirmed no ghost Z-Wave devices and all Zigbee devices checking in properly
  • Confirmed no overlap in WiFi/Zigbee channels
  • Had support check engineering logs, but nothing found.
  • Replaced Hub C8 with C8 Pro

I'm assuming the team dug into the backend to try to find the issue before they replaced the first hub? Nothing was found, hence the hardware assumption and replacement? Have they gone through the logs again on the new hardware?

Did you do the cloud backup hub migration to the C8 Pro? If the issue followed you from a migration, it does look like it is something in an app or rule as you said.

Are you using (hub ip)/hub/advanced/freeOSMemoryHistory to see what memory was before it rebooted?
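
If you would rather pull that history from a script than eyeball it in the browser, here is a rough Python sketch. Only the `/hub/advanced/freeOSMemoryHistory` path comes from this thread. The plain `http` scheme, the `HUB_IP` value, and the column layout (timestamp first, free OS memory in KB second) are assumptions on my part, so check them against what your hub actually returns.

```python
import csv
import io
from urllib.request import urlopen

HUB_IP = "192.168.1.50"  # hypothetical address, replace with your hub's IP


def fetch_history(hub_ip: str) -> str:
    """Download the raw memory history text from the hub (assumes plain HTTP)."""
    url = f"http://{hub_ip}/hub/advanced/freeOSMemoryHistory"
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_history(text: str):
    """Return a list of (timestamp, free_kb) tuples.

    Assumes comma-separated rows with a timestamp in the first column and
    free OS memory in KB in the second. Rows that do not fit that shape
    (header line, blank lines) are skipped rather than guessed at.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) < 2:
            continue
        try:
            free_kb = int(fields[1].strip())
        except ValueError:
            continue  # most likely the header row
        rows.append((fields[0].strip(), free_kb))
    return rows


def lowest_free(rows):
    """Return the (timestamp, free_kb) row with the least free memory, or None."""
    return min(rows, key=lambda r: r[1]) if rows else None
```

Calling `lowest_free(parse_history(fetch_history(HUB_IP)))` shortly after a reboot would show the lowest value the hub recorded, assuming the history survives the reboot.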

If it is a long-standing issue, you can rule out any apps or rules you added since the problem began. I would try to think about what apps or rules were added right around the time you first saw the issue, if you can remember, and try disabling them for four days.

Correct, I had them look at the engineering logs, but nothing really was found. So that is why we were thinking a possible hardware issue.

I understand migrating to the new hub would bring over any software issues. I just thought it was a hardware issue, as I was running out of ideas with no clear answer.

I have checked the free memory before, and the history showed that it was around 1.2GB free before the hub rebooted itself on the C8 Pro. When I had the regular C8, it would hover around 300MB free when the reboot occurred. That is why I do not think the hub is running out of memory, since the free memory has never really been low.

Also, it's not just the web UI that is slow/unresponsive. The automations slow down as well, so I do not think this would be network related. Also, I do not have jumbo frames or anything special on.

This issue has been going on for the better part of a year at this point. So I have added/changed apps since the original issue started, although the majority of the apps I have had for years. If anything I have actually been removing cloud/LAN apps due to this issue, although this never really extended the time until a reboot was needed or made any difference.

What is odd is that there are no performance issues leading up to day 4 of uptime. It's like everything is working great and then I hit day 4 of uptime and the hub just goes to a crawl.

I also never had any rules/apps that ran anything at a 3-4 day interval.

I am tempted to just start going through and re-creating the room lighting and rules one by one, which is the majority of the apps. But there are a ton, so I am trying to avoid this. Disabling the apps one by one would just take too long, since I would need to wait 4 days each time.

Disable half of the apps, and wait four days...

  • If things improve, then re-enable half of the previously disabled apps and wait again...
  • If things do not improve, enable the 'disabled' apps, and then disable the other half of the apps.
  • Repeat the process as necessary until the issue is found.
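
The halving steps above are basically a binary search, and a small Python sketch shows how few 4-day waits it takes. This assumes exactly one misbehaving app, which nobody in this thread has confirmed. `survives_with_disabled` is a hypothetical stand-in for "disable these apps, wait four days, and see if the hub stays healthy." Under the single-culprit assumption, the sketch skips the confirming run on the other half.

```python
import math


def find_culprit(apps, survives_with_disabled):
    """Narrow a list of apps down to the single one causing trouble.

    survives_with_disabled(disabled) is a hypothetical callback that returns
    True if the hub stayed healthy for a full test period with those apps
    disabled. Assumes exactly one culprit exists in `apps`.
    """
    candidates = list(apps)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        rest = candidates[len(candidates) // 2 :]
        if survives_with_disabled(half):
            candidates = half  # problem went away, so the culprit was disabled
        else:
            candidates = rest  # problem persisted, so the culprit is still enabled
    return candidates[0]


def worst_case_days(app_count, days_per_trial=4):
    """Upper bound on total waiting time for the halving approach."""
    if app_count <= 1:
        return 0
    return math.ceil(math.log2(app_count)) * days_per_trial
```

For example, 40 apps would need at most 6 trials, so roughly 24 days of waiting instead of 160 days for one-at-a-time testing.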

Or... Set up an automatic reboot every 3 days for now... :wink:

What would be considered a high state size? This is what I have currently.

True I could do it in chunks. I do find it odd that a misbehaving app/rule did not throw any errors/warnings in the logs though.

That does not look unreasonable to me as far as State Size goes....

Here is mine, sorted by State Size

Yeah looks similar.

I also noticed my % busy times for both devices/apps have always been very low, never high enough that I would expect to cause problems. You can see the devices have very low CPU busy time overall.

I will just go through and disable apps in chunks, the hard part is avoiding disruptions with the lighting lol.

What do you mean? The hub platform reboots itself? (Never seen that and I have run hubs low on memory, sometimes 90 days uptime). Or do you have an app that does this for you?

How often do you schedule backups? Not saying it’s the cause.

Yeah the hub is rebooting itself, there is no rule doing this.

You know I was considering pausing backups, just to rule that out. I run the daily local backups and a weekly cloud backup.

Depending on how you collect the memory readings, you may not be getting the full picture. Depending on how the hub is running, the memory can fluctuate a lot. You may need a method to check memory more frequently to capture the full picture.

I suspect you have an app or rule that is putting the hub in a bad condition. Do you use any community drivers?

Can you take a screenshot of all your apps and rules so we can see what kind of stuff you have.

Do you have anything collecting data for later use?

The stats page can be useful, but unfortunately can be very misleading depending on the type of issue you are experiencing.

Have you noticed if these spontaneous reboots always happen around the same time of day?

Yeah, I track memory usage in Home Assistant, and it's very consistent before and right up to the reboot.

Nope, the time of day varies.

And what is Home Assistant using to get the memory? Do you have it getting the value from the Hub Info driver, or the endpoint that gives CPU, memory and uptime? My point is to get the check interval for memory lower than the typical 5 min. Mine collects every 30 seconds. You could even go lower in this case, since we are trying to see if you really have a memory problem.
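
As a rough illustration of the faster sampling, here is a minimal Python polling loop. `read_free_kb` is a hypothetical function you would supply yourself (for example, something that reads the value from the Hub Info driver or a hub endpoint). I am not assuming any particular endpoint here. The 30 second default just mirrors the interval mentioned above.

```python
import time


def poll_free_memory(read_free_kb, interval_s=30.0, samples=10,
                     sleep=time.sleep, clock=time.time):
    """Collect `samples` readings, waiting `interval_s` seconds between them.

    read_free_kb is a hypothetical caller-supplied function that returns the
    current free memory in KB. sleep and clock are injectable so the loop can
    be exercised without real waiting.
    """
    readings = []
    for i in range(samples):
        readings.append((clock(), read_free_kb()))
        if i < samples - 1:
            sleep(interval_s)  # no wait needed after the final sample
    return readings


def lowest_reading(readings):
    """Return the (timestamp, free_kb) pair with the least free memory."""
    return min(readings, key=lambda r: r[1])
```

A short dip between 5 minute samples would show up in `lowest_reading(...)` even if the average looks flat, which is the whole reason for sampling more often.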

I would suggest starting by turning off all of the apps labeled "User" and see if that makes a difference. Then move on to Rule Machine rules if that doesn't help. RM is great because it is so flexible, but it also allows us to do things that may not be the best.

Actually, those average temperature apps: what are they? How many sensors does each of them have, and what is the reporting frequency on the temp sensors?

I have been using those for over 5 years at this point. It just averages temps from a group of sensors.

So after more testing including disabling all Rules/Basic rules/custom apps I still have the 4 day reboot issue.

I logged the memory on a 30 second interval right up to when the hub reboots itself and the free memory was still at 1.2GB free.

Also still no errors/warnings in the logs and the app/device stats do not show any app/device misbehaving or any excessive activity.

I think I may concede at this point and just set up an auto reboot every 3 days, since there are no issues before the hub reboots itself.

I did notice when the hub does slow down, the point at which the hub reboots itself is fairly quick. Usually by the time I notice automations slowing down, within 1 hour the hub would reboot itself.

There has to be something unique in your environment. My C8 Pro has been up for 7 weeks. I think most folks that have the same thing would indicate something similar as well. Can you disable all of your apps and restart them one at a time after 4 days? It will take a while, but I am not sure what else to suggest. There is no doubt something is wrong, the question is just what it is.

I already tried disabling the custom apps/rules, and still had the issue. I agree there is something still happening. But with literally nothing in the logs I am just guessing at this point.

I would prefer to fix the issue, but when I have literally no issues with the hub in the 4 days leading up to the reboot, then scheduling a reboot every 3 days would mitigate the issue completely.

What really bothers me though is that the hub is claiming to run out of memory, yet what I have logged is telling me that is not the case. Unless the free memory reporting is just wrong.

Tagging @gopher.ny and @bobbyD from Hubitat Support. Can one of you please take a look and see if you can determine the underlying root cause for this user's issue?
