Critical free OS memory level

Hello all,
As a relatively new user I have been trying to educate myself regarding the finer points of Hub management. I understand that some users have experienced progressive slowdowns caused by memory leaks over time. I, of course, understand that the solution is to identify and resolve the source of the leaks. However, I see that some users have dealt with these slowdowns by periodic reboots. Given that work obligations mean I am not always available to immediately diagnose and fix the source of a potential memory leak, I thought that a temporary “bandaid” solution would be to poll my hub status with something like Hub Information Driver and then use an attribute (such as free memory, CPU% or some other parameter) to trigger a reboot automatically once a certain threshold has been breached.

Although I have been lucky so far not to have experienced significant slowdowns, I have noticed that as more and more devices and applications (particularly 3rd party apps) are loaded, lighting in response to motion detection may sometimes begin to lag slightly. Whenever this has happened (thankfully quite infrequently), and I did not have time to track down the source, I have successfully applied a temporary “quick fix” by rebooting. I understand I could perform regularly scheduled reboots, but I wish to reboot only when actually needed, thus my thought to tie reboots to free memory or some other parameter. I could then determine (from the frequency of automatic reboot notifications) whether any memory leak is present, which would require more extensive diagnostic tests. My hope is that I can do this in the background without the family ever noticing a prolonged period of slowdown, thus improving the WAF.

My question is: Is there a particular parameter (and a threshold for it), such as free OS memory or CPU%, that can be used to preempt a slowdown, and what level would you set to trigger a reboot automatically (250,000KB; 200,000KB; other)?

Again, I realize that the true solution is to fix any “leaks” but I’m looking for a minimal effort way to keep the family happy until a more proper diagnosis and fix can be performed at a more convenient time. Thanks to all in advance for your experience and advice.
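The idea in this post (poll a hub attribute, reboot once it crosses a threshold) can be sketched as below. This is a minimal sketch, not something taken from the thread: the function names are hypothetical, the 200,000KB default is just one of the values floated in the question, and how the reading is fetched and how the reboot is requested are left to caller-supplied functions. Requiring several consecutive low readings is an added assumption, meant to avoid rebooting on a single momentary dip.

```python
def should_reboot(readings_kb, threshold_kb=200_000, consecutive=3):
    """Return True only if the last `consecutive` readings are all below the threshold."""
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    recent = readings_kb[-consecutive:]
    return len(recent) == consecutive and all(kb < threshold_kb for kb in recent)


def poll_once(read_free_kb, request_reboot, history,
              threshold_kb=200_000, consecutive=3):
    """Take one reading, update the rolling history, and reboot if warranted.

    read_free_kb and request_reboot are hypothetical callables supplied by
    the caller (for example, wrappers around whatever the hub exposes).
    Returns True if a reboot was requested on this poll.
    """
    history.append(read_free_kb())
    del history[:-consecutive]  # keep only the most recent readings
    if should_reboot(history, threshold_kb, consecutive):
        request_reboot()
        history.clear()  # start fresh after the reboot
        return True
    return False
```

Calling `poll_once` on a schedule (say, every few minutes) with a shared `history` list gives the behavior described above, and counting how often it returns True gives the reboot frequency the post wants to track.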

For simplicity I would set up regular reboots and send notifications based on something like memory. The reboots should help to reduce the likelihood of an issue, while the notifications still give you an opportunity to respond to circumstances if they come up.

Another thing to consider would be a second hub. Things like lighting need to be snappy because the user experiences it (or wants to) immediately. Other apps and integrations that may affect this can be moved onto a second hub, freeing up resources for the lighting. That said, it may not always be that simple, but I think it can definitely help.

I have my C7 Production Hub monitoring alerts set at:

Max Temperature 140°F (60°C) - my normal avg is 102-104°F (39-40°C)
Max DB Size 80M - my normal is 28-32M
Min Free Mem 250000K - my normal is 350000-420000K

High temperature is, for me, usually associated with continuously high CPU. (I may reach 139°F/59°C when I reboot.)

YMMV

Edit: With recent improvements in memory management I've lowered my memory alert to 150000K
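As an illustration only, the three alert limits listed above could be expressed as a small check like the one below. The names `Limits` and `check_hub` are hypothetical, and the defaults simply mirror the numbers in this post (140°F, 80M, 250000K); it is a sketch of the comparison logic, not how the hub's own alerting is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Alert limits; defaults mirror the values quoted in the post above."""
    max_temp_f: float = 140.0
    max_db_mb: float = 80.0
    min_free_kb: int = 250_000


def check_hub(temp_f, db_mb, free_kb, limits=Limits()):
    """Return a list of human-readable alerts for readings outside the limits."""
    alerts = []
    if temp_f > limits.max_temp_f:
        alerts.append(f"Temperature {temp_f}°F exceeds {limits.max_temp_f}°F")
    if db_mb > limits.max_db_mb:
        alerts.append(f"DB size {db_mb}M exceeds {limits.max_db_mb}M")
    if free_kb < limits.min_free_kb:
        alerts.append(f"Free memory {free_kb}K is below {limits.min_free_kb}K")
    return alerts
```

The lowered memory alert from the edit would then be `check_hub(temp_f, db_mb, free_kb, Limits(min_free_kb=150_000))`.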

I haven’t had to care about free memory since I threw my C4 stick into a C5. The C4 was not sufficient to run Hubitat - period. This may help you. I believe my threshold for alerting and forcing a reboot on my old C4 was freemem below 110000. Here are my stats for the devices I am running now. Both are C5s, but the main one is running the dongle from the C4. I have a C7 in a box that I don’t really need. I assume the downward spikes are reboots or outages?

Where are these graphs and/or resource levels found? Is this only a C7 thing? Maybe I haven't explored a tab or something.

Typically people have either set up InfluxDB and Grafana on an external Raspberry Pi or something similar, or used HubiGraphs.

Ah, so does Hubitat broadcast these metrics so that Grafana can scrape them?

The setup I use, which is not the one from the "long read" thread, just has Maker API set up with a list of devices whose events are streamed to a Node app on my Raspberry Pi. The Node app interprets the events and stores them in InfluxDB.
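To make the "interprets the events and stores them in InfluxDB" step concrete, here is a rough sketch of turning one event into an InfluxDB line-protocol record. It is written in Python rather than Node for illustration, and several things are assumptions rather than details from the post: the event is taken to be a dict with `displayName`, `name` and `value` keys, and the measurement name `hub_event` and the field names `value` / `value_str` are made up for the example.

```python
import math


def escape_tag(value):
    """Escape commas, equals signs and spaces, as line protocol requires for tags."""
    return str(value).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def event_to_line(event, timestamp_ns):
    """Convert one event dict (assumed shape) into a line-protocol string."""
    tags = (f"device={escape_tag(event['displayName'])},"
            f"attribute={escape_tag(event['name'])}")
    raw = event["value"]
    try:
        number = float(raw)
        if not math.isfinite(number):
            raise ValueError("non-finite values are stored as strings")
        field = f"value={number}"
    except (TypeError, ValueError):
        # Non-numeric values (e.g. "on"/"off") go into a separate string field
        escaped = str(raw).replace("\\", "\\\\").replace('"', '\\"')
        field = f'value_str="{escaped}"'
    return f"hub_event,{tags} {field} {timestamp_ns}"
```

Keeping numeric and string values in separate fields is a design choice here, since a single InfluxDB field cannot hold both types within the same measurement.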

When memory gets too low, the first thing that goes is the Zigbee radio. This rule has solved it for me.

It reboots maybe every month or so, if I don't install an update sooner.

I'm experiencing what I believe to be a memory leak in my C7. I am using HubInfo to monitor the memory and to trigger a reboot if it gets down to less than 20MB.

I've made sure that no devices are set to debug mode, and have done my best to make sure that I don't have any devices (such as thermostats) set more sensitively than they need to be.

At present my system is rebooting 1-2 times / week.

I'd really like to either figure out what I can do to fix this issue, or understand what I need to do to track and share the logs so that the OS can be patched / fixed.

Any advice / feedback is greatly appreciated.

Also, my system isn't particularly large yet. I've been trying it out before adding a lot more devices, but at present it is being used to maintain the temperature of our greenhouse. That is pretty critical: if it doesn't trigger a heater to go on, all of our baby plants destined for our spring garden will suffer cold and unceremonious deaths.

I should add that as of today I am running Platform Version 2.3.4.138, and the hardware is C-7.

After a reboot I have about 630MB of free memory, but that will eventually tick down to less than 20MB, thus requiring a reboot to avoid a system (and plant) freeze (or, if the heater's on, a driven-up electric bill).

You are right that something is amiss, but no one can tell what unless you provide more details about your environment. I would guess something is very busy on the hub, and would start with your device and app stats to see if anything stands out with high CPU time or large state sizes.

I don't reboot my hub except when upgrades are published, so you really shouldn't need to reboot it at all.

Thanks for your fast reply. Here are the stats.

Please let me know if there's any other info or logs I can post.

It's ironic that HubInfo and my process for monitoring if the memory is low is taking up so many resources. I installed that app and HubInfo because I was already experiencing memory drain.


Well nothing stands out in that info. You may want to engage hubitat support.

0.211% isn’t exactly a lot of resources….

@thebearmay, understood. I meant by total ms.

The Unix process top uses a lot of resources.

Thanks for responses. I’ll post what I learn.

I don’t see them in your screenshots, but do you have any apps or drivers that do a lot of HTTP or UDP traffic? I have seen that type of app/driver cause memory leakage and higher CPU if the number of connections goes awry.

@thebearmay, to my knowledge I don't have any apps or drivers that do a lot of HTTP or UDP traffic. Normally I would use a shell to better trace what's going on, but it seems that Hubitat doesn't allow command line access.

Can you recommend any additional steps I should take to diagnose what's going on?

My machine is rebooting daily now. It starts off with about 640MB of free memory and steadily ticks down to 20MB, which is when my rule triggers a reboot.

I'd be glad to post any additional information that would be helpful to you if you let me know what you need and how to post it.

Also, how does one contact Hubitat Support now? Even though I've signed up for Hub Protect, it's unclear to me how to report this issue. I tried emailing support@hubitat.com, but got an auto-response that pointed me to web pages that didn't have a way to reach support for this sort of issue.

Thanks for your help.

Capsmet

Here are my logs from my Hubinfo triggered shutdown events:

app:90 2023-02-05 03:00:00.571 AM info Action: Notify iPhone: rebooting(2023-02-05 03:00:00 Mem low, rebooting)'

app:90 2023-02-04 03:00:00.883 AM info Action: Notify iPhone: rebooting(2023-02-04 03:00:00 Mem low, rebooting)'

app:90 2023-02-04 03:00:00.883 AM info Action: Notify iPhone: rebooting(2023-02-04 03:00:00 Mem low, rebooting)'

app:90 2023-02-03 03:00:00.564 AM info Action: Notify iPhone: rebooting(2023-02-03 03:00:00 Mem low, rebooting)'

app:90 2023-02-03 03:00:00.564 AM info Action: Notify iPhone: rebooting(2023-02-03 03:00:00 Mem low, rebooting)'

app:90 2023-02-02 03:00:00.507 AM info Action: Notify iPhone: rebooting(2023-02-02 03:00:00 Mem low, rebooting)'

app:90 2023-02-01 03:00:00.908 AM info Action: Notify iPhone: rebooting(2023-02-01 03:00:00 Mem low, rebooting)'

app:90 2023-02-01 03:00:00.908 AM info Action: Notify iPhone: rebooting(2023-02-01 03:00:00 Mem low, rebooting)'

app:90 2023-01-31 03:00:00.916 AM info Action: Notify iPhone: rebooting(2023-01-31 03:00:00 Mem low, rebooting)'

app:90 2023-01-31 03:00:00.916 AM info Action: Notify iPhone: rebooting(2023-01-31 03:00:00 Mem low, rebooting)'

app:90 2023-01-30 03:00:00.551 AM info Action: Notify iPhone: rebooting(2023-01-30 03:00:00 Mem low, rebooting)'

app:90 2023-01-30 03:00:00.551 AM info Action: Notify iPhone: rebooting(2023-01-30 03:00:00 Mem low, rebooting)'

app:90 2023-01-29 03:00:00.436 AM info Action: Notify iPhone: rebooting(2023-01-29 03:00:00 Mem low, rebooting)'

app:90 2023-01-28 03:00:00.508 AM info Action: Notify iPhone: rebooting(2023-01-28 03:00:00 Mem low, rebooting)'

app:90 2023-01-28 03:00:00.508 AM info Action: Notify iPhone: rebooting(2023-01-28 03:00:00 Mem low, rebooting)'

app:90 2023-01-27 03:00:00.771 AM info Action: Notify iPhone: rebooting(2023-01-27 03:00:00 Mem low, rebooting)'

app:90 2023-01-27 03:00:00.771 AM info Action: Notify iPhone: rebooting(2023-01-27 03:00:00 Mem low, rebooting)'

app:90 2023-01-26 03:00:00.361 AM info Action: Notify iPhone: rebooting(2023-01-26 03:00:00 Mem low, rebooting)'

0300 is normally the time when hub maintenance and backups run, and a small temporary drop in memory is normal, but dropping to 20MB is unusual. @support_team may need to look at the engineering logs on this one.
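The clustering at 03:00 can be checked mechanically. The sketch below is one possible way to do it, assuming the log has been pasted into a string: it pulls the timestamp out of each "Mem low" message, collapses repeated entries, and reports the time of day and the gap in days between reboots. The function names are hypothetical and the regular expression only matches the message wording seen in the log posted above.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the timestamp embedded in the notification text of each log entry
REBOOT_RE = re.compile(r"rebooting\((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) Mem low")


def reboot_times(log_text):
    """Return the distinct reboot timestamps found in the log, oldest first."""
    stamps = {m.group(1) for m in REBOOT_RE.finditer(log_text)}
    return sorted(datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps)


def summarize(times):
    """Count reboots by time of day and list the gaps (in days) between them."""
    by_clock = Counter(t.strftime("%H:%M") for t in times)
    gaps = [(later - earlier).days for earlier, later in zip(times, times[1:])]
    return by_clock, gaps
```

If every reboot lands on the same clock time with a one-day gap, as the entries above suggest, that points at something happening around the nightly maintenance window rather than a steady drain through the day.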

Yeah, I posted another thread. Mine is also dropping nightly, and every night some memory is being lost.

A small amount of loss is to be expected even on a hub with nothing going on.