Hub Low on Memory Alert, Have Questions

C7 ver 2.3.5.146
So I don't have a lot of devices or tons of automations but every few weeks I get the Hub Low on Memory Alert. It asks me to reboot.
1st, What should I be looking at in logs, state size?
2nd, Is there a built-in auto reboot schedule, and/or a way to reboot at a set time of day once free memory drops below some level? If not, maybe this could be a feature request.
3rd, Will it go away if I do nothing or is it gonna add up and get worse? I've just been manually rebooting.

Opinions seem to be divided as to whether this is normal or a memory leak. I've nothing indicative in my logs etc but I've had memory drop at drastically different speeds dependent on firmware version. On .131 I got over a month before it was down to 180000, on the .146 I rebooted yesterday after about 10 days as it was down to 150000.

To save hassle, you can automate the reboot with a rule using @thebearmay's Hub Information Driver.

Trigger - Free Memory reports below ??? and stays that way for 1 hour
Actions - Wait until some convenient time when the hub is usually quiet, reboot

Here's mine:

Q2 - There's nothing built in, but the above rule gets around that very simply

Q3 - It'll get worse and worse. At some critical point the hub won't function properly. When it happened to me, a point was reached where automations weren't working and everything seemed to be grinding to a halt. It finally shut down and I had to power cycle it. When it came back up the database was corrupt so I had to do a restore to get things working again.
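
For anyone who prefers reading logic to reading rule screenshots, the trigger and actions described earlier in this post can be sketched in Python. This is only an illustration of the idea, not Rule Machine syntax and not the Hub Information Driver's API. The function names, the 150,000 threshold, and the 4 AM quiet hour are placeholder assumptions.

```python
from datetime import datetime, timedelta

def stayed_below(samples, threshold_kb, hold):
    """Return True if free memory has been below threshold_kb for at least `hold`.

    samples: list of (datetime, free_kb) tuples, oldest first (hypothetical format).
    """
    streak_start = None
    # Walk backwards from the newest sample while readings stay below the threshold.
    for when, free_kb in reversed(samples):
        if free_kb >= threshold_kb:
            break
        streak_start = when
    if streak_start is None:
        return False
    return samples[-1][0] - streak_start >= hold

def next_quiet_time(now, quiet_hour=4):
    """Return the next occurrence of quiet_hour:00 (assumed to be a quiet time for the hub)."""
    candidate = now.replace(hour=quiet_hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate

# Example: readings every 15 minutes, below a placeholder threshold of 150,000 for one hour.
start = datetime(2023, 6, 15, 12, 0)
readings = [200_000, 149_000, 148_000, 147_000, 146_000, 145_000]
samples = [(start + timedelta(minutes=15 * i), kb) for i, kb in enumerate(readings)]

if stayed_below(samples, 150_000, timedelta(hours=1)):
    print("reboot at", next_quiet_time(samples[-1][0]))
```

Requiring the reading to stay low for a while avoids rebooting on a momentary dip, and deferring to a quiet hour keeps the reboot out of the way.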

I personally choose a very simple method of using Rule Machine to automatically reboot my hub at 5:15AM on Saturday mornings. This timeframe avoids the hub's automated maintenance tasks that typically occur at around 2am to 3am, and my whole family is typically still asleep. I choose Saturday as I am typically home that day to resolve any issues should things not go well (note: this has not been an issue to date.)

Here is a thread discussing how to implement the RM Rule action.

I personally would rather perform a weekly reboot at a predetermined time versus waiting until the hub's resources get too low. While I think it is pretty cool to have the hub check its free memory and then reboot, I would hate for that to occur in the middle of the day, when users are depending on the home automation. YMMV, of course.
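
As a rough illustration of the fixed weekly schedule described in this post (Saturday at 5:15 AM), here is a small Python sketch that works out when the next reboot would fall. It is not how Rule Machine schedules anything internally; the function name and defaults are assumptions made for the example.

```python
from datetime import datetime, timedelta

def next_weekly_reboot(now, weekday=5, hour=5, minute=15):
    """Return the next weekly reboot time after `now`.

    weekday follows Python's convention: Monday=0 ... Saturday=5.
    """
    days_ahead = (weekday - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=minute, second=0, microsecond=0
    )
    # If this week's slot has already passed, move to next week.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate

# Thursday 15 June 2023 at noon -> Saturday 17 June 2023 at 05:15
print(next_weekly_reboot(datetime(2023, 6, 15, 12, 0)))
```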

The discussion around free memory is a very tricky one. Several aspects of it are beyond what those of us who are not internal Hubitat folks can really see. There are a few key things to consider.

First, each OS and runtime environment handles memory differently. Windows handles it differently from Linux/Unix, which in turn differs from a Java Virtual Machine, and JVMs vary from one version of Java to the next. Most of us are familiar with Windows, which tends to keep a significant amount of memory free until you load apps into it and generally releases memory fairly quickly when those apps are closed. An OS like Linux tends to consume almost all of the memory it is given, using it for disk cache, and then moves memory from disk cache to applications as needed. So a Linux box will typically show much higher memory consumption. That said, what matters is how the memory is being used. Java then uses a tiered approach with multiple memory spaces (heaps); those spaces are defined up front and consumed as needed. A process called garbage collection periodically cleans up memory as time and usage dictate. Sometimes garbage collection won't clean up data you think it should, because it doesn't think it needs to, even though the data isn't being used.

The Hubitat hub is probably running a variation of Linux with a Java Virtual Machine for the core Hubitat stuff. We can't see the consumption of the lower-level Linux stuff, but ultimately it all comes down to consumption based on usage across all of those layers. I tend to believe that although memory leaks can occur, the term is thrown around a lot and isn't always accurate.

This leads me into the consumption part of this. There is no doubt that what we run on the hub has a significant impact on resource usage. I have two hubs: one that drives all of the daily activity in my house and a second that I use for development. The development hub generally has between 400 and 600MB of free memory, depending on what I am doing and how stupidly I decide to code on a given day. I have seen it get lower on occasion when I had written a particularly bad routine and was troubleshooting it repeatedly. My prod hub that drives my home will reboot and get up into the low 500s for a short time, but it fairly rapidly drops to around 180-250MB. It works fine and then goes down very slowly. The key is that eventually both hubs stabilize and hardly lose any more memory. Those numbers depend entirely on consumption and on what I have on the hub.

The last thing I would point out is that because this is Java, in theory it should recover and free up the memory if the consumption is momentary. The best example of that is memory usage during backups. I can see on my graphs where the backups take place every night because of a significant drop. Once the memory has stabilized, it will generally recover to where it was within a few minutes of the backup.

I would suggest that if you see a sudden drop in memory, you watch it and see if it recovers. If it does not, look at everything you have running on your hub. Some apps/devices may be more resource intensive, and you may need an additional hub to split the resource usage. We tend to forget that this isn't a desktop computer, where we have rather significant CPU and memory resources, but a small ARM box with limited RAM.
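
To make the "watch it and see if it recovers" advice concrete, here is a minimal Python sketch that classifies a series of free-memory readings. The 5% tolerance and the sample numbers are assumptions picked for illustration, and the readings would have to come from whatever you use to log free memory.

```python
def classify_dip(readings_kb, tolerance=0.05):
    """Classify a series of free-memory readings (oldest first).

    The first reading is treated as the baseline. A drop of more than
    `tolerance` (as a fraction of the baseline) counts as a dip.
    """
    if not readings_kb:
        raise ValueError("need at least one reading")
    baseline = readings_kb[0]
    floor = baseline * (1 - tolerance)
    if min(readings_kb) >= floor:
        return "stable"
    # There was a dip: check whether the latest reading is back near the baseline.
    return "recovered" if readings_kb[-1] >= floor else "not recovered"

# A backup-style dip that comes back, versus a drop that stays down.
print(classify_dip([250_000, 249_000, 180_000, 200_000, 248_000]))  # recovered
print(classify_dip([250_000, 180_000, 181_000, 179_000]))           # not recovered
```

A "not recovered" result is the case worth investigating, for example by looking at which apps or devices were active around the time of the drop.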

I would also agree with @ogiewon about scheduling the restart for a known good time. Having it restart during the day when people need it is a great way to kill your WAF/SAF.

Here is an example of my memory graph as it is approaching stabilization. The large drops are from backups. I have 2 a night. The one that is local and then one to my external Unraid server.

I have an early morning webCoRE 'Daily Hub Health Check' piston that, using @thebearmay's excellent Hub Information Device driver, checks the hub for:

  • Hub alerts
  • System updates
  • Free memory

For all checks, the piston logs the results and emails me if something needs attention. If free memory is below my preset threshold of 150,000, it also reboots the hub. I chose 150,000 because I've been told that 120,000 is when many hubs start to show operational degradation. Since I only check against the threshold once a day, the 150,000 to 120,000 difference should provide a sufficient time buffer for further memory loss before the reboot executes.

I should add, the reboot portion of my piston has been successfully tested - but, to date, has not been needed.
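
The buffer reasoning above can be written out as a quick calculation. This is a Python sketch rather than webCoRE piston code, and it assumes the figures are in KB (as in the hub's Free Mem field); the function names are made up for the example.

```python
REBOOT_THRESHOLD_KB = 150_000   # reboot threshold used in this post
DEGRADATION_KB = 120_000        # level at which hubs are said to start degrading
CHECK_INTERVAL_HOURS = 24       # the check runs once a day

def check_free_memory(free_kb, threshold_kb=REBOOT_THRESHOLD_KB):
    """Return the action a once-a-day check would take for this reading."""
    return "reboot" if free_kb < threshold_kb else "ok"

def max_tolerable_loss_per_hour(threshold_kb=REBOOT_THRESHOLD_KB,
                                degradation_kb=DEGRADATION_KB,
                                interval_hours=CHECK_INTERVAL_HOURS):
    """Fastest loss rate that still keeps the hub above the degradation level.

    Worst case: the reading is right at the threshold when the check runs,
    so the hub has to last a full interval before the next check.
    """
    return (threshold_kb - degradation_kb) / interval_hours

print(check_free_memory(149_000))        # reboot
print(max_tolerable_loss_per_hour())     # 1250.0
```

In other words, the 30,000 buffer is enough as long as the hub loses less than about 1,250 per hour between checks. A faster leak would call for a higher threshold or more frequent checks.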

Edit: Here is the portion of my piston that pertains to the above Free Memory/Reboot discussion:

I have set my low on memory automatic reboot level to a threshold that is supposedly well above the level that might cause any functional problems for the hub. My routine specifies that the reboot will only occur between certain hours so that it will not interfere with anyone’s use. Since the threshold has enough “headroom” to allow the hub to last at least an additional estimated 12 hours (according to past experience), this prevents any reboot during the day.

The reason I went with this approach is that 1) it minimizes the number of reboots (I don’t really know if this is critical, but I don’t feel comfortable rebooting too frequently if it is not necessary) and 2) by monitoring how frequently reboots are automatically triggered and the time interval between them, it gives me an idea of how much memory is “leaking” with each addition of another app or device (the Envisalink Integration, for example, is an extremely “chatty” app) or with each firmware update. You can, of course, also check the resource usage in the logs, but it is interesting to see how many days you can get out of each firmware update.

Like @johnwill1 and others, I have also seen wide variation in how long the hub runs before a reboot is needed. I used to get several months at a time with older firmware and fewer apps and devices; now I get only about 17 days, as can be seen in my Hub Status Dashboard:


At any rate, with the frequent updates and responsiveness of the Hubitat team in updating the firmware to address problems, sometimes I find that I am manually rebooting due to firmware updates well before auto reboots are necessary anyways.
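
The bookkeeping described earlier in this post (days between automatic reboots as a proxy for the leak rate, and a threshold with about 12 hours of headroom) can be sketched as simple arithmetic in Python. The starting free memory of 490,000 is a made-up number for illustration, the 120,000 floor is the degradation level mentioned earlier in the thread, and a steady loss rate is assumed, which real hubs may not follow.

```python
def avg_loss_per_day(start_kb, end_kb, days):
    """Average free memory lost per day between a reboot and the next trigger."""
    return (start_kb - end_kb) / days

def threshold_for_headroom(floor_kb, loss_per_day_kb, headroom_hours):
    """Reboot threshold that leaves `headroom_hours` before reaching floor_kb,
    assuming memory keeps draining at the same average rate."""
    return floor_kb + loss_per_day_kb * headroom_hours / 24

# Hypothetical: 490,000 free after boot, auto reboot triggered at 150,000 after 17 days.
loss = avg_loss_per_day(490_000, 150_000, 17)
print(loss)                                        # 20000.0 per day
print(threshold_for_headroom(120_000, loss, 12))   # 130000.0
```

Comparing the per-day figure before and after a firmware update or a new app gives a rough sense of what that change cost.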

Not that it's that relevant here, but I find that, for me, Reboot is much less effective, generally, than a Shutdown/Power Cycle.

A full power cycle is good under certain circumstances, but shouldn't be needed most of the time. As I understand it, the only advantage it should have over a regular reboot is that it allows the radios to be fully reset. I have done it a handful of times, and it has always been about restarting the Z-Wave radio after certain events.

What benefit are you seeing from a full power cycle?

I can't name them off the top of my head, but I do recall that more than a few times, Rebooting didn't do the trick, but the Shutdown/Power Cycle did. Nothing concrete. Less automate-able though.

I have the same problem. My configuration has not really changed in the last 2 years besides firmware and app updates. It has been only 7 days since the last reboot, and there seems to be no way to even troubleshoot the memory consumption of any app. Any ideas? Or is there any information on whether the C-8 has more memory, CPU power, etc.?

|Version|C-7 / 2.3.5.146|
|Free Mem|53568 KB|
|DB Size|2 MB|
|Last Restart|2023-06-15 07:56:13|
|Uptime|7d, 2h, 20m, 2s|

I really dislike the statement "not really changed in the last 2 years" because that simply isn't true. A ton of stuff has probably changed. At the very least, you mentioned that you have applied firmware and app updates. Do you know how much I have changed in the stuff I coded over the last 2 years? A ton. On top of that, if you have stuff loaded from two years ago that hasn't been touched because the developer left or whatever, that could have an impact. You need to look at what is loaded and taking up those resources.

That said, I do think the firmware is using more than it used to.

The first place to look is the App and Device Stats page. Look at the size of the states and see what it says there.

Well, they don't state memory used, and the other device/app information does not lead to any conclusion about memory consumption. Also, whether you like it or not does not really matter. "Configuration" means devices connected and applications installed - I specifically stated that "nothing has changed" excluded app and firmware updates... so don't run around and pretend to be Mr. Knowitall....

I don't pretend to know it all, but this is a topic I deal with a lot outside of Hubitat. So I try to help others think about things they may not have considered, because I have had to spend time understanding the subtleties of workload management on a few different systems in my time in IT. Performance management and tuning is a large part of what I have had to do over the past many years. As I stated in my first post above, I know I don't have all the answers, but perhaps just some insights.

My point about not liking the statement "nothing really changed in the last 2 years" is simply that, although we may think that is the case, I have found it usually isn't true. You even supported that by saying you upgraded the firmware and apps on the hub, and those are differences. I can't tell you how many times I have been engaged on performance-related issues on the systems I manage only to find out the cause was something someone did that they didn't expect to make a difference.

If you look at the App and Device Stats page, there is a column for State Sizes. Those would be indicative of how much memory or database space your device or app is using. That is what I was suggesting you look at. That said, I agree it is hard to get exact memory usage numbers. Part of that is because, beyond those stats, much of it is not visible to users.

There is clearly something odd happening in your setup, though, because having a DB size of only 2MB while being down to 53MB of free memory doesn't seem right at all. If your hub were busy enough to get down to 53MB, I would expect the DB to be at least a bit larger. 2MB means it is hardly being touched.

Thanks, I understand what you mean. There is nothing in the stats data that would show a significant consumption of time or storage. The lack of visibility into how resources are being used is simply not satisfactory, and an appliance without any significant number of devices connected suddenly running out of memory, with zero explanation and zero way to troubleshoot, is not a great solution either.
If the firmware got so much bigger that there are severe limitations now, and the C8 comes with more memory and compute power, they should simply state that - I would have no problem spending the money on a C8 and retiring the C7. I just don't want to do this without knowing that it would solve the problem.

The C8 has the same CPU, Ethernet, RAM, and eMMC storage as the C7. It is running a newer version of the JVM, IIRC. In any event, I have not seen any user reports that say the C8 would solve the type of issue you're experiencing on your C7.

Within a couple of hours, free memory dropped from 469MB to 370MB with, as far as I can tell, no changes in the "State size"...

Can anyone demonstrate a simple rule that sends a notification when the memory reaches a critically low value? (Having limited knowledge, I'm not sure if this is possible with Rule Machine or needs some extra add-in, like the @thebearmay Hub Information Driver mentioned above.)

I do not have access to my hub at this time, so I cannot send you a screen capture of the RM rule (will try later this evening), but it is a very simple rule. My rule accesses some of the attributes that are exposed by @thebearmay’s Hub Information Driver, including the Free Memory value of the hub. I have the rule set up so that I receive a Push Notification whenever the Free Memory gets below a certain value, and it also automatically triggers a reboot of the hub. In addition, I receive another Push Notification once the hub has rebooted.

Unless someone else chimes in earlier, I can send a screenshot of the rule to you later today when I am on the same network as my hub.

In my opinion, viewing memory % and scheduling a reboot at a selected time once a threshold is crossed should be built into the Hubitat settings somewhere, since the hub is already alerting on a memory issue and asking for a reboot.

To @UkSub , as promised, here is a group of three RM rules I use to manage and get notifications of when my memory starts to approach a critically low value. The three rules consist of: Rule #1 Hub memory low-reboot and notify; Rule #2 Hub Rebooted Notification; and Rule #3 Hub Rebooted - Update Mode States.

Rule #1 tells me when the memory is low and will automatically reboot my hub and notify me when this is done. The rationale for the 12 minute delay is that sometimes your hub may temporarily dip below the threshold level you set and you really only want the hub to auto reboot if it stays below a certain level for a certain period of time. The 2 minute warning allows you a short amount of time to abort the auto reboot if you wish.

Rule #2 sends me another notification to let me know if the hub has actually successfully rebooted based on the uptime that has elapsed since the last reboot, just so I know if I am at a remote location that the hub actually did successfully start up again.

Rule #3 makes sure that my hub is set to the correct Mode when it is restarted based on the time of day. If your Mode setting is triggered by a certain time, your Mode setting may be incorrect until the next time that particular time occurs again depending upon when your hub reboots (say after a power failure depending upon how long it takes for the power to be restored). This is not critical when a “low memory” reboot is automatically performed, but I just included this for your reference to be complete.
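
As a rough sketch of the idea behind Rule #3 (picking the Mode that should be active for the current time of day after a restart), here is the logic in Python. The schedule times and the mode names Day, Evening, and Night are assumptions for illustration only, and this is not Rule Machine syntax.

```python
from datetime import time

# Hypothetical mode schedule: (start time, mode name), sorted by start time.
SCHEDULE = [
    (time(6, 0), "Day"),
    (time(18, 0), "Evening"),
    (time(22, 30), "Night"),
]

def mode_for(now, schedule=SCHEDULE):
    """Return the mode that should be active at time-of-day `now`."""
    # Before the first entry of the day, the last mode of the previous day still applies.
    current = schedule[-1][1]
    for start, mode in schedule:
        if now >= start:
            current = mode
    return current

# A hub that comes back up at 03:00 should be in Night, not wait for 06:00.
print(mode_for(time(3, 0)))   # Night
print(mode_for(time(19, 0)))  # Evening
```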

As mentioned in my previous response to your question, the device “- Hub Information -” is the device set up with @thebearmay’s Hub Information Driver. The various attributes, such as “freeMemory” and “uptime”, can be referenced via a device you set up using the Hub Information Driver and can be used as Required Expressions and Trigger Events as well as Conditions within your rules.

I have also set up a device using the Hubitat Hub Controller driver to initiate an automatic reboot of my hub upon hitting the low memory threshold. I believe you can also just use the Hub Information Driver device, since devices set up with that driver have a [Reboot] button too.

Here are screen shots of the three rules:

Rule #1

Rule #2

Rule #3

Note: While my Rule #1 triggers an automatic reboot of the hub as well as notifying me of the low memory condition causing the automatic reboot, to answer your question about a simple rule to notify of the low memory condition, you can edit my rule to get rid of the automatic reboot by eliminating the “reboot() on - Hub Controller -” action line.

Hope you find these sample RM rules helpful to give you some ideas of at least one way to approach this.

I am sure that there may be more elegant ways of accomplishing this, but this has been working well for me for a number of years now and has been rock solid. I also like to avoid combining too many functions in one rule, as it sometimes makes troubleshooting much more difficult when something goes wrong (at least in my experience). By separating the functions into several simpler rules, I have found it is easier (at least for me) to track down problems if and when they occur. Of course, YMMV. Anyway, hope this is helpful to someone.

BTW, I use local variables so that I can easily change the various parameters (such as my low memory threshold, uptime, etc) when I am experimenting with my RM rules to determine the optimum values to use. It is much easier to click on the local variable to change it than to go into the rule and have to edit values in each condition within the rule. This way I can easily see what values work the best for each condition in the rule and can easily change them.
