Diagnosing memory usage

C7 running 2.3.6.145. All devices are z-wave. I only have 16 devices, and 10 of those are disabled because I only use them for bench testing different devices/configurations for a home under construction.

I've been monitoring memory usage because I've had to reboot due to low memory a few times, usually after a week or so of usage. After reboot, I have...

11-07 08:28:18,661780,0.57

A day later, it gradually reduces to...

11-08 08:33:47,576128,0.03

It continues to gradually go down until this happened last night...

11-10 23:53:28,525472,0.05
11-10 23:58:32,525432,0.05
11-11 00:03:36,358620,0.05

It is currently showing...

11-11 10:07:20,352772,0.03
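A useful first step is to pin down exactly when the jump happened. The minimal Python sketch below parses history lines in the format quoted above. It flags any pair of consecutive samples where free memory fell by more than a chosen threshold. It makes three assumptions. First, each line is `MM-DD HH:MM:SS,<free memory in KB>,<CPU load>`, which is inferred from the thread, not from Hubitat documentation. Second, because the lines omit the year, it fills in a placeholder year. Third, the hypothetical `threshold_kb` of 50,000 is an arbitrary choice for illustration.

```python
from datetime import datetime

# Consecutive samples copied from the post (about 5 minutes apart).
# ASSUMPTION: columns are "MM-DD HH:MM:SS", free memory in KB, CPU load.
SAMPLES = """\
11-10 23:53:28,525472,0.05
11-10 23:58:32,525432,0.05
11-11 00:03:36,358620,0.05
"""

def parse(text, year=2000):
    """Return (timestamp, free_kb) tuples; `year` is a placeholder since the lines omit it."""
    rows = []
    for line in text.strip().splitlines():
        stamp, free_kb, _cpu = line.split(",")
        ts = datetime.strptime(f"{year}-{stamp}", "%Y-%m-%d %H:%M:%S")
        rows.append((ts, int(free_kb)))
    return rows

def sudden_drops(rows, threshold_kb=50_000):
    """Yield (start, end, drop_kb) for consecutive samples where free memory fell by more than threshold_kb."""
    for (t0, free0), (t1, free1) in zip(rows, rows[1:]):
        drop_kb = free0 - free1
        if drop_kb > threshold_kb:
            yield t0, t1, drop_kb

for start, end, drop_kb in sudden_drops(parse(SAMPLES)):
    # prints: 11-10 23:58:32 -> 11-11 00:03:36: free memory fell by 166,812 KB
    print(f"{start:%m-%d %H:%M:%S} -> {end:%m-%d %H:%M:%S}: free memory fell by {drop_kb:,} KB")
```

Run over a longer history, the same check would show whether the jumps always land at the same time of day or at unrelated times.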

Here's what the log shows around the time of the big increase...

The same sequence of log entries appears at midnight on previous days, but those days show no corresponding jump in memory usage like the one last night.

I don't have any rules, etc. that run at midnight.

How do I diagnose the cause of this big increase in memory usage? Also, is it normal for memory usage to gradually increase over time? In 3.5 days, usage had already increased quite a bit before the big increase last night.
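To put a number on the gradual part, the sketch below computes the average loss of free memory per day between samples quoted above. As before, it assumes the second value in each sample is free memory in KB. The year 2000 is a placeholder because the posts don't give one. `kb_per_day` is a hypothetical helper, not part of any Hubitat tool.

```python
from datetime import datetime

def kb_per_day(t0, free0_kb, t1, free1_kb):
    """Average drop in free memory, in KB per day, from sample 0 to sample 1."""
    days = (t1 - t0).total_seconds() / 86_400
    return (free0_kb - free1_kb) / days

# Samples from the posts above; the year 2000 is a placeholder (not given in the thread).
after_reboot = (datetime(2000, 11, 7, 8, 28, 18), 661_780)
one_day_later = (datetime(2000, 11, 8, 8, 33, 47), 576_128)
before_jump = (datetime(2000, 11, 10, 23, 58, 32), 525_432)

print(f"first day:       {kb_per_day(*after_reboot, *one_day_later):,.0f} KB/day")
print(f"next ~2.6 days:  {kb_per_day(*one_day_later, *before_jump):,.0f} KB/day")
print(f"whole ~3.6 days: {kb_per_day(*after_reboot, *before_jump):,.0f} KB/day")
```

With these samples, free memory falls by roughly 85,000 KB/day in the first day after the reboot and by roughly 19,000 KB/day over the following days. That is about 37,400 KB/day over the whole window. This only quantifies the trend; it doesn't say whether the trend is expected.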

Quick question as an aside. Are the devices you disabled still powered up, or are they off?

They are not powered up unless I "un-disable" them for bench testing.

I've been seeing similar behavior on my C8 with 2.3.6.144. It's happened three times now. I get a slow, gradual decrease in free memory over time, as expected, and then a sudden drop of more than 100,000 KB that doesn't line up with any backup, update, or other activity. I've already done a soft reset and restore in case there was any database corruption. I've detailed it in this post:

If any of these are mains-powered, that could cause issues in your mesh... Disabling them doesn't turn off the device's routing...

My understanding is that the mesh should heal itself over time. How would the devices' prolonged powered-off and disabled state affect free memory?

Zigbee does a better job of this than z-wave. Removing a mains-powered device from a z-wave mesh will make it very unhealthy, because z-wave doesn't heal itself that well. It's less critical with battery devices since they don't route, but removing mains-powered devices is like taking a baseball bat to the mesh. If you're going to remove a device, it's best to do it properly: exclude it if it's z-wave, or remove it from the Zigbee table if it's Zigbee.

I don't have any issues with taking a device offline. YMMV 🤷
After a couple of days all the other devices forget it even existed.

@user2164 did you see this post and try a backup/restore as a first step?

When I'm bench testing a device, it sits within a few feet of the HE and is powered up for only a few minutes at a time over maybe an hour or two. That may happen every couple of weeks at this point (most of my bench testing is done for now). The few times I've looked at the z-wave topology, none of the "production devices" (i.e., the six I'm actually using in this house) were attempting to route through one of the disabled devices.

I just did a backup/restore/reboot and am back at 648,916 KB of free memory. I will monitor it for a few days to see if the big decrease happens again.

If it doesn't happen again, does that imply that the database was corrupted? If so, a backup/restore/reboot doesn't seem like a reasonable substitute for diagnosing and fixing the real problem.

Yes, some sort of database issue has recently been shown to cause what looks like a memory leak. No one knows what is causing the database to get into this state. The devs are aware of the issue.