Hub Load Increases Progressively Over Extended Uptime (62+ Days)

I'm running a C8 Pro with firmware 2.4.3.172, and I've observed a persistent issue across multiple firmware versions over the past year.

The Problem

Hub load increases progressively over time, particularly after extended uptime periods. Typically, the pattern is:

  • Days 1-15: CPU usage remains around 5% or lower

  • Days 15-30: CPU usage begins increasing gradually

  • Days 30+: CPU usage continues climbing

Previously, I would update the hub before reaching critical levels (typically around 15% CPU usage at ~30 days). However, this time I decided to monitor the progression without rebooting to try to identify the root cause.

Current status: 62+ days of uptime with "Hub load is elevated" warning as of today.

Current Metrics

Apps and Devices loads are low:

  • Apps: 0.8%

  • Devices: 1.4%

CPU usage is approaching 40% according to the Hub Information Driver, which is significantly elevated despite low application-level activity.

Free memory has declined to approximately 800MB (compared to ~1.5GB at 10 days of uptime), suggesting a potential memory leak.
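As a rough back-of-the-envelope check on the numbers above, here is a minimal Python sketch. It assumes the decline in free memory is linear between day 10 and day 62, which the two data points alone cannot confirm, and it treats "approaching 40%" as exactly 40%. The function names are made up for illustration and are not part of any hub API.

```python
def leak_rate_mb_per_day(free_start_mb, free_end_mb, day_start, day_end):
    """Average daily drop in free memory, assuming a linear decline."""
    return (free_start_mb - free_end_mb) / (day_end - day_start)


def days_until_exhausted(free_now_mb, rate_mb_per_day):
    """Days left before free memory reaches zero at the given rate."""
    if rate_mb_per_day <= 0:
        return float("inf")  # memory is not shrinking
    return free_now_mb / rate_mb_per_day


def unattributed_cpu(total_pct, apps_pct, devices_pct):
    """CPU load not explained by the reported Apps and Devices loads."""
    return total_pct - apps_pct - devices_pct


# Figures from this post: ~1.5GB free at day 10, ~800MB at day 62.
rate = leak_rate_mb_per_day(1500, 800, 10, 62)
remaining = days_until_exhausted(800, rate)
gap = unattributed_cpu(40.0, 0.8, 1.4)

print(f"Free memory drops by about {rate:.1f} MB/day")
print(f"About {remaining:.0f} more days until exhaustion at that rate")
print(f"About {gap:.1f}% CPU is not attributed to Apps or Devices")
```

If the linear assumption holds, free memory alone would not run out for many more weeks, which is consistent with the point that memory pressure is unlikely to explain the CPU load by itself.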

Analysis

Even accounting for the memory reduction, 800MB of free memory should not cause the observed CPU spike. The JVM would not typically trigger excessive stop-the-world garbage collection at this memory level unless explicitly configured to do so.

Since Apps, Devices, and GC are not showing abnormal activity in the logs, the high CPU usage appears to originate from the hub's core code. However, without thread dumps and heap dumps, it's difficult to pinpoint the exact cause.

System Configuration

For context, my setup includes:

  • Z-Wave antenna: disabled

  • Matter: disabled

  • Zigbee devices: ~100 (85+ are repeaters)

  • Other devices: LAN, Virtual, Parent devices

  • Apps: 6 installed and enabled (no Rule Machine or built-in dashboards)

  • Network: Ethernet connection

  • Logs: No warnings about long-running apps, problematic devices, or excessive debug logging

  • Overall stability: Excellent (apart from the CPU load issue)

Request

To help diagnose this issue, it would be valuable to:

  1. Enable heap dump collection for memory leak analysis

  2. Provide thread dump access for CPU usage investigation

  3. Identify any known memory leaks or CPU-intensive background processes in 2.4.3.172

While rebooting resolves the symptom, it doesn't address the underlying cause. I'd prefer to understand what's happening rather than simply reset the system periodically.

I'm thinking that you should update to the latest release and observe your results. There have been a lot of changes since 2.4.3.172.
Memory leaks were extensively chased and resolved.

As I said, I've been noticing this problem for over a year now, across multiple firmware versions. During this period I have installed some apps, uninstalled others, added several devices, removed and replaced several others, and changed or updated apps and drivers. Mainly, though, I removed everything I didn't actually need and got rid of problematic software (at least the part I can control) and hardware, always with the focus on getting a very stable system.

A lot has changed over a long period of time, and still the one thing that didn't change was the CPU load issue. It was there all this time, and most likely before that too.

If you search for "memory" and "leak" in the 2.4.4.X release notes, you'll find nothing.
There is one mention of CPU load, but it is only about changing how it is reported, nothing related to high CPU load issues.

BTW: With the CPU load at 40%, it is normally VERY easy to spot the problem (or problems) with a few thread dumps.

The memory issues were all in the testing phase, so you won't be able to search for that.
The CPU reporting was also revamped to report, I assume, more accurately.
I would do a full backup and try the latest release and monitor.
You might find things have changed for the better, or not, but at least that won't be an unknown.


The hub rebooted by itself yesterday after ~63 days of uptime.
I'll now update to the latest version and leave it running until it crashes again (or hopefully not). I'll let you know how it goes.

PS: There are no warnings or error messages in the logs around the time it rebooted, so I'm not sure whether a feature triggered the reboot or it really crashed.

Yeah, given memory leaks (not saying they are in the hub software or community apps/drivers), on my C8 anything north of 30 days of uptime was surprising. I have a reboot rule set up for when free memory gets below 180K.
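The logic of a rule like that can be sketched roughly as below. This is only an illustration in Python, not how the rule is actually built on the hub: the function name, the sample readings, and the "three consecutive readings" debounce are all assumptions, and the threshold is simply whatever value and unit the free-memory readings use.

```python
def should_reboot(samples, threshold, consecutive=3):
    """Return True when the last `consecutive` free-memory readings
    are all below `threshold` (same unit as the readings).

    Requiring several low readings in a row avoids rebooting on a
    single momentary dip.
    """
    if len(samples) < consecutive:
        return False  # not enough history yet
    return all(s < threshold for s in samples[-consecutive:])


# Hypothetical readings, oldest first, in the same unit as the threshold.
readings = [420, 310, 175, 168, 160]
if should_reboot(readings, threshold=180):
    print("Free memory stayed below the threshold; schedule a reboot")
```

Whether a debounce is worth having depends on how noisy the free-memory readings are; with `consecutive=1` the check reduces to a plain threshold comparison.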

With my C8P, I've made it to 40 days, but I rebooted due to a version upgrade. Given the additional memory in the C8P, I would guess 60 days is about an outer limit, but I really do think that's driven by your specific mix of drivers and apps.

I also agree that, with the limited diagnostic tools we have, you really can't see any memory leaks happening at the OS level or outside of the JVM. And as system memory and resources are exhausted, I can believe CPU usage would climb (swapping, GC, etc.).

I totally agree with your diagnostic tool request, but I don't see that happening given the current hub security model. That's not to say you can't hack your way in, given enough effort, but even if you find some offending process (ZwaveJS, mDNS (Avahi or Bonjour), Jetty Webserver, DropBear, H2 DB, etc.) leaking resources, I'm not really sure what you could do about it. The hub is running on an older version of Linux, 4.9, so that's just a few revs behind current releases.

Bottom line, I think expecting anything more than 90 days of uptime on a C8P without a reboot is wishful thinking. I'm curious if others have extended (months/years) uptimes. And given the frequency of updates/releases, you're likely going to get some new release in that time frame that will trigger a reboot anyway. And to be fair, IMHO some of this is driven by your hub's driver/app mix, so I'm guessing "less stuff" will get you more days of uptime without CPU issues.

All that said, I'm curious to see if your long-term CPU rise is repeatable on the latest release, so please update this thread 30-60 days out.