Possible slowdown problem with 2.3.0.124?

UPDATE: I let my hub get into the 250k range and it was getting slower and slower, with some automations failing. So I changed the reboot logic to 300k, as that seems to be the sweet spot for my primary hub. It takes about 5-6 days to get to that point, so I'm happy with this reboot frequency.
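
(For anyone who wants to do the same check outside of Rule Machine, here's a minimal sketch. It assumes the hub exposes a local /hub/advanced/freeOSMemory endpoint returning free memory in KB as plain text with no authentication; the IP and threshold are placeholders to adjust.)

```python
# Minimal sketch: read free memory and compare against the ~300k reboot threshold.
# The /hub/advanced/freeOSMemory endpoint and plain-text KB response are assumptions.
from urllib.request import urlopen

HUB = "http://192.168.1.50"   # placeholder: your hub's local IP
THRESHOLD_KB = 300_000        # the ~300k "sweet spot" mentioned above

with urlopen(f"{HUB}/hub/advanced/freeOSMemory", timeout=10) as resp:
    free_kb = int(resp.read().decode().strip())

print(f"Free memory: {free_kb} KB")
if free_kb < THRESHOLD_KB:
    print("Below the reboot threshold; time to schedule a restart")
```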

EDIT: Btw, I was looking through my old notes and I found there is a DB cleanup option that doesn't need a hub soft reset to work:

http://your.hubs.ip.here/hub/cleanupDatabase

You'll eventually get a "Done" message and should notice the DB size has reduced.
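
(If you'd rather kick it off from a script than a browser, here's a minimal sketch along the same lines; the unauthenticated GET is an assumption, so adjust for your setup.)

```python
# Minimal sketch: trigger the DB cleanup endpoint mentioned above and print the reply.
# An unauthenticated GET is assumed; the cleanup can take a while, hence the long timeout.
from urllib.request import urlopen

HUB = "http://192.168.1.50"   # placeholder: your hub's local IP

with urlopen(f"{HUB}/hub/cleanupDatabase", timeout=600) as resp:
    print(resp.read().decode())   # should eventually include the "Done" message
```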

After seeing all this discussion, I set up a "memory check" rule on my two Hubs.

The one running rules and most everything except the Z-Wave devices was clearly running itself out of memory over time.

It had been a while since the last reboot when I wrote this new memory-checking rule. At that time, it was mostly above 235K but occasionally dropped under that for short intervals (sometimes dropping under 223K).

Then, as the days went by, it dropped farther and farther. Eventually, yesterday, it got down to about 190K.

My original rule set a warning at 235K and scheduled a reboot 20 minutes later at 223K, but it cancelled the reboot if the memory rose at all in those 20 minutes. Because that was clearly letting the hub venture ever deeper into "memory constrained" territory, I added some logic to require a more complete recovery before the reboot is cancelled.
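
(For what it's worth, here is a rough sketch of the equivalent warn/schedule/cancel logic as a standalone polling script rather than the actual RM rule. The thresholds are the ones above; the /hub/advanced/freeOSMemory and /hub/reboot endpoints, and the unauthenticated POST for the reboot, are assumptions about the hub's local API.)

```python
# Rough sketch of the two-stage logic: warn at 235K, schedule a reboot when memory
# drops below 223K, and cancel only if memory recovers all the way above the warning
# level within the 20-minute grace period. Endpoints and auth-free access are assumed.
import time
from urllib.request import urlopen, Request

HUB = "http://192.168.1.50"   # placeholder: your hub's local IP
WARN_KB = 235_000             # warning threshold
CRITICAL_KB = 223_000         # critical threshold that schedules the reboot
GRACE_SECONDS = 20 * 60       # how long the hub gets to recover before rebooting

def free_memory_kb() -> int:
    with urlopen(f"{HUB}/hub/advanced/freeOSMemory", timeout=10) as resp:
        return int(resp.read().decode().strip())

def reboot_hub() -> None:
    # Sending a body makes urlopen issue a POST; older firmware accepted a plain GET here.
    urlopen(Request(f"{HUB}/hub/reboot", data=b""), timeout=10)

reboot_due_at = None  # epoch time of the pending reboot, or None if none is scheduled

while True:
    mem = free_memory_kb()

    if mem < CRITICAL_KB and reboot_due_at is None:
        print(f"{mem} KB free: below critical, scheduling a reboot in 20 minutes")
        reboot_due_at = time.time() + GRACE_SECONDS
    elif mem < WARN_KB:
        print(f"{mem} KB free: warning level")

    if reboot_due_at is not None:
        if mem >= WARN_KB:
            # Require a full recovery above the warning level, not just any small rise.
            print(f"{mem} KB free: recovered, cancelling the scheduled reboot")
            reboot_due_at = None
        elif time.time() >= reboot_due_at:
            print("No recovery within the grace period, rebooting the hub")
            reboot_hub()
            break   # the hub goes down; stop polling until it is back

    time.sleep(60)  # check once a minute
```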

So, my experience would indicate some degree of leakage over many days. I've got 2 C-7s.

(my "zwave & zigbee devices only" hub doesn't seem to share the issue (at least not as dramatically/quickly--it has a few RM apps, but very few.)

@bobbyD or @gopher.ny, could you clarify what goes on during the nightly maintenance with regard to the state/event limits?
I've turned quite a few of them down to between 2 and 5, and (it might be unrelated) I seem to see free memory go down a bit quicker and CPU go up a bit.

Sure, nothing is going on. There is no "nightly maintenance".

So state trimming is instant? The setting to input a time is still there; is that just for the backup?

Right, the time setting is for when the backup is generated. There is an hourly cleanup. I think @gopher.ny explained this previously in more detail.

I knew I'd read it somewhere: a setting of less than 11 will trigger an instant cleanup and show higher CPU usage.

I don't recall exactly which version stopped that practice, but it's definitely not in 2.3.0.

I think the "instant" cleanup was removed in 2.2.8.x.

So wherever it's set, the cleanup will only happen hourly?

Hourly and right before the backup.

So the state and event history settings don't change anything anymore with regard to cleanup/trimming?

They still are used to tune how much data gets left behind after the cleanup, same as before.
Double reboot is no longer necessary, BTW.

One thing that caught my attention in this thread (and I missed it earlier) is memory drop while doing rule changes. I'll check that out for 2.3.1.

You might look at anything that causes a recompile. There are a couple of large apps where, when I import and update their code, I can guarantee there will be a memory drop.

Feel free to poke around in my hub; I've got a ticket in on the free memory drop.

@gopher.ny FYI:

I added a "memory" check test to my hub so I could track the memory usage on my 2 C-7 hubs (on 2.3.0.124).

While the hub with my Z-Wave and Zigbee devices seems to run for significantly longer periods without running low on memory, the hub that has my automations and apps seems to do fine for a bit over a week; then it starts dipping into warning territory once in a while. As a few more days go by, the dips in free memory get more frequent and deeper until it's time for a reboot.

There does seem to be a correlation between memory < 223000KB and things not working as well, so it looks like a slow memory leak. It's not bad enough to require daily restarts, but it seems useful to have my memory check running to trigger a reboot when free memory drops below a certain point without quickly recovering.

Device hub: restarted on 1/10/2022. It is now at 241944KB. I've not seen it drop below 235000KB since then (yet).

App hub: restarted on 1/23/2022. It is now at 239124KB but in just the past 2-3 days, it's started dropping below my warning (235000KB) and critical (223000KB) thresholds a few times (briefly). I expect it will need to be rebooted within a few days.

I edited a few rules and a couple of dashboards tonight, but for the past week or so it's mostly just been hanging out doing hub things on its own.

I emailed this to support since I'm seeing a lot of stability issues in v2.3.0.124. I just downgraded to 2.3.0.121 to see if they are resolved, but this is what I'm seeing:

Slowdowns and memory loss:
2.3.0.124 loses a consistent amount of memory daily. In addition, every hub operation gets consistently slower day by day.

  • Once the system initially stabilizes memory-wise, it's losing about 10-15kB/day
  • Operations become slower. I was monitoring the execution time of one rule chain; basically, the delay from when the hub logs the sensor report until it logs turning on the light (2 RM5 rules firing):
  1. Master Bathroom Door Multipurpose contact is closed (SmartThings wireless Zigbee, hub driver)
  2. RM5 rule sets/clears global contact sensor state boolean
  3. RM5 rule triggers Master Bathroom Lights (Lutron Wired Telnet)

Right after bootup, the time from logged event 1 to logged event 3 is about 0.5 seconds. Every day that grows by roughly 0.5 seconds; by day 6, it takes 4.0 seconds to execute (the delay is just the difference between the two log timestamps; see the sketch below). The rule fires around 6 AM, when there is no activity in the house on any other sensor since I'm the only person out of bed.
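
(For clarity, here is a trivial sketch of that timestamp arithmetic. The timestamp strings and their format are made up for illustration, not actual hub log output, so adapt the parsing to however you copy entries out of the Logs page.)

```python
# Compute the rule-chain delay from two log timestamps: the contact-closed event (1)
# and the Lutron light command (3). The timestamps below are illustrative placeholders.
from datetime import datetime

contact_closed = "2022-01-25 06:02:11.412"   # event 1: contact reports closed
light_command  = "2022-01-25 06:02:15.401"   # event 3: light turn-on is logged

FMT = "%Y-%m-%d %H:%M:%S.%f"
delay = datetime.strptime(light_command, FMT) - datetime.strptime(contact_closed, FMT)
print(f"Rule execution delay: {delay.total_seconds():.1f} s")   # ~4.0 s by day 6
```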

Once the hub reaches around a 2-second delay, Rule Machine rules take 15-30 seconds to save. Editing driver code can take 30 seconds to 3 minutes to save. Sometimes it stops spinning the wheel and looks saved, but it actually didn't save; hitting save again succeeds, and that is typically quite fast.

In one of the worst cases, the log had an error about the hub being unable to secure the DB lock.

Rule Machine 5 Condition Corruption

  • I built up a rule with 15 conditions.
  • I cloned the rule, and in the clone a condition checking a global variable value had its number deleted and replaced with "null".
  • While working on that same rule, I edited a condition from "var<3" to "var>2". Once that was saved, two other conditions randomly changed: "state=0" went to "state=null", and "temp<28" was simply deleted.
  • NOTE: This is on day #4 of the system being up, and I am seeing a consistent 3-second execution delay as described above.

I've gone as far as doing a soft reset and a DB restore. There is no change to the above; it does this consistently, day after day, reboot after reboot.

There is a known reported bug with cloning some Conditional Actions, especially Simple Conditional actions. This is fixed in the next release.

Thanks for that update. I did see variable conditions having their properties deleted, as well as full conditions just randomly being deleted (separate from the clone).

I'd save the rule, go back, and then there would be missing conditions.

I'm still seeing it on 2.3.0.121. Just reverted to a 3AM daily reboot while waiting for the next update.

I had a very similar issue a couple of years ago. I ended up scheduling reboots twice per week, since slowdowns were very noticeable after 3 days. An update came out, and it had been fine until recently.