Hub process slowdown after several days

This is not scientific, more an observation about the slowdown. I have noticed that the HE hub (a C-5 in my case) seems to process everything as a single thread, or single file. As long as everything is working correctly the line moves along nicely, but if there is an issue for some reason, whatever it is clogs up everything behind it. An example might be an app trying to communicate with the outside world and failing, so the app sits waiting for a connection. Instead of it stepping out of the way (being de-prioritized), everything behind it in the queue piles up waiting. It could be a matter of this queue filling up and causing everything to slow down. I can understand support wanting people to disable custom apps/devices, because they could have been written without the logic to handle unexpected conditions. I am speculating here, but I believe the SmartThings platform detects these conditions and can disable or limit such apps so they don't snowball and bring the system down. Very early on they had platform problems caused by misbehaving apps.
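
To put numbers on the "single file" idea, here is a minimal sketch in Python. It is only a model of the behaviour described above, not a claim about how the hub actually schedules work; the task durations and the `completion_times` helper are made up for illustration.

```python
def completion_times(durations):
    """Finish time of each task when one worker runs them strictly in order."""
    finished = []
    clock = 0.0
    for duration in durations:
        clock += duration          # the next task cannot start until this one ends
        finished.append(clock)
    return finished


# Five quick tasks of 0.1 s each (hypothetical figures).
healthy = completion_times([0.1, 0.1, 0.1, 0.1, 0.1])

# Same tasks, but the second one sits for 30 s waiting on a connection.
blocked = completion_times([0.1, 30.0, 0.1, 0.1, 0.1])

print("last task finishes at", round(healthy[-1], 1), "s when all is well")
print("last task finishes at", round(blocked[-1], 1), "s behind one stuck task")
```

Every task queued behind the stuck one inherits its full wait, which is the clogging effect described in the post.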

Just my 2 cent opinion based on what I experience.

I have barely any drivers or apps installed. I’m also experiencing slow downs.

Is the slowdown to do with rules? For instance, do the motion sensors continue to detect as quickly while the rules run slower, or is it everything?

Going by the logs, motion sensors do not detect as quickly.

But... I can only go by the logs. I don't know if the motion sensors are actually slowing down (I sort of doubt it), or the hub is just not registering the motion as quickly (more probable).

Does this really work as a reliable indicator of the "slowdown" condition?
Yes, I can do a ping from my RPI to my Hubitat, and graph it over time.
But, what am I looking for?
How do I recognize that a slowdown has occurred?
Is this a reliable indicator?

No, it's probably not a reliable indicator. Folks use device page load time as one metric to measure system response time. Ping time may or may not change when the system is acting slow. I would expect to see a slight increase, but I can't verify whether that actually happens.
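
For anyone who wants to graph page load time rather than ping, a rough sketch follows. The URL is a placeholder I made up, and the "three times the baseline" threshold is an arbitrary assumption, so treat both as things to adjust for your own hub.

```python
import statistics
import time
import urllib.request


def sample_latency(fetch, samples=5):
    """Call fetch() several times and return each call's duration in seconds."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        fetch()
        durations.append(time.perf_counter() - start)
    return durations


def looks_slow(durations, baseline, factor=3.0):
    """True when the median duration exceeds `factor` times the baseline."""
    return statistics.median(durations) > factor * baseline


def fetch_device_page(url="http://192.168.1.50/device/list", timeout=10):
    # Hypothetical address and path: substitute a page served by your own hub.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
```

Run `sample_latency(fetch_device_page)` right after a reboot to get a baseline median, then repeat on a schedule and compare with `looks_slow`. Using the median keeps a single slow request from raising a false alarm.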

I've been reading along with this thread because I hate to see slowdowns. It's the primary motivation I had to purchase additional Hubs and join them via HubConnect. :smiley:

I've seen "HubConnect" in a couple of messages within this thread along with the word "unlikely" - which is certainly my experience. BUT...

Two ideas are bubbling their way to the top of my suspicions. 1) The slowdowns are more than one problem. I think that the issue with C-5s and Netgear is a completely different problem from "a reboot fixes it", because I'm not seeing a C-4 vs C-5 distinction. But maybe that's just wishful thinking on my part, or a lack of data points...

For 2), I'm seeing a lot of Cloud/Internet apps being mentioned. I know I spent a ton of time dicing up APIXU to be more 'friendly' with regard to hammering the Hub: I split the cloud connections up, made them async http, reduced the DB impact, etc. I'm more than pleased to see @mathew carry that across to DarkSky. But I wonder if Async Http and its benefits haven't made it to all the most popular apps... yet. I wonder if there are loops that just pack too much into one continuous cycle. (It runs only every 10 mins, but the minute it does run it is consuming 100%.)
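
As an analogy for why async http helps, here is a small Python `asyncio` sketch. Hubitat apps are not written in Python, so none of this is the hub's API; `cloud_poll` and `local_event` are invented stand-ins, and the sleep simply simulates waiting on the network.

```python
import asyncio


async def cloud_poll(log):
    """Stand-in for a cloud weather request; the sleep simulates network wait."""
    log.append("cloud request sent")
    await asyncio.sleep(0.05)   # control returns to the event loop while waiting
    log.append("cloud response handled")


async def local_event(log):
    """Stand-in for a local event such as a motion sensor report."""
    await asyncio.sleep(0)      # yield to the event loop once before handling
    log.append("motion event handled")


async def run():
    log = []
    await asyncio.gather(cloud_poll(log), local_event(log))
    return log


order = asyncio.run(run())
print(order)
```

The motion event is handled while the cloud request is still outstanding. With a blocking request the local event would have to wait for the response first.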

I know that HubConnect will do Async Http, BUT are people unaware of the Hub impact of using http (oAuth) vs Event Socket?

EventSocket runs on your hubs 24x7 pumping out every event, as it occurs. You cannot turn it on or off because it's on all the time. If you also config MakerAPI-to-Homebridge and HubConnect-using-http (oAuth) then you've got double or triple the traffic for no benefit... or at least no outstanding (obvious) benefit.

If you can use EventSocket, it's your best solution because it is effectively 'free' -- it's running/outputting anyway - and specifically for HubConnect, there's nearly zero load on the sending Hub when using EventSocket.

For those of you using HubConnect and seeing slowdowns, please check your HubConnect configurations to use Event Socket if at all possible.

If you find you need to change, it's just a matter of selecting Event Socket on the server and copying the new key to your remote hub.

I just spent a couple of hours running packet captures of my HubConnect hubs, specifically watching for cloud traffic. I'm seeing some. It's extremely tiny.. a packet every 5 seconds on average. It's hitting Amazon's AWS exclusively, and I'm mentally labeling that traffic "hub checkin", where it's looking to see if there's a new update.

Let me be clear... I don't see the slowdowns.. I think there are a couple of reasons.. 1) I've been using EventSocket since the day Steve @srwhite wrote it. I've never gone back. 2) The cloud apps I use the most that I can edit, I've converted to async or, like APIXU, took a machete to in order to spread their load across time.

At the same time, I'd just hate seeing slowdowns and am trying to see what's different for me. Remember, I saw slowdowns back in Dec-Feb of this last year/this year. It drove me to build an interconnected array of three hubs using HubLink/Link2Hub... replaced by HubConnect.

Splitting ZWave across two hubs made the most impact. Moving the Cloud apps to their own hub was a close second. Converting to HubConnect and especially/eventually EventSocket was a not insignificant benefit too. (I have a very tiny Zigbee network, and have never had a problem with it related to slowdowns. I loathe the plug-in wall-wart repeaters that are necessary for Zigbee, and have had too many occurrences of finding the wall-wart sitting on the floor because my kids wanted the extra socket for charging an iPad.)

I wish the same good fortune to all of you. :smiley:

Packet cap proof:

asa1# sh cap jpcap

1029 packets captured

   1: 20:47:58.022948       802.1Q vlan#1 P0 192.168.7.68.55927 > 18.223.183.64.8883: P 3173872143:3173872174(31) ack 2022206854 win 312 <nop,nop,timestamp 1053925432 3355598586> 
.
.
.
1029: 21:44:57.437492       802.1Q vlan#1 P0 192.168.7.68.55927 > 18.223.183.64.8883: . ack 2022210484 win 312 <nop,nop,timestamp 1054267373 3356460924> 
1029 packets shown

About 57 minutes (20:47:58 to 21:44:57), 1029 packets = roughly 3.3 seconds per packet.

That's about one packet every 10 seconds per hub (I have 3), meaning it's possible to have nearly zero cloud traffic on a reasonably complex system.
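
The rate can be checked directly from the first and last timestamps in the capture. This sketch assumes the 1029 packets are spread evenly across all three hubs, which the excerpt itself does not show (only one source address is visible).

```python
from datetime import datetime

# Timestamps of packet 1 and packet 1029, copied from the capture above.
first = datetime.strptime("20:47:58.022948", "%H:%M:%S.%f")
last = datetime.strptime("21:44:57.437492", "%H:%M:%S.%f")
packets = 1029
hubs = 3  # assumption: the capture covers all three hubs evenly

elapsed = (last - first).total_seconds()
per_packet = elapsed / packets     # seconds between packets overall
per_hub = per_packet * hubs        # seconds between packets for a single hub

print(f"elapsed: {elapsed:.0f} s")
print(f"overall: one packet every {per_packet:.1f} s")
print(f"per hub: one packet every {per_hub:.1f} s")
```

That works out to a little over 3 seconds per packet overall, or close to 10 seconds per hub.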

My little C4 hasn't had any slowdowns that I've noticed (I have motion lighting, but never wait over a second for it to fire). I've never restarted my hub due to slowness, only ever to try and fix something I've probably done wrong.

I have a simple test Rule that lets me tell when things slow down.

[screenshot: testsw rule]

The 2 test switches are virtual switches configured with a 1 second turn off time. I then have them on a dashboard.

When either of the switches is pushed the rule triggers at some point. Then the if statement looks to see if the switch is on and either turns the light on or off. If the light doesn't turn on or off then the switch has turned off before the if statement executes.

After a reboot the rule always works flawlessly. I can watch the log and see that everything happens within about 200 msec. But over time that gets longer, to the point that by the time the rule executes the switch is already off. I should point out that the rule always gets triggered, but sometimes it runs far enough past the time the switch is turned on that the switch is off by the time the if statement executes.

This has been very consistent. I need to start tracking how long or how many days it takes for the slowdown to affect it.

I was hopeful as this is my 4th day since a reboot, and I have never made it past 2 days ever since I first set up HE, but this morning one of my zigbee switches was changing faster than the rule was evaluating it, which resulted in an endless cycle of my kitchen lights turning on and off. I was able to pause the rule, and it took longer than usual to access the web ui. I waited a couple minutes and checked other automations, which are usually immediate, but now were taking several seconds to initialize, so I rebooted the hub and all is well again. I have definitely made progress, but obviously there was more than one issue at play here.

Sounds like a good use case for private boolean to keep the rule from being re-run if it's already running.

@csteele / somebody:
Can you please make an app/device/something which would measure the "degree of slowdown"? It would be helpful to move to an objective measure, rather than subjective feelings on this issue.

So far no slow down after changing from the Hue kitchen group to an HE kitchen-lights group in my RM4 kitchen lights rule. But it's only been less than 1 day, so I wouldn't expect a slow down yet anyway. To be honest, I'm finding that the HE group sometimes popcorns. I'm willing to put up with it for a few days in the interest of experimentation, but long term I'd rather go back to the Hue group and reboot every night.

But... I had another thought...

I've been looking hard at my logs, and I noticed that the Kitchen Lights rule is running multiple times within a very short time frame -- less than 1 second. It's fast enough that the private boolean doesn't always have the intended effect.

I know it's happening because I have three motion sensors triggering the kitchen lights. When I walk through the area, like in the morning when I'm headed for coffee, I trigger all three of them almost simultaneously. I'm wondering if the poor rule is tying itself in knots? This might explain both the slow down and the problem with one or more of the lights not dimming or turning on correctly.

Since I'm already using private boolean in that rule, I've added a local variable. Like this:

  IF alreadyRunning = True THEN
    exit rule
  END-IF
  Set alreadyRunning = True

  --- do rule stuff here ---

  Set alreadyRunning = False

Don't know if this will help -- I'm open to other suggestions -- but it does seem to stop it from running multiple times very fast. If I go a day or two more with no slow down, I think I'll change back to the Hue group and see what happens.
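
One general caveat with a check-then-set flag: if the check and the set are separate steps, two runs arriving within a few milliseconds can both read False before either writes True. Whether Rule Machine behaves this way is my assumption, not something confirmed in this thread. The Python sketch below shows the same guard with the test and the set done as one atomic step; `RuleGuard` and `sensor_trigger` are invented names, and this is not Rule Machine syntax.

```python
import threading
import time


class RuleGuard:
    """Lets only one run of a rule proceed at a time; overlapping runs are skipped."""

    def __init__(self):
        self._lock = threading.Lock()

    def run(self, actions):
        # acquire(blocking=False) tests and sets the flag in one atomic step,
        # so two triggers cannot both pass the check.
        if not self._lock.acquire(blocking=False):
            return False              # already running: exit rule
        try:
            actions()                 # --- do rule stuff here ---
            return True
        finally:
            self._lock.release()      # same role as Set alreadyRunning = False


if __name__ == "__main__":
    guard = RuleGuard()
    results = []

    def sensor_trigger():
        # Each thread plays one motion sensor firing the rule.
        results.append(guard.run(lambda: time.sleep(0.2)))

    threads = [threading.Thread(target=sensor_trigger) for _ in range(3)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    print(results.count(True), "run(s) executed,", results.count(False), "skipped")
```

Skipped triggers are simply dropped here, which matches the "exit rule" behaviour above. That is fine when all three sensors ask for the same lights anyway.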

If all 3 motion sensors see you anyway, just use 1 instead of 3.... Then you wouldn't have to worry about concurrent running issues.

They actually cover different areas. One points toward the entry and triggers lights when I come home. One points towards the family room adjoining and open to the kitchen. I spend a lot of time there and the kitchen lights illuminate it. One is in the kitchen itself, to keep the lights on while I'm in the kitchen.

It just happens that when I walk from one side of the house to the other, I trigger all three of them. And, in the afternoon when the slow down is most visible, I tend to walk around a lot.

If they happen very close to one another you will always have redundant/concurrent rule running issues. No way around that IF they happen faster than the execution speed of the RM rule... If they are further apart (a few hundred ms maybe?) you are probably fine.

It was a switch triggering motion lighting on/off. It became disoriented apparently since it has never happened before, although I’ve never had the hub slowdown happen this gradually before either.

I use the built-in Zone motion controller to create one aggregated virtual motion device that I use in the Motion Lighting app for my garage lights.
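
The aggregation idea can be sketched roughly like this in Python. It is a simplified model and not how the built-in app is implemented; the `MotionZone` class and the sensor names are hypothetical.

```python
class MotionZone:
    """Combines several motion sensors into one virtual active/inactive state."""

    def __init__(self, sensor_names):
        self._active = {name: False for name in sensor_names}
        self.state = "inactive"

    def report(self, name, active):
        """Record one sensor report; return the new zone state only if it changed."""
        self._active[name] = active
        new_state = "active" if any(self._active.values()) else "inactive"
        if new_state == self.state:
            return None               # no zone-level event, so nothing to trigger
        self.state = new_state
        return new_state


zone = MotionZone(["entry", "family_room", "kitchen"])
events = [
    zone.report("entry", True),
    zone.report("family_room", True),
    zone.report("kitchen", True),
    zone.report("entry", False),
    zone.report("family_room", False),
    zone.report("kitchen", False),
]
print(events)
```

Six sensor reports collapse into two zone events, one when the first sensor goes active and one when the last goes inactive, so a rule listening to the zone fires far less often.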

That's an interesting thought that I think I'll take a look at. I also have multiple motion sensors in my bedroom/master bath that I'd like to explore that with.

It has been more reliable for me without hammering the logs with the motion sensors. I started using it because the lights kept turning off when I was working in the garage.