Hub process slowdown after several days

I interface things through Polyglot into my ISY, which controls all of the logic. I use HASS for the UI, to have a nice dashboard of my overall system. I use Node-RED for one-off things, mostly prototyping before I develop a nodeserver for anything I intend to use long term. I also run other Node.js services standalone, such as jishi's Sonos HTTP API, which is excellent.

Networking is not my specialty; all I know is that the command I sent the HE ended in lanautonegconfigenable. That would seem to enable auto-negotiation, yes?

It's very interesting that auto-negotiation would be disabled by default. That used to be the case years and years ago, as auto-negotiation between servers and switches was sometimes a problem (looking at Cisco), but I haven't encountered that problem for 10+ years now. Interesting.

I go Hubitat <-> Node-RED (event socket or MakerAPI depending on what I'm doing) <-> Home Assistant. It definitely has some delay doing that, though.

I'm using EventSocket and MakerAPI between systems.

Hubitat <--> ISY

Several things feeding/reading from ISY including Home Assistant.

It's MY belief that that command does the opposite of its label... it's a disable of autonegotiation. It's like a Monty Python sketch. :slight_smile:

Oh now that would be interesting.

It's my belief that THAT command is a forced "100/full" vs the more correct "auto/auto" out of the box.

The hub ships set to auto/auto, and that command changes it to 100/full.

It's also my belief that the problem is C-5 specific. I have one, I have it plugged into a Netgear switch and I asked about this patch. I didn't need it.

I have a C-5 hub and a Cisco router. Courtesy of Comcast, trying to get any interesting info out of it is next to impossible. For a variety of reasons, I live with it. I don't want to buy another hub or move to Node Red/Hass. That's where I came from, as well as ST. I'm in the "I want it to just work" camp. I don't mind tweaking now and then, especially when adding new things. But I don't want to babysit it constantly.

I'm trying an experiment.

The automation that gives me the most trouble is the one that runs my kitchen lights. It's always the one that slows down first, even when other things are working as expected.

Most of my lights are Hue bulbs attached to a Hue bridge. My kitchen has 5 can lights. The Hue app sees them individually and as a group called "kitchen." Both the individual bulbs and the kitchen group are visible on HE. I always turn the lights on/off as a group. The kitchen lights automation uses the Hue kitchen group to prevent popcorning when turning the lights off and on. When I start having problems, it usually shows up first as one or more bulbs not turning on or off correctly, or not turning to the correct dimmer level. Following that, eventually the lights will turn on correctly, but slower and slower and slower. Reboot fixes it. For a time.

Now... my experiment... I've created an HE group called "Kitchen Lights" using the 5 individual Hue bulbs. I've changed my kitchen lights automation to use the new Kitchen Lights group instead of the Hue kitchen group. Then I rebooted, to get a fresh start. So far, it's all quite snappy. We'll see how long that lasts. I've turned off my auto reboot rule so I'll get a fair representation.

This is not scientific, more an observation of the slowdown. I have noticed that the HE hub (a C-5 in my case) seems to process everything as a single thread, or single file. As long as everything is working correctly the line moves along nicely, but if for some reason there is an issue, whatever it is clogs up everything behind it. An example may be an app trying to communicate with the outside world when it can't, so the app sits waiting for a connection. Instead of it stepping out of the way (being de-prioritized), everything behind it in the queue piles up waiting. It could be a matter of this queue filling up and causing everything to slow down. I can understand support wanting people to disable custom apps/devices, because they could have been written without the logic to handle unexpected conditions. I am speculating on this, but I believe that on the SmartThings platform they detect these conditions and can disable or limit these apps so they don't snowball and bring the system down. Very early on they had platform problems caused by misbehaving apps.

Just my 2 cent opinion based on what I experience.
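
To put rough numbers on the "single file" idea, here is a minimal Python sketch. It assumes a strictly one-at-a-time queue, which is the speculation above and not a confirmed description of how the hub schedules work; the task durations and the 10-second timeout are made-up values for illustration.

```python
def completion_times(durations_ms):
    """Finish time (ms) of each task when one worker runs them strictly in order."""
    finished = []
    clock = 0
    for duration in durations_ms:
        clock += duration
        finished.append(clock)
    return finished

# Five quick 200 ms tasks: the last one finishes after 1 second.
healthy = completion_times([200, 200, 200, 200, 200])

# Same tasks, but the second one sits for 10 s waiting on an unreachable
# host (hypothetical timeout). Everything queued behind it is delayed too.
blocked = completion_times([200, 10_000, 200, 200, 200])

print(healthy)  # [200, 400, 600, 800, 1000]
print(blocked)  # [200, 10200, 10400, 10600, 10800]
```

In this model, the three quick tasks behind the stuck one each finish about 10 seconds late even though nothing is wrong with them, which matches the "everything behind it piles up" symptom.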

I have barely any drivers or apps installed. I'm also experiencing slowdowns.

Is the slowdown to do with rules? For instance, do the motion sensors continue to detect as quickly while the rules run slower, or is it everything?

Going by the logs, motion sensors do not detect as quickly.

But... I can only go by the logs. I don't know if the motion sensors are actually slowing down (I sort of doubt it), or the hub is just not registering the motion as quickly (more probable).

Does this really work as a reliable indicator of the "slowdown" condition?
Yes, I can do a ping from my RPI to my Hubitat, and graph it over time.
But, what am I looking for?
How do I recognize that a slowdown has occurred?
Is this a reliable indicator?

No, it's probably not a reliable indicator. Folks use device page load time as one metric to measure system response time. Ping time may or may not change when the system is acting slow. I would expect to see a slight increase, but I can't verify whether that actually happens.
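
For anyone who wants to graph page load time instead of ping, a minimal Python sketch follows. The URL in the usage comment is a placeholder, and the "3x the baseline median" threshold is an arbitrary assumption, not a value anyone in this thread has validated.

```python
import statistics
import time
import urllib.request

def time_fetch(url, timeout=10.0):
    """Return the seconds taken to download url once."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
    return time.monotonic() - start

def looks_slow(baseline, recent, factor=3.0):
    """True when the median of recent timings exceeds factor x the baseline median."""
    return statistics.median(recent) > factor * statistics.median(baseline)

# Usage (placeholder address; point it at one of your own device pages):
#   baseline = [time_fetch("http://192.168.1.50/device/edit/1") for _ in range(5)]
#   ...days later...
#   recent = [time_fetch("http://192.168.1.50/device/edit/1") for _ in range(5)]
#   print(looks_slow(baseline, recent))
```

Take the baseline shortly after a reboot, then sample on a schedule from the RPi and log the numbers. Medians are used so one odd sample doesn't trigger a false alarm.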

I've been reading along with this thread because I hate to see slowdowns. They were my primary motivation for purchasing additional Hubs and joining them via HubConnect. :smiley:

I've seen "HubConnect" in a couple of messages within this thread along with the word "unlikely" - which is certainly my experience. BUT...

Two ideas are bubbling their way to the top of my suspicions. 1) The slowdowns are more than one problem. I think that the issue with C-5s and Netgear is a completely different problem from "a reboot fixes it", because I'm not seeing a C-4 vs C-5 distinction. But maybe that's just wishful thinking on my part, or a lack of data points...

For 2), I'm seeing a lot of Cloud/Internet apps being mentioned. I know I spent a ton of time dicing up APIXU to be more 'friendly' with regards to hammering the Hub: I split the cloud connections up, made them async http, reduced the DB impact, etc. I'm more than pleased to see @mathew carry that across to DarkSky. But I wonder if Async Http and its benefits haven't made it to all the most popular apps... yet. I wonder if there are loops that just pack too much into one continuous cycle. (It runs only every 10 minutes, but the minute it does run, it consumes 100%.)

I know that HubConnect will do Async Http, BUT are people unaware of the Hub impact of using http (oAuth) vs Event Socket?

EventSocket runs on your hubs 24x7 pumping out every event, as it occurs. You cannot turn it on or off because it's on all the time. If you also config MakerAPI-to-Homebridge and HubConnect-using-http (oAuth) then you've got double or triple the traffic for no benefit... or at least no outstanding (obvious) benefit.

If you can use EventSocket, it's your best solution because it is effectively 'free': it's running and outputting anyway. Specifically for HubConnect, there's nearly zero load on the sending Hub when using EventSocket.

For those of you using HubConnect and seeing slowdowns, please check your HubConnect configurations to use Event Socket if at all possible.

If you find you need to change, it's just a matter of selecting Event Socket on the server and copying the new key to your remote hub.

I just spent a couple of hours running packet captures of my HubConnect hubs, specifically watching for cloud traffic. I'm seeing some. It's extremely tiny: a packet every 5 seconds on average. It's hitting Amazon's AWS exclusively and I'm mentally labeling that traffic "hub check-in", where it's looking to see if there's a new update.

Let me be clear... I don't see the slowdowns. I think there are a couple of reasons: 1) I've been using EventSocket since the day Steve @srwhite wrote it, and I've never gone back. 2) The cloud apps I use the most that I can edit, I've converted to async or, like APIXU, took a machete to in order to spread the load across time.

At the same time, I'd just hate seeing slowdowns and am trying to see what's different for me. Remember, I saw slowdowns back in Dec-Feb of this last year/this year. It drove me to build an interconnected array of three hubs using HubLink/Link2Hub... replaced by HubConnect.

Splitting ZWave across two hubs made the most impact. Moving the Cloud apps to their own hub was a close second. Converting to HubConnect, and especially (eventually) EventSocket, was a not insignificant benefit too. (I have a very tiny Zigbee network, and have never had a slowdown-related problem with it. I loathe the plug-in wall-wart repeaters that are necessary for Zigbee, and have had too many occurrences of finding the wall-wart sitting on the floor because my kids wanted the extra socket for charging an iPad.)

I wish the same good fortune to all of you. :smiley:

Packet cap proof:

```
asa1# sh cap jpcap

1029 packets captured

   1: 20:47:58.022948       802.1Q vlan#1 P0 192.168.7.68.55927 > 18.223.183.64.8883: P 3173872143:3173872174(31) ack 2022206854 win 312 <nop,nop,timestamp 1053925432 3355598586>
.
.
.
1029: 21:44:57.437492       802.1Q vlan#1 P0 192.168.7.68.55927 > 18.223.183.64.8883: . ack 2022210484 win 312 <nop,nop,timestamp 1054267373 3356460924>
1029 packets shown
```

About 57 minutes, 1029 packets = roughly 3.3 seconds per packet.

That's about one packet every 10 seconds per hub (I have 3), meaning it's possible to have nearly zero cloud traffic on a reasonably complex system.
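
The arithmetic can be checked directly from the first and last timestamps in the capture. This Python sketch assumes the 1029 packets are spread evenly across all three hubs, which is the poster's framing and not something the two visible capture lines can confirm.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"
first = datetime.strptime("20:47:58.022948", FMT)  # packet 1
last = datetime.strptime("21:44:57.437492", FMT)   # packet 1029
packets = 1029
hubs = 3  # assumption: traffic is split evenly across the hubs

elapsed = (last - first).total_seconds()
per_packet = elapsed / packets
per_hub = per_packet * hubs

print(f"{elapsed:.0f} s elapsed")
print(f"{per_packet:.2f} s per packet overall")
print(f"{per_hub:.1f} s per packet per hub")
```

The capture spans just under 57 minutes, so the overall rate works out to a little over 3.3 seconds per packet, or roughly one packet every 10 seconds per hub.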

My little C4 hasn't had any slowdowns that I've noticed (I have motion lighting, but never wait over a second for it to fire). I've never restarted my hub due to slowness, only ever to try and fix something I've probably done wrong.

I have a simple test Rule that I can tell when things slow down.

[Screenshot: testsw rule]

The 2 test switches are virtual switches configured with a 1 second turn off time. I then have them on a dashboard.

When either of the switches is pushed the rule triggers at some point. Then the if statement looks to see if the switch is on and either turns the light on or off. If the light doesn't turn on or off then the switch has turned off before the if statement executes.

After a reboot the rule always works flawlessly. I can watch the log and see that everything happens within about 200 msec. But over time that delay grows, to the point that the switch is already off by the time the rule executes. Note that the rule always gets triggered; it just sometimes runs so long after the switch turned on that the switch is off again by the time the if statement executes.

This has been very consistent. I need to start tracking how long or how many days it takes for the slowdown to affect it.
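
The race in this test can be written down as a one-line check. This Python sketch only models the timing described above (a 1 second auto-off and a rule whose IF check runs some delay after the switch turns on); the sample delays are made up.

```python
AUTO_OFF_MS = 1000  # the virtual test switches turn themselves off after 1 second

def rule_sees_switch_on(rule_delay_ms, auto_off_ms=AUTO_OFF_MS):
    """True if the rule's IF check runs before the switch has auto-turned off."""
    return rule_delay_ms < auto_off_ms

# 200 ms is the fresh-reboot figure from the logs; the others are hypothetical.
for delay_ms in (200, 800, 1500):
    print(delay_ms, rule_sees_switch_on(delay_ms))
```

Under this model the test is effectively a pass/fail alarm for "rule latency has reached 1 second". Shortening the auto-off time would make it trip earlier in the slowdown.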

I was hopeful, as this is my 4th day since a reboot and I have never made it past 2 days since I first set up HE. But this morning one of my Zigbee switches was changing faster than the rule was evaluating it, which resulted in an endless cycle of my kitchen lights turning on and off. I was able to pause the rule, though it took longer than usual to access the web UI. I waited a couple of minutes and checked other automations, which are usually immediate but were now taking several seconds to initialize, so I rebooted the hub and all is well again. I have definitely made progress, but obviously there was more than one issue at play here.

Sounds like a good use case for private boolean to keep the rule from being re-run if it's already running.
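
The idea behind that guard, sketched in Python since a rule can't be pasted as text. This is only a model of the pattern, not Rule Machine's implementation, and the class and attribute names are made up for illustration.

```python
class GuardedRule:
    """Model of a rule that refuses to start while a previous run is in progress."""

    def __init__(self, actions):
        self.private_boolean = True  # True means the rule is allowed to run
        self.actions = actions
        self.skipped = 0

    def trigger(self):
        if not self.private_boolean:
            self.skipped += 1  # already running: ignore this trigger
            return False
        self.private_boolean = False  # block re-entry while the actions run
        try:
            self.actions(self)
        finally:
            self.private_boolean = True  # allow the next trigger
        return True

runs = []

def actions(rule):
    runs.append("run")
    # Simulate the switch changing again while the actions are still running.
    rule.trigger()

rule = GuardedRule(actions)
rule.trigger()
print(len(runs), rule.skipped)  # 1 1
```

The second trigger arrives mid-run and is dropped, so the actions execute once instead of feeding the on/off cycle.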