Model C-5 Resource Leak?

kv1 · December 31, 2019, 9:54pm

I use both Safari and Chrome.

JasonJoel · December 31, 2019, 10:02pm

Well, now I have a C-5 hub with ~7 Zigbee devices (all in-box drivers) and 1 app (Rule Machine with one rule - the rule to reboot it daily) that has become non-responsive twice today (I couldn't connect to the web interface or to port 8081).

And that was after it had just been rebooted at 4am this morning, so it hadn't been running for long.

If it happens again, I guess I'll submit a support ticket...

mike.maxwell · December 31, 2019, 10:25pm

So the only apps and drivers ever installed on this hub whilst you were having issues were those 7 Zigbee devices and RM, is that correct?

JasonJoel · December 31, 2019, 10:31pm

Sorry - one thing I should have mentioned... Up until this afternoon it also had the HubConnect client. After the last lockup I removed that as well, though, to get it to 100% stock / in-box drivers and apps.

Currently it has the following. There isn't anything important installed on it right now (I have some Osram outdoor lights coming that I was going to sequester to this hub), so it is free to tinker with if needed:


JasonJoel · December 31, 2019, 10:36pm

Both times it died today it was within 3-4 hours of the last reboot, for what that data point is worth.

I would see HubConnect errors for a few minutes, then it just froze up.

On the 1st freeze, it was rebooted at 4:11am and dead by ???. I noticed it was dead at 7:25am, so it was sometime before that.
On the 2nd freeze, it was rebooted at 7:29am and was dead by 10:45am.

Last reboot was ~1 hour ago, so I guess we'll see what happens!

mike.maxwell · December 31, 2019, 10:47pm

Yeah, I'm not pointing the finger at HubConnect, but just about anything generating errors on a continuous basis isn't going to go over well long term.

As you know, I have a dedicated Zigbee hub running the world's worst collection of Zigbee router bulbs (about 30 of them). These are mirrored to and from two other hubs using an HTTP sync app that I wrote specifically for bulbs. Anyway, the last long-term run I had on that hub and the others it was connected to lasted almost two months; I eventually had to cancel the testing to get on with platform version testing...

Point being, I didn't notice any slowdowns on the three hubs that were part of that testing on 2.1.6. Both C-5s and C-4s were in that mix.

JohnRob · December 31, 2019, 10:48pm

As I read others' reports of slowdown issues, I recall that my hub was getting slow before the last wave of firmware updates. But the slowness crept up on me, so I didn't notice it right away. Now it's been a week since I went to 2.1.8.106, and the hub is still responsive.

Thinking back... I might loosely describe my experience as:

Slowly slowing over a matter of weeks.
Back to fast when firmware update.

My reason for writing this is that others seem to have the slowness build up quickly. Mine doesn't, but it is likely still building. I connect to the hub very seldom and not for extended periods of time.

Could my lack of connected time be significant?

JasonJoel · December 31, 2019, 10:48pm

I agree with that. In this case, it had no errors at all for hours, and then showed a bunch of errors for 10-20 minutes before the hub died.

Like this (this is literally the entire event log from 7:29 to 10:43):

dev:322 2019-12-31 10:43:37.599 am info Attempting socket connection to Server Hub (0)
dev:322 2019-12-31 10:43:37.585 am trace Initialize virtual Hub device...
dev:322 2019-12-31 10:43:27.463 am warn Connection to Server Hub has failed with error [failure: null]. Attempting to reconnect...
dev:322 2019-12-31 07:29:49.099 am info Connected to Server Hub
dev:322 2019-12-31 07:29:48.067 am info Attempting socket connection to Server Hub (0)
dev:322 2019-12-31 07:29:48.006 am trace Initialize virtual Hub device...
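A log excerpt like the one above can be checked mechanically for how long the connection sat quiet before the first failure. This is a minimal Python sketch, not a Hubitat tool: it assumes the line layout shown in the excerpt (device id, date, 12-hour time, am/pm, level, message), and `parse_line` is a hypothetical helper.

```python
from datetime import datetime

# The six log lines quoted above, newest first as the hub's log page lists them.
# Assumed layout: "dev:<id> <date> <hh:mm:ss.mmm> <am|pm> <level> <message>".
LOG = """\
dev:322 2019-12-31 10:43:37.599 am info Attempting socket connection to Server Hub (0)
dev:322 2019-12-31 10:43:37.585 am trace Initialize virtual Hub device...
dev:322 2019-12-31 10:43:27.463 am warn Connection to Server Hub has failed with error [failure: null]. Attempting to reconnect...
dev:322 2019-12-31 07:29:49.099 am info Connected to Server Hub
dev:322 2019-12-31 07:29:48.067 am info Attempting socket connection to Server Hub (0)
dev:322 2019-12-31 07:29:48.006 am trace Initialize virtual Hub device...
"""


def parse_line(line):
    """Hypothetical helper: split one log line into (timestamp, level, message)."""
    _dev, date, clock, ampm, level, message = line.split(" ", 5)
    ts = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %I:%M:%S.%f")
    # Convert the 12-hour clock to 24-hour without relying on locale-specific %p.
    hour = ts.hour % 12 + (12 if ampm.lower() == "pm" else 0)
    return ts.replace(hour=hour), level, message


entries = [parse_line(line) for line in LOG.splitlines() if line.strip()]

last_connected = max(ts for ts, _lvl, msg in entries if msg.startswith("Connected"))
first_failure = min(ts for ts, lvl, _msg in entries if lvl == "warn")
quiet = first_failure - last_connected
print(f"No log entries for {quiet} before the first failure")
```

For this excerpt the quiet stretch is a little over 3 hours 13 minutes, which matches the "no errors at all for hours" description.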

Did the app cause the hub to die, or did resource exhaustion cause the app to die? The world may never know.

Now, if it dies AGAIN now that there is zero user code... Well...

mike.maxwell · December 31, 2019, 10:54pm

It's not going to die just sitting there, sheesh...

Hopefully we will be able to get something reproducible for this specific case anyway.

That would be super helpful, as up to this point we've not been able to reproduce this experience with the setups that we run in our labs or homes.

None of us have any issue with installing user app/driver xyz if that is what leads to a hub slowdown/crash.
But so far nothing concrete has been determined as to what that setup is.

JasonJoel · December 31, 2019, 10:56pm

Hopefully not. If not, then I'll add the HubConnect client back on, wait a few days, and see if I can get it to die again.

It probably won't die again - that seems to be how these things go... But I thought a step-by-step process would be useful just in case I can get it to do it again. And I don't really need the hub until Sat or Sun anyway.

csteele · December 31, 2019, 11:05pm

Those HubConnect errors are a result of the 'ping' failing -- aka: no network connectivity. At least for that portion of the logs, they are a symptom, not the disease.

Non-responsive port 80 and port 8081 -- certainly HubConnect would need port 80 functional.

There's clearly no way to determine cause and effect... Was the failure of port 80 due to a resource impact elsewhere, so that no packets were entering or leaving an otherwise good network connection? Or did the network stack fail, so that nothing else could use it either?

With the network interface (100/full) issue, the symptom was that automations kept working just fine, because the errors were at the NIC level.

JasonJoel · December 31, 2019, 11:09pm

Yup, it would.

All 3 of my HE hubs are physically connected to the same switch, too, so I don't think it is an external networking issue... Could be wrong, but since the hub works fine after a forced reboot I don't think I am.

csteele · December 31, 2019, 11:13pm

I'm way over here throwing darts at the "Wild Guesses" dart board... But the fact that 8081 went away too is a whole 'nother level, given that the Diagnostic Menu is wholly separate.

The combo suggests that the hub's network stack got 'severed', either by running out of memory or by a semaphore. And my darts are landing closest to semaphore.
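To illustrate why a semaphore could take out both port 80 and port 8081 without any memory pressure, here is a toy Python model. It is not Hubitat's code and says nothing about how the hub is actually built; the two-permit pool and the function names are hypothetical.

```python
import threading

# Toy model only: a pool of two permits that every network-facing service
# (think port 80 and port 8081) must take before it can answer.
net_permits = threading.Semaphore(2)


def leaky_handler():
    """Takes a permit and never gives it back, like an error path that skips release()."""
    net_permits.acquire()


def handle_request(timeout=0.1):
    """Well-behaved service: waits briefly for a permit, answers, then releases it."""
    if not net_permits.acquire(timeout=timeout):
        return "no response"
    try:
        return "200 OK"
    finally:
        net_permits.release()


before = handle_request()  # a permit is free, so this answers
leaky_handler()
leaky_handler()            # both permits are now held forever
after = handle_request()   # times out: nothing can answer, yet no memory was exhausted
print(before, "->", after)
```

The point of the sketch is that a leaked lock looks exactly like a dead service to every caller, while the process itself keeps running.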

rjterry21 · January 1, 2020, 1:12am

I've had the slowdown issues and never had HubConnect installed.

jabecker · January 1, 2020, 1:51am

I've had slowdown issues. I had HubConnect installed. The slowdown didn't start until after I uninstalled it because I no longer needed it.

JasonJoel · January 1, 2020, 4:43am

Well, the hub made it 7 hours with the HubConnect client removed, so I added it back on.

We'll see what happens. My guess is it will run fine now. LOL. Which I guess is a good thing.

srwhite · January 1, 2020, 5:15am

This is one of the risks of using the Hubitat websocket, and it's why the proxy server is being developed.

Check this out... This is Hub 1, which is my 2nd floor hub. 95+% of its device activity flows outward to the server. Look at the proxyEvents metrics, which cover about 31 hours since I last restarted the proxy.

1,190,830 events were received by the proxy from the server hub websocket, but only 510 events were forwarded to Hub 1. That's almost 1.2 million calls to parse() that have been avoided.
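The idea behind those numbers is that the proxy, not the hub, absorbs the events the hub doesn't care about. The Python sketch below is a hypothetical illustration of that filtering, not the actual HubConnect proxy: `Event`, `FilteringProxy`, and the "subscribed device ids" rule are all assumptions. It ends by redoing the arithmetic on the reported figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    device_id: int
    name: str
    value: str


class FilteringProxy:
    """Hypothetical sketch: pass along only events for devices the remote hub mirrors."""

    def __init__(self, subscribed_ids):
        self.subscribed = set(subscribed_ids)
        self.received = 0
        self.forwarded = 0

    def on_event(self, event, forward):
        self.received += 1
        if event.device_id in self.subscribed:
            self.forwarded += 1
            forward(event)  # only forwarded events would cost the hub a parse() call

    @property
    def avoided(self):
        return self.received - self.forwarded


# Simulated traffic: 20 devices, but the remote hub only mirrors device 7.
delivered = []
proxy = FilteringProxy(subscribed_ids={7})
for i in range(1000):
    proxy.on_event(Event(device_id=i % 20, name="switch", value="on"), delivered.append)
print(proxy.received, proxy.forwarded, proxy.avoided)  # 1000 50 950

# The figures reported above, over roughly 31 hours.
received, forwarded, hours = 1_190_830, 510, 31
print(received - forwarded)                 # 1190320 parse() calls avoided
print(round(received / (hours * 3600), 1))  # 10.7 events/s arriving on average
```

Averaged over the ~31 hours, that is roughly 10.7 events/s that Hub 1 never had to parse.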

If you keep having hub slowdowns, I would suggest switching over to HTTP for a while. I suspect event saturation on the websocket might be dragging the hub down.

Oh, and Happy New Year!

JasonJoel · January 1, 2020, 5:21am

Good comments, and duly noted! And that may very well be it.

I'm testing the alpha MQTT app, which has made my event count go up (literally) exponentially the way I currently have it set up. My MQTT broker generates ~70 messages/s (3-5K/min, ~200K/hr, ~5M/day). Many of those end up creating events the way I have the MQTT app set up right now.
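As a quick sanity check on those figures, a steady 70 messages/s scales as shown below. Working backwards, the ~200K/hr and ~5M/day estimates correspond to an average of about 56 to 58 messages/s, which still sits inside the 3-5K/min range, so ~70/s is perhaps best read as a typical peak. `message_totals` is just an illustrative helper.

```python
SECONDS_PER_MIN, MIN_PER_HR, HR_PER_DAY = 60, 60, 24


def message_totals(per_second):
    """Scale a steady message rate up to per-minute, per-hour and per-day totals."""
    per_min = per_second * SECONDS_PER_MIN
    per_hr = per_min * MIN_PER_HR
    per_day = per_hr * HR_PER_DAY
    return per_min, per_hr, per_day


print(message_totals(70))  # (4200, 252000, 6048000)
# Working backwards from the hourly and daily estimates gives the average rate.
print(round(200_000 / 3600), round(5_000_000 / 86_400))  # 56 58
```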

So it is certainly feasible that it is slowly chipping away at the resources on the remote hub, or falling behind and causing issues. Using HTTP for the HubConnect connection would likely reduce loading dramatically.

And Happy New Year to all as well!

srwhite · January 1, 2020, 5:27am

If that is 70 calls per second to parse() in the MQTT device on the hub, that is likely more than the hub can safely handle. In my stress testing of HubConnect when I developed the websocket, I started to see the hub slow down before I even hit 20 per second.
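One way to picture "falling behind" is a simple queue that receives a fixed number of events each second and can only drain so many. This Python sketch treats the ~20/s point where slowdowns appeared as an assumed capacity; the hub's real event dispatch is not described in this thread, so both the model and the `backlog` helper are hypothetical.

```python
def backlog(arrivals, capacity_per_s):
    """Step a simple queue one second at a time: add that second's arrivals, drain up to capacity."""
    queue = 0
    for arriving in arrivals:
        queue = max(0, queue + arriving - capacity_per_s)
    return queue


ASSUMED_CAPACITY = 20  # hypothetical: roughly where slowdowns were first seen in stress testing

print(backlog([70] * 3600, ASSUMED_CAPACITY))             # 180000 queued after one hour at 70/s
print(backlog([15] * 3600, ASSUMED_CAPACITY))             # 0: below capacity, nothing accumulates
print(backlog([70] * 60 + [0] * 60, ASSUMED_CAPACITY))    # 1800: a one-minute burst is still draining a minute later
```

Under these assumptions, anything sustained above capacity grows without bound, and even short bursts take noticeably longer to drain than they took to arrive.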

JasonJoel · January 1, 2020, 5:29am

I believe it is. And then there are a subsequent ~70 event handler calls per second in the parent app that the driver sends the events to after receiving them from the MQTT broker.

Probably not a very nice thing to do to the hub.

The MQTT app can also be set up similarly to HTTP on HubConnect, which is what I will ultimately do, I'm sure. I was just testing the 'out of the box' automatic HA to HE integration, but I don't think that is going to work for my system. There are just too many messages to parse/filter/act on.