Well, I'm happy to say that I know exactly why my C5 hub dies/loses communication. It is definitely the combination of the massive number of events being created (because of my MQTT testing) and using the HubConnect client via event socket.
I can reproduce "at will" (if "at will" means reboot the hub and wait 3-6 hours). Today it took from 8:30am to 2pm to die.
It is interesting to me that I have the exact same HubConnect connection going to a C4 hub, and it does not die. Just my C5 hub.
What I have architecturally:
RadioHub1 (C4 with USB radio stick) <--> Coordinator Hub (C4 without USB radio stick) <--> RadioHub2 (C5 - the one that dies)
For the record, I don't know for a fact that it "crashes", although based on looking at the logs after reboot I am pretty sure it does. I do know that when it "messes up" I can't communicate with the hub on port 80 or 8081, and have to hard power cycle it.
@mike.maxwell @bravenel If any of that is useful for you in testing, let me know. If not, I'll just undo the HubConnect on that hub / change it from event socket to HTTP and move on with my life.
No hurry / worries. The hub that dies on my end isn't doing anything important right now. It is in a holding pattern waiting for me to add some Osram bulbs to it.
So if there is anything you want me to try/data to collect from it, it is available for the tinkering.
Whilst my MQTT app never envisaged anywhere near that level of incoming messages, the way HE implements the MQTT driver helps: it queues messages and waits for parse to complete before calling it again. I do subsequently throw an event to handle the parse, but I can see that they are generally being handled, on my system, without backlogging, at roughly 80 ms each. I'm guessing 10/second would start to cause problems.
What will be happening, though, is that the MQTT buffer within the driver will grow on HE (I can't query how many messages are pending), and eventually memory problems must materialize, along with horrible delays in updates. Jason has mentioned 45 minutes... I'll see what I can do in my alpha5test topic to ignore more updates.
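To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes a constant arrival rate, a fixed 80 ms per parse, and one message handled at a time, which is a simplification. The function names are made up for illustration, and nothing here queries the real driver.

```python
def capacity_per_sec(service_ms: float) -> float:
    """Messages per second that a single serial handler can sustain."""
    return 1000.0 / service_ms


def backlog_after(arrival_per_sec: float, service_ms: float, seconds: float) -> float:
    """Pending messages after `seconds`, assuming constant rates and an empty queue at start."""
    growth = arrival_per_sec - capacity_per_sec(service_ms)
    # If messages arrive slower than they are handled, the queue stays empty.
    return max(0.0, growth * seconds)


if __name__ == "__main__":
    for rate in (5, 10, 15, 50):
        pending = backlog_after(rate, service_ms=80, seconds=3600)
        print(f"{rate:>3}/s for 1 hour -> about {pending:.0f} messages pending")
```

At 80 ms per message the ceiling works out to 12.5 messages/second, so 10/second is already close to the limit, and anything sustained above the ceiling grows the queue without bound.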
First, I certainly don't mean to insinuate that there is a problem with MQTT or HubConnect. The way I have things set up right now would not be how a normal person would implement in production. This is for testing.
A couple of salient points:
My MQTT broker is getting 50-70 events/second from Home Assistant into the HA StateStream topic. That is all messages (state, timestamp, other attributes, etc).
In the MQTT logs in HE I see an average of 5 messages/s being logged, so the MQTT driver does a pretty good job of filtering out the unnecessary MQTT topics/pubs/subs.
The C4 hub with MQTT installed is not the one that crashes or locks up. The hub with MQTT is running fine, and runs fine 24/7 even with the existing messaging load.
It is the C5 hub that is connected via HubConnect socket connection to the MQTT hub that is dying - which makes it seem like it is specific to the C5 device itself.
I have the exact same HubConnect socket connection going from the MQTT hub to a different production C4 hub, and it does not die or lock up. That again points to the C5 as the variable that differs.
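For anyone wondering how 50-70 messages/s can shrink to about 5/s, here is a rough Python sketch of standard MQTT wildcard matching (`+` for one level, `#` for the remainder). I don't know how the driver actually does its filtering, so treat this as an illustration only. The topic names are hypothetical, and the special handling of topics starting with `$` is left out.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches an MQTT subscription filter using + and # wildcards."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":
            # Multi-level wildcard: only valid as the last level; matches whatever remains.
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False
        if f != "+" and f != t_levels[i]:
            return False
    # No '#' seen, so both sides must have the same number of levels.
    return len(f_levels) == len(t_levels)


if __name__ == "__main__":
    wanted = "homeassistant/+/+/state"  # hypothetical filter: state messages only
    incoming = [
        "homeassistant/sensor/kitchen_temp/state",
        "homeassistant/sensor/kitchen_temp/last_updated",
        "homeassistant/light/hall/state",
    ]
    kept = [t for t in incoming if topic_matches(wanted, t)]
    print(f"kept {len(kept)} of {len(incoming)} messages")
```

Subscribing only to the state topics drops the timestamp and attribute messages before they ever reach parse, which is one plausible way to get a reduction like that.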
I have about 3 updates/sec into my app - I have never seen a slowdown or the queue building up. I lightly use HubConnect.
I have never experienced slowdowns on any of my 4 hubs + ST ... but ... my hubs are all C4s and I do not have large Z networks and I don't use RM very much at all either. Just a lot of events being generated.
I was just briefly in the locked up C5 hub club. Missed mode changes, slow rule execution, hub completely stopped for hours before I noticed and had to reboot. Turned on the logging options in the last few rules I'd modified and discovered an accidental rule recursion. Fixed the rule and now I'm out of the club!
Didn’t mean to imply that your app was at fault...
Quite the opposite. I was busting on JasonJoel.
I did quite a bit of benchmarking and for obvious reasons the size of the message payload is inversely proportional to throughput. For HubConnect I had no issue pushing 10 messages through parse() with no noticeable slowdown. But those were small websocket messages which are only 300-500 bytes, small enough for a single TCP packet.
When I was trying to develop a connector to listen to my UniFi controller eventsocket the hub was slowing down with just 2-4 messages per second. But the messages were several KB. I suspect the biggest culprit was parsing the JSON.
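A quick way to see how parse cost scales with payload size is a small timing loop like the one below. This is desktop Python, not the hub's runtime, so the absolute numbers won't carry over; the payload shape and function names are invented for the example, and only the relative trend is of interest.

```python
import json
import time


def make_payload(approx_bytes: int) -> str:
    """Build a JSON string of roughly `approx_bytes`, made of small event-like objects."""
    item = {"device": "sensor", "attribute": "temperature", "value": 21.5}
    item_len = len(json.dumps(item)) + 2  # +2 for the ", " separator
    count = max(1, approx_bytes // item_len)
    return json.dumps([item] * count)


def parse_time_us(payload: str, repeats: int = 200) -> float:
    """Average microseconds per json.loads call over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        json.loads(payload)
    return (time.perf_counter() - start) / repeats * 1e6


if __name__ == "__main__":
    for size in (400, 4_000, 40_000):
        payload = make_payload(size)
        print(f"{len(payload):>6} bytes: {parse_time_us(payload):8.1f} us per parse")
```

Running the same kind of loop against real captured messages would say more than synthetic ones, since deeply nested JSON can behave differently from a flat list.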
He’s a special kind of special.
I think that has got to be a record.
I have traced the slowdowns (in my environment) to code that makes extensive use of HubActions, which seem to be very heavy calls (versus http). I set up a test where I was sending a HubAction with UDP payload every 1/2 second and saw the hub slow down to the point where other actions were lagging 2-3 seconds. Even sending a HubAction every second had a noticeable impact. In contrast, I don’t see the same slowing when sending httpGet calls at the same frequency.
Not saying that’s the root cause, just some feedback from my testing.
I agree above 10/sec events things start to backlog.. but I think I can handle that - at least as long as it's a temporary situation.. If it's sustained the future is grim...
No argument there. After more testing I did go back and exclude the chattier devices; now I'm at a super boring event rate of <5/s. Lame.
Side note, my C5 hub crashed again - even with HubConnect set to HTTP auth and no devices being passed. Even though I don't think HubConnect had anything to do with the crash, I removed it (again) and if the hub stays alive until tomorrow night I'll add it back on and see what happens.
If it makes the situation any easier, my server hub (C-4) died overnight. It powers up and the LED turns blue but it never goes any farther and I cannot access the :8081 tools.
It does happen. No hardware lasts forever. Although in my case, 7 months is pretty short.
The main soul searching I need to do today/this weekend is whether to get another Hubitat hub, or put the devices I had allocated for that hub onto my Home Assistant machine.
I've been doing more and more in HA, as it is just much more flexible and considerably faster (it is running on faster hardware, so that's to be expected). The downside, obviously, is that it is more complex, more technical, not as easy to use, and Zigbee support is all over the place (3 different options, each with their own caveats and pros/cons).
I wonder if it’s a power supply issue? Maybe swapping it for another (if you have a similar one) to see if that resolves it. I remember somebody else mentioning that they had a bad power supply.