After update to 2.1.9, Hub crashes overnight

I have Hub Watchdog running, and between 2am and 2:15am every day my hub gets very slow.
Hub Watchdog shows timings of around 0.08 to 0.15 pretty consistently, but during the slowdowns around that time it can show figures between 2 and 15 seconds.
This leads me to assume that the hub maintenance is doing something pretty resource hungry that slows down the hub.
I have asked before in previous posts if anybody knows 'exactly' what happens in the maintenance window but have had no replies to it.
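
One way to check whether the slow readings really cluster in that window is to bucket them by time of day. This is only a rough sketch: it assumes the readings have been copied out by hand as (timestamp, seconds) pairs, and the sample data, the 1-second threshold, and the function name are all made up for illustration. None of it is Hub Watchdog's own output format or API.

```python
from collections import Counter
from datetime import datetime

# Hypothetical readings: (timestamp, response time in seconds).
SAMPLES = [
    ("2000-01-01 01:45", 0.09),
    ("2000-01-01 02:05", 4.20),
    ("2000-01-01 02:10", 15.0),
    ("2000-01-01 02:30", 0.12),
    ("2000-01-02 02:07", 2.50),
    ("2000-01-02 14:00", 0.08),
]


def slow_counts_by_quarter_hour(samples, threshold=1.0):
    """Count readings above `threshold` seconds per 15-minute slot of the day."""
    counts = Counter()
    for stamp, seconds in samples:
        if seconds > threshold:
            t = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
            # Round the minute down to the start of its quarter hour.
            slot = f"{t.hour:02d}:{(t.minute // 15) * 15:02d}"
            counts[slot] += 1
    return counts


for slot, n in sorted(slow_counts_by_quarter_hour(SAMPLES).items()):
    print(slot, n)
```

If nearly all of the slow readings land in the same one or two slots across several days, that points at something scheduled rather than random load.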

I know this doesn't help you with your issues but it may indicate that 'nightly hub maintenance' could be causing it. (Just my assumption and I'm probably way off).
How big is your backup BTW? Could it be something to do with the size of the backup that causes issues during this time window? Again, with no information it's difficult to know.
My backup is 37Meg. C-3 hub.

Good question. I forgot about looking at my automatic backups to see if they were related or could offer a clue. Interestingly, my backups have typically been around 3am. But one was at 1pm and, on the two days after the hub crashes, they were right after the reboot, around 7am.

Size is 45MB.

I am also using Hub Watchdog which has been enlightening - lots of slow responses around 2am.

Hmmm. Looks like there is something going on at that time.
My nightly auto backups are at 03:05.

I'm going to set RM to reboot my hub at 1:30am to see if a "fresh" hub can withstand whatever happens at 2-3am.

I manually rebooted my hub at 1:28am. No problem.

At 2:06am, everything died. I infer this from the fact that, this morning, I had to access it from the 8081 menu, revert to 2.1.8, and look at the past logs. Nothing in the log entries after 2:06am until I rebooted it. Note that, unlike the past two mornings, I could not reboot from the 8081 menu, so I took the reversion option.

The automatic backup, normally in the 3am timeframe, did not happen. Something is happening around 2am that totally kills my hub. Now, I'll see if it happens tonight on 2.1.8.

I don't know. It sounds hopeful, but I'm staying with 2.1.8 to see how that does. I'd like some assurance from HE Support that this hotfix does address my issue. I've sent another email to Support to ask that question, but I don't have any expectation that my third email will get any more response than my first two (for a critical hub issue).

Perhaps ask that question on the hotfix release thread.
Just thought I'd point it out as there are fixes for locks by the looks of it.

It says right there in the release notes that it does. You can always roll back. It doesn't hurt anything to try.

You are almost certainly correct, but after three nights of a non-performing hub I need to be sure. I can infer that the fix addresses my issue, but I'd like Support to respond affirmatively to my open ticket. On the small chance it doesn't work, then I do suffer some hurt. A fourth night and the WAF will be irretrievable.

My C4 hub crashed once a week when my file was around 45 Meg. You have way too many apps or rules. I moved most of my RM, ML, SL and around 50 devices to another hub and got the file down to around 25 Meg. It has been crash free for a few months now.

Now that is very interesting and might provoke quite a few replies / comments.
It could also relate to rewritten rules, etc.

@chuck.schwer

My hub is now unresponsive for the fifth morning in a row.

Sun: dead
Mon: dead
Tue: dead, downgraded to 2.1.8
Wed: dead, upgraded to 2.1.9.117
Thu: dead on 2.1.9.117

I am very understanding that this is, as the title of another thread says, "not ready for mainstream". I said there that for me, this is a hobby.

However, support is abysmal. I appreciate that you come onto the community forum and post, but that does not substitute for real problem support. I have a ticket that I opened Sunday via email and have sent three subsequent emails to update the failure. Your response here and @mike.maxwell's reference (to a problem similar to mine) in the release notes for 2.1.9.117 are the only acknowledgement of a problem that may or may not be my issue. This has left me to guess whether various downgrades and upgrades will help. They have not.

You should always tag @bobbyD regarding ticket issues or PM him directly. I'm sure he will reach out shortly.

Sorry for the delays. I checked your ticket and realized that we have a problem in our prioritization process. Every new comment to a new ticket updates the timestamp of the ticket making it look like the ticket was just created. Sorry for the inconvenience. I took the necessary measures to avoid this problem in the future.
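
To illustrate the kind of ordering problem described above, here is a small sketch. It is not Hubitat's actual ticketing system; the `Ticket` fields, the day numbers, and the `queue_order` helper are hypothetical. The idea is that if a queue is worked oldest-first by a timestamp that every comment refreshes, a customer who keeps following up is pushed to the back, whereas ordering by creation time keeps that ticket at the front.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    name: str
    created: int       # day the ticket was opened (hypothetical units)
    last_updated: int  # day of the most recent comment


tickets = [
    Ticket("A", created=1, last_updated=4),  # opened first, customer kept commenting
    Ticket("B", created=2, last_updated=2),
    Ticket("C", created=3, last_updated=3),
]


def queue_order(tickets, key):
    """Return ticket names oldest-first according to the given timestamp field."""
    return [t.name for t in sorted(tickets, key=lambda t: getattr(t, key))]


print(queue_order(tickets, "last_updated"))  # ticket A ends up last
print(queue_order(tickets, "created"))       # ticket A is handled first
```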

After being a long-time holdout on upgrading, I will say that the latest build seems to be very quick and, although it’s a little premature to say this, it does appear to be stable for me.

This wasn’t my initial experience. Although I did not experience a freezing hub at any point, I did experience some slowdowns. Working with Bobby, I identified that, on my system anyway, the Chromecast Integration (beta) was causing an issue. And it seems an old Z-Wave home energy monitor was also a contributor. Both of these are on me really. I had been warned that there were potential issues with running that beta software. And I had been warned that using two Aeon Z-Wave home energy monitors on a single hub could cause problems.

So my point of all of this is, it’s quite easy to just say it’s not ready for prime time, but I think that the majority of the time it does turn out to be things that we have done to our own hubs, or a misbehaving device. In my case it seems to have been both. I’m not pointing fingers, but I am suggesting that we all lower ours and stop just deciding it’s “this one thing”. This hobby is about taking a mix of devices from different manufacturers, software from different sources, and very often software that we are warned may have issues and yet we proceed because it’s cool. With a mix like that alone, it’s a recipe for instability.

I like how @stephack described it in another thread.

And although I am replying to you, I am certainly not singling you out. You’ve got the right mindset about it being a hobby, and your wife is tolerant as you mentioned, which is always helpful. I think far too often, wanting to find a single cause blinds us to what the real problem and solution are. I am absolutely guilty of doing the exact same thing.

The issue described in the other thread and fixed in the hotfix is one where cloud dashboards fail to load but everything else works. Your issue is different and does not appear to be related to the update, since you said that you rolled back and are still experiencing the same issue of the hub being dead overnight.

@bobbyD did contact me directly and gave me some steps to take. I've done a soft reset and restore. Plus, I've disabled many of my custom drivers and apps (including, as suggested by @SmartHomePrimer, an old energy meter and the beta Chromecast which I use a lot).

2:06am is the magic moment so I'll see what happens tonight.

The 2 to 2:40 AM timeframe is still a thing I have to deal with. I did some testing this morning at 2:35 AM, and as soon as I turned off the Zigbee radio, the interface sped up and all of my lights that are connected by IP through Aqara Home and the Hue bridge started responding instantly. I can’t be 100% sure yet that it wasn’t a coincidence and that the maintenance hadn’t simply finished. So I’ll need to test again, but if that’s it, then it’s probably one of my Zigbee devices causing a problem. I still have some Xiaomi devices on the HE Zigbee network and that could very well be the problem there.

Did you ever find a way to fix your hub crashing at 2:15?

If you are having this issue, I would suggest starting a trouble ticket by emailing support@hubitat.com.

I don't think this is a widespread issue, so it probably is something unique to your hub or something running on your hub.

In the meantime, are there any errors in the logs or anything else you can post for us to go on? Have you tried any steps on your own to solve this? Maybe even just post the logs for the time period just before the crash.

Edit: I see you started another thread about this. You probably should try to keep all information on one thread so people aren't duplicating efforts.
