Hub Crashes every night, Last ditch effort on Support

For the past 3 months my hub has crashed every night at or around 02:15. I am going to try to provide as much detail as I can below. I'm looking for any help, as official support has had me disable so many features and it's still crashing.

For the first month I was using a separate Wi-Fi plug to restart it at 02:30 and bring it back online. I then decided I wanted to fix the issue permanently and find the root cause. Before I contacted support I did the following, roughly in the order listed.

-Removed and updated all apps that had any errors in the logs.
-Saw a post about zombie devices, so I purchased a zwave stick and removed 2 zombie devices.
-Wiped the Hubitat: removed all apps and drivers, deleted both networks, removed all rules.
-Factory reset all 32 zwave and zigbee devices, and firmware-updated any that allowed OTA before adding them back to the Hubitat.
-Re-added everything, remade all my rules, etc.

At this point I decided to contact support. Below are my interactions with official support and their suggestions, in order.

First I was told that the Wifi Devices using community written apps could be the problem. I removed all my wifi devices and the problem still existed.

"Thanks for taking the time to reach out to us, we're here to help! First of all, I am sorry to hear that you have taken the "nuclear" measure to start over. That is rarely the best solution to resolve a problem and only recommended in extreme situations. I took a look at your hub's error log and the problems you've been experiencing appear to be related to some custom LAN or cloud integrations. I have seen these kind of problems with several community integrations such as TP-Link/Kasa integration."

Second Response was that my Network setup could be blocking the ports to communicate out.

"There isn't much in your hub's error log, but I see an issues related to your hub's Cloud Service that happen regularly. This is indicative of something within your local network blocking the hub's access to the cloud. If you have services using the cloud, the hub may go unresponsive at times. Can you tell me a little bit about your network set up (router, firewall) and Internet Service Provider?"

I asked if I could get some more details on what ports and protocols are used and was told that it's just 443. I have a background in networking, so I verified that my network gear (Unifi) was configured properly, ran a bunch of tests, and verified that nothing was blocking 443. There was no change. At this point I asked if this could be hardware related, and asked about the only app I had that used any remote connectivity.
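
For anyone who wants to repeat this kind of outbound check, a plain TCP connect test from another machine on the same network is one option. This is only a minimal sketch: the function name and the example host are placeholders, not anything Hubitat documents, and a successful connection from a PC shows the path is open for that PC, not necessarily for the hub itself.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example with a placeholder host; substitute the endpoint you want to test:
# print(can_connect("example.com", 443))
```

This only tests whether the TCP handshake completes. It says nothing about TLS inspection or anything else that happens after the connection is open.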

"Thank you for the update. If you are able to access the hub's web interface but the hub cannot stay connected to the cloud, it is software related not a hardware issue. As for our Hue Integration, it doesn't involve cloud, it connects the Hue bridge via LAN and rules run local. I suggest disabling your custom integration and trying the built-in one."

I disabled the CoCoHue app and it was still happening. The next reply I got:

"Thank you for your update. Your issue has been referred to our engineering team. Further replies to this ticket will not receive a response."

I did receive another reply

"Looking at your hub's error log I noticed that prior to the time-frame you indicated, your hub's cloud controller was "stuck." Restarting it requires a reboot. The reason it was stuck is because you have a heavy taxing application that uses the cloud. I have seen this happening with some users who have remote dashboards always on using the cloud link. This is something that has been referred to our engineering team, but I do not have a time-frame when they will be able to resolve it. Meanwhile, reducing the cloud intensive applications may prevent lockups. I would start with dashboard links to ensure that you are using local link rather than cloud link where feasible."

I removed the cloud links and it's still happening.

So as of right now, I cannot use any cloud links or any apps that use web connectivity, I still need to reboot the device often, and I'm really at a loss on what to do. I really loved the Hubitat. I have it controlling so many devices in my house, but now it's becoming more of an annoyance.

If anyone has any idea what I could try, please reply or message me. I am thinking about purchasing a new hub, but I've already spent days rebuilding this one.

Ok for a first step, you should never pull power from the hub like this. It corrupts the database and causes issues just like you are having. If you must reboot or shutdown, then do it through the software, in the Settings tab. If you want to automatically reboot, then there are apps that can do it for you. But generally speaking, rebooting should not be needed regularly.

I gave some advice to your other post about uploading screenshots of logs. That would be helpful for people to see if there is something unusual.

And lastly, it is rare to have to wipe the hub and start again. I would only do that if support indicated that was needed. A soft reset might be needed on a rare occasion, but even then there are steps to take to ensure you aren't causing more harm than good.

Maybe I didn't explain it well, but the hub is unresponsive when this happens. The automation stops and you cannot access the hub at all. Using the app or navigating to the site times out. My only option is to turn it off and on.

Here are snippets from the logs.

dev:1644 2020-07-15 09:39:31.238 info Telnet connection to Lutron interface established

dev:1601 2020-07-15 02:14:30.830 info Kitchen Under Cabinet is on [digital]

dev:1454 2020-07-15 02:14:30.103 info Back Porch Sensor illuminance is 10lux

dev:1454 2020-07-15 02:14:30.010 info Back Porch Sensor illuminance is 10lux

systemStart System startup with build: 2.2.1.116 2.2.1.116 2020-07-15 09:39:33.837 EDT
ssdpTerm urn:schemas-upnp-org:device:basic:1

app:1346 2020-07-28 02:11:15.041 debug Sending request for Bridge information

app:1346 2020-07-28 02:11:14.053 debug Sending request for Bridge information

App 1346 is CoCoHue - Hue Bridge Integration (7C21B3 - Philips hue)

dev:1924 2020-07-30 02:16:25.084 info Lutron Controller power is 0W

dev:1923 2020-07-30 02:16:25.048 info CommandStation power is 32.8W

dev:1891 2020-07-30 02:16:11.372 info Hue power is 2.9W

dev:1890 2020-07-30 02:16:11.131 info Garage Fan power is 0W

dev:1697 2020-07-30 02:16:03.867 info Downstairs Thermostat fan is auto

dev:1698 2020-07-30 02:16:03.866 info Upstairs Thermostat fan is auto

dev:1698 2020-07-30 02:16:03.346 info Upstairs Thermostat mode is cool

dev:1697 2020-07-30 02:16:03.233 info Downstairs Thermostat mode is cool

Last night the very last events were below
Event Details

id: 5990702
Date: 2020-08-01 02:11:13.000
Name: ssdpTerm
isStateChange: false
source: HUB
value: roku:ecp
hubId: 1
systemStart System startup with build: 2.2.2.129 2.2.2.129 2020-08-01 10:40:38.889 EDT
ssdpTerm roku:ecp 2020-08-01 02:11:13.554 EDT

10:40 was when I rebooted it today.
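
Log lines pasted as text often come out with the source ID, timestamp, and level run together, which makes it hard to see what the last event before each lockup was. The sketch below splits such lines back into fields and picks the last entry before a given reboot time. It is not an official log format parser: the field layout and the list of level names are assumptions based on the snippets in this post, and `LogEntry`, `parse_line`, and `last_entry_before` are names made up for illustration.

```python
import re
from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class LogEntry(NamedTuple):
    source: str       # "dev" or "app"
    source_id: int
    when: datetime
    level: str
    message: str


# The lazy \d+? lets the 4-digit year split off from the ID when the two are fused.
# The level names are an assumption; extend the alternation if yours differ.
LINE_RE = re.compile(
    r"^(dev|app):(\d+?)\s*"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s*"
    r"(info|debug|warn|error|trace)\s*(.*)$"
)


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one pasted log line; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    source, source_id, stamp, level, message = m.groups()
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
    return LogEntry(source, int(source_id), when, level, message)


def last_entry_before(lines: Iterable[str], cutoff: datetime) -> Optional[LogEntry]:
    """Return the latest parsed entry strictly before cutoff, or None."""
    entries = [e for e in map(parse_line, lines) if e is not None and e.when < cutoff]
    return max(entries, key=lambda e: e.when) if entries else None
```

Running `last_entry_before` over a night's worth of lines, with the morning reboot time as the cutoff, gives the last thing the hub logged before it stopped responding.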

Screenshots are much easier for everyone to read, so if possible please try to use those. Support will request you do so, might as well get used to it.

What I notice is it appears you have power reporting plugs on lots of devices? How many do you have, and what are they set to report? Excessive power reporting can bring a hub to its knees in short order.

One last thing to check, do you have all your smart home devices (Hubitat, Lutron, Hue, Alexa, etc) all on static or reserved addresses? If smart devices cannot find each other due to their IP address changing, it will cause even more issues on top of the ones you are having. In fact it might even be part of your troubles.

There are other options. Again do not ever pull the plug. You are making it worse by doing that. The first thing would be to try if you can reach the diagnostic port of your Hubitat (http://your_hubitat_ip:8081). There is a reboot option in there.

And just so it is on his radar, let's tag @bobbyD

I’d just add that it might help to know what “32 zwave and zigbee devices” you have.
Qty of each type and make and models.

"There are other options. Again do not ever pull the plug. You are making it worse by doing that. The first thing would be to try if you can reach the diagnostic port of your Hubitat (http://your_hubitat_ip:8081). There is a reboot option in there."

  1. The hub is inaccessible; there is no option to access it. The hub will not receive any network traffic: I send a ping and it drops. With tracert, the packet moves through my network stack but fails at the device. My networking gear reports 0% activity up and down on that port. Every device in my house has a static IP.

  2. How would power reporting cause the Hubitat to crash? If the logs are filling up, then why would it crash every day at 2:15? Even if I reboot the Hubitat now, it will crash at 2:15.

Here is my latest response from Support

What I'm trying to understand is whether anyone else has encountered anything similar to this and, if so, whether there was any solution. This started 3 months ago, on a system that was set up in January 2019. When it started happening, no new devices had been added.

I can provide as much detail as needed; please let me know exactly what you would like to see.

Do you see any spikes in traffic on the Unifi controller around that time?

Have you ever sat and watched your hub logs around 2:15? I'd be glued to my computer around then if I had a defined time like that.
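
If staying up is not practical, a small script on another machine can record when the hub stops answering. The sketch below is one possible shape for that, not a tested monitoring tool: `probe` is any function you supply that returns True when the hub responds (for example a TCP connect to its web interface), and `watch` and `find_outages` are names made up here.

```python
import time
from datetime import datetime
from typing import Callable, List, Optional, Tuple

Sample = Tuple[datetime, bool]


def watch(probe: Callable[[], bool], interval: float = 30.0, rounds: int = 10) -> List[Sample]:
    """Call probe() once per interval and record (timestamp, reachable)."""
    samples: List[Sample] = []
    for _ in range(rounds):
        samples.append((datetime.now(), probe()))
        time.sleep(interval)
    return samples


def find_outages(samples: List[Sample]) -> List[Tuple[datetime, Optional[datetime]]]:
    """Collapse time-ordered samples into (first_failure, first_success_after) windows.

    The end of a window is None if the hub was still down at the last sample.
    """
    outages: List[Tuple[datetime, Optional[datetime]]] = []
    start: Optional[datetime] = None
    for when, ok in samples:
        if not ok and start is None:
            start = when                    # outage begins
        elif ok and start is not None:
            outages.append((start, when))   # outage ends
            start = None
    if start is not None:
        outages.append((start, None))
    return outages
```

Left running overnight with a short interval, this gives the start of each outage to within one interval, which can then be lined up against the hub's own logs.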

When this happened to me several times and I couldn't reboot because the hub was locked or inaccessible, I set up my button controller to reboot the hub when I hit button 2. It's a Z-Wave Minimote and it seems to always work when I can't access the hub or the diagnostic URL.

Considering this is happening around a scheduled maintenance time (if I'm writing code and saving it to the hub around this time it is SLOW or even unresponsive at times...), just to be certain it is related I would change the timezone of the hub so that the 2:15 AM local time would be 8:15 PM (-6 hours from your current timezone) or some other easy-to-monitor time. Beyond that, I'm not certain from what was written above: did you run a Soft Reset (NOT Full Reset) of HE and restore a backup after that?
How large are your backups?
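
To make the timezone arithmetic concrete, here is a small sketch. It assumes the maintenance job follows the hub's configured wall-clock time (nothing in this thread confirms that) and that the real local zone is EDT, UTC-4, as shown in the startup events above. Under those assumptions, seeing the 02:15 job at 20:15 real time means the hub's clock has to run six hours ahead of local time, which moves the event six hours earlier in the evening. The function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def hub_utc_offset(local_offset_h: float,
                   job_hm: tuple = (2, 15),
                   watch_hm: tuple = (20, 15)) -> float:
    """UTC offset (hours) to set on the hub so its job_hm lands on local watch_hm."""
    job = job_hm[0] + job_hm[1] / 60
    watch = watch_hm[0] + watch_hm[1] / 60
    offset = local_offset_h + (job - watch)
    # Fold into the range of real-world UTC offsets (-12 to +14).
    while offset < -12:
        offset += 24
    while offset > 14:
        offset -= 24
    return offset


def local_time_of_job(hub_offset_h: float, local_offset_h: float,
                      job_hm: tuple = (2, 15)) -> datetime:
    """Convert the hub's job time on an arbitrary example date to real local time."""
    hub_tz = timezone(timedelta(hours=hub_offset_h))
    local_tz = timezone(timedelta(hours=local_offset_h))
    job = datetime(2020, 8, 2, job_hm[0], job_hm[1], tzinfo=hub_tz)
    return job.astimezone(local_tz)


offset = hub_utc_offset(-4)                       # EDT is UTC-4
print(offset, local_time_of_job(offset, -4).strftime("%H:%M"))
```

As noted elsewhere in the thread, any time-based rules will shift along with the hub's clock while it is set this way.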

It could be that the scheduled maintenance process is the final straw bringing your hub to its knees.
My guess is that if it's the same time each day, there is something resource-heavy and your hub is tipping over the edge at this time. I would start with the power monitoring and disable/remove those.

[quote="markus, post:10, topic:46393"]
and restore a backup after that?
[/quote] and make it a backup that is at least 3 months old, because there is a high chance that your database has been corrupted by the hub's power being pulled / shut off / reset.

If possible this would be worth trying, but since backups are database dumps, true DB corruption doesn't get exported. Corrupt rows will just not be exported since they can't be read by the dump process anyway. This by itself could be a problem, but also might not be.

I'm having similar issues on 2 of my hubs. I haven't narrowed down what time at night it happens, but it is always during the night when there is little to no automation happening.
One hub totally locks up every time it happens. The second one I can still get into via 8081 and do a command reset. The odd thing is that when the second one is locked up it appears to be in an odd state, with the same 1 light bulb (of a group of 3) on, which should never be on by itself and was definitely off before the lockup.

Wondering if the lockups are caused by the auto backup.
Anyone know of a way to disable them to test it?
Since the backups do not occur at the exact same time every night how are they triggered?

There is no known official way of disabling them. As I suggested above, in order to have this happen around a time easier to track down, you can change the timezone and get it to a time when you are awake.
There are multiple maintenance tasks during the night, the exact schedule is not something I've seen published anywhere but looking at the DB backup time at least we know when the DB backup has finished. I usually experience a severe slowdown just after 02.10AM and have seen indications of there also being other tasks executed at night. As a question to the community: do we have the schedule(s) confirmed from official sources?

Not that I can recall seeing recently.

If the database backups are triggered remotely by Hubitat, then simply killing the internet at night should stop them.
But I would assume the backups are triggered locally by the hub; I'm just not sure why they would be at different times.
Changing time zones will change them to a different time but will also affect some of your rules.
My third hub which only has 33 devices rarely if ever locks up.

@mike.maxwell
Is there any way to disable or change the database backup times?
Since we are all control freaks here, it would be nice to be able to have control of that function.

As the hub lockups appear to happen to most people around the 2-3 am time it is likely that one of the hub maintenance tasks is triggering the lock ups. (note I said trigger not cause)

I would remove all dashboards as a test. Those can make quite a lot of cloud traffic.

Yes, that's painful, but I believe you can copy over the css json before deleting to make it easier to re-create later (I haven't used Hubitat dashboards in some time, though, so could be wrong).

Copy the whole JSON and save it in a text file on your computer. To restore just create a new Dashboard and paste in the JSON overwriting the unconfigured Dashboard JSON.

What model hub do you have?

Have you done a backup and soft reset and restore? (I guess you did when you re-created everything?)

Do you run any weather devices? if so which ones?

  • have you set these to poll/update less frequently?
  • are you running the latest code for them?

Do any of the devices show 1000s of events when you go to

  • Devices -> select device -> events
  • I mention this as your initial log shows lux event (it looks like for the same reading)
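
A quick way to spot this kind of repetition in pasted log text is to strip the timestamps and count identical lines. This is only a rough sketch over copied log text, not something that reads the hub's event store. The timestamp pattern is an assumption based on the log snippets earlier in the thread, and `repeat_counts` is a made-up name.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Matches timestamps like 2020-07-15 02:14:30.103 (format assumed from the thread).
STAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}")


def repeat_counts(lines: Iterable[str], min_count: int = 2) -> List[Tuple[str, int]]:
    """Count lines that are identical once the timestamp is removed.

    Returns (line_without_timestamp, count) pairs, most frequent first,
    keeping only those seen at least min_count times.
    """
    counts: Counter = Counter()
    for line in lines:
        # Replace the timestamp with a space, then normalize whitespace.
        key = " ".join(STAMP_RE.sub(" ", line).split())
        if key:
            counts[key] += 1
    return [(key, n) for key, n in counts.most_common() if n >= min_count]
```

A device that shows up near the top with hundreds of identical readings is a candidate for less frequent reporting.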

Do you use Hubitat Package Manager?

Are you on the latest released firmware?