Hub Crashes every night, Last ditch effort on Support

[quote="markus, post:10, topic:46393"]
and restore a backup after that?
[/quote] and make it a backup that is at least 3 months old, because there is a high chance that your database is corrupted from the hub's power being pulled, shut off, or reset.

If possible this would be worth trying, but since backups are database dumps, true DB corruption doesn't get exported. Corrupt rows will just not be exported since they can't be read by the dump process anyway. This by itself could be a problem, but also might not be.

I'm having similar issues on 2 of my hubs. I haven't narrowed down what time at night it happens, but it is always during the night, when there is little to no automation happening.
One hub totally locks up every time it happens. The second one I can still get into via port 8081 and do a command reset. The odd thing is that when the second one is locked up, it appears to be in an odd state, with the same 1 light bulb (of a group of 3) on, which should never be on by itself and was definitely off before the lockup.
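
One way to narrow down the time is to have another always-on machine poll the hub and log when it stops answering. Below is a minimal Python sketch, not a tested tool. It assumes the hub sits at a hypothetical address (`192.168.1.50`) and that both the main web UI (assumed to be on port 80) and the port 8081 page answer plain HTTP. The function names are made up for illustration.

```python
import http.client
import time
import urllib.error
import urllib.request
from datetime import datetime

HUB_IP = "192.168.1.50"  # hypothetical address, replace with your hub's IP
URLS = {
    "main UI": f"http://{HUB_IP}/",         # assumed: UI on port 80
    "port 8081": f"http://{HUB_IP}:8081/",  # the 8081 page mentioned above
}


def is_reachable(url, timeout=5.0):
    """Return True if anything answers HTTP at url within timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False  # refused, timed out, or otherwise no usable answer


def note_change(last_state, name, up, stamp):
    """Return a log line if name changed state, else None; updates last_state."""
    if last_state.get(name) == up:
        return None
    last_state[name] = up
    return f"{stamp} {name}: {'up' if up else 'DOWN'}"


def watch(interval=60):
    """Poll every interval seconds and print only state changes."""
    last_state = {}
    while True:
        stamp = datetime.now().isoformat(timespec="seconds")
        for name, url in URLS.items():
            line = note_change(last_state, name, is_reachable(url), stamp)
            if line:
                print(line, flush=True)
        time.sleep(interval)
```

Calling `watch()` and leaving it running overnight should give a timestamp for when each endpoint went down, which also shows whether port 8081 stays up while the main UI is locked.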

Wondering if the lockups are caused by the auto backup.
Anyone know of a way to disable them to test it?
Since the backups do not occur at the exact same time every night, how are they triggered?

There is no known official way of disabling them. As I suggested above, in order to have this happen around a time easier to track down, you can change the timezone and get it to a time when you are awake.
There are multiple maintenance tasks during the night. The exact schedule is not something I've seen published anywhere, but from the DB backup time we at least know when the DB backup has finished. I usually experience a severe slowdown just after 02:10 AM and have seen indications of other tasks being executed at night as well. As a question to the community: do we have the schedule(s) confirmed from official sources?
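
To pick a timezone for that test, it helps to work out where the hub's roughly 02:10 slowdown would land on your own clock. Here is a small Python sketch. It assumes the nightly tasks follow the hub's configured local time (nobody has confirmed that in this thread) and uses plain UTC offsets with no DST handling. The function name is made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def wall_clock_for_hub_time(hub_hour, hub_minute, hub_utc_offset, my_utc_offset):
    """Convert a time on the hub's configured clock to your own wall clock.

    Offsets are whole hours from UTC, e.g. 10 for UTC+10 or -5 for UTC-5.
    """
    hub_tz = timezone(timedelta(hours=hub_utc_offset))
    my_tz = timezone(timedelta(hours=my_utc_offset))
    # The date is arbitrary; only the time of day matters here.
    hub_time = datetime(2024, 1, 2, hub_hour, hub_minute, tzinfo=hub_tz)
    return hub_time.astimezone(my_tz).strftime("%H:%M")


# Hub set to UTC+10 while you live at UTC-5:
print(wall_clock_for_hub_time(2, 10, 10, -5))  # 11:10
```

With those example offsets the 02:10 window would show up late morning on your own clock, which is a lot easier to watch than the middle of the night.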

Not that I can recall seeing recently.

If the database backups are triggered remotely by Hubitat, then simply killing the internet at night should stop them.
But I would assume the backups are triggered locally by the hub; I'm just not sure why they would be at different times.
Changing time zones will move them to a different time but will also affect some of your rules.
My third hub which only has 33 devices rarely if ever locks up.

@mike.maxwell
Is there any way to disable or change the database backup times?
Since we are all control freaks here, it would be nice to be able to have control of that function.

As the hub lockups appear to happen to most people around 2-3 AM, it is likely that one of the hub maintenance tasks is triggering the lockups. (Note I said trigger, not cause.)

I would remove all dashboards as a test. Those can make quite a lot of cloud traffic.

Yes, that's painful, but I believe you can copy over the css json before deleting to make it easier to re-create later (I haven't used Hubitat dashboards in some time, though, so could be wrong).

Copy the whole JSON and save it in a text file on your computer. To restore just create a new Dashboard and paste in the JSON overwriting the unconfigured Dashboard JSON.

What model hub do you have?

Have you done a backup and soft reset and restore? (I guess you did when you re-created everything?)

Do you run any weather devices? If so, which ones?

  • Have you set these to poll/update less frequently?
  • Are you running the latest code for them?

Do any of the devices show 1000s of events when you go to

  • Devices -> select device -> events
  • I mention this as your initial log shows lux event (it looks like for the same reading)

Do you use Hubitat Package Manager?

Are you on the latest released firmware?

There will be; it's not released yet.
Backups are not triggered by the cloud; they are scheduled locally.

Great news.

Alright, great information so far.

My dashboards are basically empty; they have yet to be rebuilt since all this started, so I am removing those. I will also try to eliminate all power reporting options and will test tonight.

As for the model, it's the Rev C-5.

To give additional indications of where to go with this, how about this question?

EDIT: I know there are a lot of questions in this thread, but if you can answer some more, even if they feel like repeat questions, we might find a direction to go from here. Everything has a reason, and even though it is hard to debug without all the information, we can all give it a go.
As for cloud devices/integrations (like weather services), do you have any that generate large amounts of events every day?

Alright, so there has been a little change today: I am now able to access the 8081 management page, which was previously not accessible. Still the same crash at the same time, and all the rules stop functioning.

As for questions.

Log size: where can this be viewed? As of right now my database backup size is 7.71MB.

No weather devices.

As for events, yes, I have multiple devices with over 1000 events. I have a fair number of sensors. I am still trying to go through each device and reduce the logging; it's an annoying process.

I use Hubitat Package Manager, but it was installed on the most recent rebuild; it wasn't on the hub when this started.

And yes I am on the latest firmware.

That is not a large database (mine hovers around 22MB), so you really will not have an insane amount of events in there unless they are all basically the same and the compression of the backup hides the fact that they are in there?

The actual logs don't go into the database; they're separate text files on the hub and are truncated at 250kb, from what staff has said in the past. The impact on hub performance from logs is minimal, if any. Since I develop drivers, I run many of my devices with an insane level of logging, probably generating 250kb of logs in 2 hours or less, and I have no performance or crash issues due to this.

So far, nothing you are reporting points in a specific direction, yet. There is a device data cleanup that starts sometime just after 2:05 AM (I checked more carefully last night, and this is the timing I came up with) and can run for 10-15 minutes, from what I have experienced uploading code at that time. I have a build script that sends all changed drivers to the hub one after another. Normally 30 drivers take 40-50 seconds or so; around this time it can take 5 minutes or more.

Have you tried putting your hub on a different network switch? I know this is a long shot, but some hubs have had crazy amounts of CRC errors (if you have a smart switch, check for those). This, in combination with heavy load from scheduled tasks and other things we don't know about how the hub's internals work, could (though not likely) cause an issue such as this.

From my understanding you have done a complete reset and as such DB corruption should not be the issue, but you could download a backup and run a SOFT reset followed by DB restore and a complete shutdown and restart. I'll be happy to continue to run through other possible issues, but so far there are no strong indications of anything in any direction except the cloud connections you have already disabled.

Except he keeps constantly power cycling this with a WiFi plug according to his first post.

True, time for downloading all backups and a Soft Reset then. Now that port 8081 seems to be working, there is no need to power cycle. Good to do, to at least eliminate this as a contributing factor.

Yeah, it went down again tonight after reducing the events, so I really don't think that is the issue.

I'm going to try putting it on a different network switch tonight. If that doesn't work, then I'm going to plug it in directly at the modem and bypass the stack.

I really don't understand what else it could be, but I'm probably going to have to look at another controller to at least keep the important things up and running while I try to figure this out.

Have you tried this yet?

I just performed a soft reset and restored from a backup. Having the hub on a new switch did not resolve the issue. I am going to change its IP and see if that fixes it.