My hub db seemingly got corrupted

dman2306 · June 18, 2021, 2:01pm

I've never seen anything like this happen nor read anyone else saying it happened to them. I woke up this morning and noticed some automations scheduled to run at 2am did not (ones from 12am did). When I logged into my hub I got:

That error was common years ago but I probably haven't gotten a 500 error in 1-2 years. Anyway, I rebooted. When it came back online I was presented with this:

The hub completely corrupted its db?

Note the size difference between the last two backups

Restoring the backup brought it back online, I just lost a ton of work unfortunately.

Anything you guys can see in the logs to understand what went so wrong?

@bobbyD @gopher.ny

gopher.ny · June 18, 2021, 2:37pm

I see a bunch of these starting overnight:

LanControllerJetty - Received data from 192.168.1.180, no matching device found for 192.168.1.180, C0A801B4:CF3A, DCA6322B6490 or C0A801B4.: /notifynull

That could be a symptom, though, not a cause. Not much else is going on.
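For anyone reading that log line: the `C0A801B4:CF3A` part looks like the same IP address plus a port, written in hex, and `DCA6322B6490` appears to be the device's MAC address without separators. Here is a small Python sketch for decoding such values. The function names are made up for illustration and are not part of the hub's API.

```python
import ipaddress


def decode_hex_endpoint(endpoint: str) -> tuple[str, int]:
    """Decode a hex 'IP:PORT' string such as 'C0A801B4:CF3A'."""
    ip_hex, port_hex = endpoint.split(":")
    # Each pair of hex digits is one octet of the IPv4 address.
    ip = str(ipaddress.IPv4Address(int(ip_hex, 16)))
    port = int(port_hex, 16)
    return ip, port


def format_mac(raw: str) -> str:
    """Insert colons into a 12-digit hex string (assumed to be a MAC address)."""
    return ":".join(raw[i:i + 2] for i in range(0, len(raw), 2))


print(decode_hex_endpoint("C0A801B4:CF3A"))  # ('192.168.1.180', 53050)
print(format_mac("DCA6322B6490"))            # DC:A6:32:2B:64:90
```

The decoded address matches the 192.168.1.180 in the message, so all of the identifiers in that line point at the same device.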

The work you're referring to is apps/drivers source, right? I don't have a good idea what caused the database corruption yet, but I can think of ways to create additional backups for high-value data (like source code), which would be turned off by default but could be turned on by developers.

dman2306 · June 18, 2021, 2:39pm

Hmm, that's interesting. That is a device I had set up yesterday (it's my Alarm System).

No, actually. I'm manually migrating from a C5 to a C7 (because the hub migration never worked for me), so what I lost was the migration of about 100 devices I did yesterday. That's my fault, I should have taken a backup afterwards, it just sucks.

For source code, I always work in GitHub first and paste into HE, so I've never lost source code personally.

gopher.ny · June 18, 2021, 2:47pm

I don't see a way to recover the data, but would a popup/reminder to take a local backup after, say, 20 new devices are added since the last backup be helpful? I can add the check to the device list page. Same goes for apps: a large number of changes would trigger a popup suggesting a backup and providing a one-click way to do it.

I know this is a workaround at best and not a solution to the underlying problem.
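Roughly, the check would look something like the sketch below. This is only an illustration in Python, not the hub's actual code: the function names and the idea of comparing device creation times against the last backup time are assumptions, and the threshold of 20 is just the number floated above.

```python
from datetime import datetime, timedelta

# Hypothetical threshold, taken from the "say, 20 new devices" suggestion.
DEVICE_CHANGE_THRESHOLD = 20


def devices_added_since(last_backup: datetime, created_at: list[datetime]) -> int:
    """Count devices whose creation time is later than the last backup."""
    return sum(1 for created in created_at if created > last_backup)


def should_suggest_backup(
    last_backup: datetime,
    created_at: list[datetime],
    threshold: int = DEVICE_CHANGE_THRESHOLD,
) -> bool:
    """Return True once enough new devices have piled up to warrant a reminder."""
    return devices_added_since(last_backup, created_at) >= threshold


# Example: a backup at 2am, then 100 devices added during the following day.
last_backup = datetime(2021, 6, 17, 2, 0)
new_devices = [
    datetime(2021, 6, 17, 9, 0) + timedelta(minutes=i) for i in range(100)
]
print(should_suggest_backup(last_backup, new_devices))  # True
```

The same comparison could be reused for apps by counting changes instead of device creations.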

hydro311 · June 18, 2021, 2:48pm

I had this "Error 500" happen last week out of nowhere - I hadn't made any changes/tweaks to my setup around that time, so I have no idea what caused it.

I don't think it's just my imagination, but it sure seems like there have been a lot of "trouble with database" posts in the last several weeks...

When mine happened, I found an active post where gopher.ny suggested a soft reset and limiting log entries to 11. After doing all that, everything has been fine, but it does make ya wonder!

bobbles · June 18, 2021, 2:52pm

If I may comment on this with nothing to offer to the problem, sorry, what a great idea.

bobbyD · June 18, 2021, 3:24pm

Although the error is the same as the one from years ago, its meaning is totally different: it just means that the page you are trying to access is no longer there. It could happen if you removed an app/driver and then tried to access the old id, or it could happen if the db is corrupted and its tables are not available. The former is easy to fix: access a valid id. The latter is also fairly easy to fix by running a Soft Reset with restore.

1449smarthomeautomat · June 19, 2021, 3:55am

Same thing happened to me last weekend. I had to do an unexpected emergency backup and restore. I too had not made any recent changes and it hasn't happened again.

Mr.Olsen · June 19, 2021, 9:59am

The same here, last week after getting up in the morning I found my hub in exactly the same state, must have died in the middle of the night.
Didn't change anything before.
After rebooting using the diagnostic tool, it seemed to have done a soft reset; restoring a backup brought it back to life.

After that I closely watched the logs, but everything seemed fine again.

The logs showed this before rebooting, these errors seemed to appear for any active device.
This is from an Aeotec Tri-Sensor EU:

Edit: C7 @ 2.2.7.126

1449smarthomeautomat · June 19, 2021, 4:14pm

Wait a sec.....I just added two of these devices a week or so before this happened to me. Suspicious. I don't have the logs from last week unfortunately but this is definitely something for me to keep my eye on. Mine happened in the middle of the night too.

Mr.Olsen · June 19, 2021, 5:59pm

My Trisensor was added about 6 months ago and has done a good job since then, so I don't think that it is responsible for this.
As I said, this error occurred on all devices that were active during this period of time, no matter if it's Z-Wave or LIFX.
My personal, lay observation leads me to think that it has to do with the scheduled database maintenance that seems to run every night.