Power cycling corrupting databases...?

To some this may sound like fighting words, but I do not intend to be disrespectful. I am genuinely interested in the answer.

I keep hearing on this forum "don't yank the power cord, you could corrupt databases". Is that really true, or is it an urban legend?

From my viewpoint I am surprised, maybe even a bit skeptical, that this is actually true. Sure, I'm not disputing that there may be corner cases, but in general is the platform not designed to be crash consistent?

The quality of the Hubitat product seems solid, and I just have a hard time imagining that modifying a persistent object isn't implemented as a transaction in modern technology. I've built many products in my career (I'm working on a consumer platform right now) and I just don't think a tech company can be successful without, at a minimum, crash consistency. Even open source has figured out that persistency needs to be crash consistent.

Is it true that Hubitat isn't designed to be crash consistent, or is this just a myth (or is this a precaution because of some corner cases you worry about)?

Is it all homegrown, or are you using something embedded to store state in (à la RocksDB, SQLite, et al.)?

I came across enough of these warnings on the forum today that it piqued my curiosity. I do not mean to be offensive; I'm merely a bit inquisitive, as I find this surprising in today's technology landscape...

Being skeptical is one thing.... and I am... but...

I've read messages from people who had a corrupt DB, as evidenced by error messages and the fixes that worked (soft reset, restoring an older backup), and I feel it's too great a possibility for me to discard future warnings. I've also read that the H2 DB has a reputation for corruption.

I've based my advice on info from more than a year ago... and I would really love to be able to have a better 2021 answer. Same really on the microUSB connector. It didn't look any more fragile when I was gazing at the PCB of my first C-5 than it did on an rPi (for one example.)

I think neither piece of advice does harm.. it's a Caution Sticker in my mind.

The 2019 answer is that it happened to me. When I was setting up Hubitat, I was pulling power to reboot several times a day (that was the Wink way to reboot).

The recommendation originated with staff, so I'd trust it. :smiley: But that's not saying it's likely, just something that's easily avoided by doing things the "right" way. I don't know that Hubitat is particularly susceptible, but it's ultimately just a computer: it's running some variant of Linux as the OS, a Java (plus Groovy) runtime on top of that, and Hubitat--including the popular H2 database for Java, the likely candidate for corruption here--on top of that. I wouldn't shut down my computer by yanking power if it's at all avoidable, and I treat my hub the same way for the same reasons.

I'd say the 2021 answer is still that DB corruption can and does occur. I recently had to soft reset one of my hubs after it got into an unstable state. This was, however, without an unexpected power loss.

I had what I believe to be DB corruption twice, both times right after losing power due to a local storm.
My solution on each occasion was to reload a backed-up DB. So while I cannot verify cause and effect, it certainly seems likely the power outage was the root cause.

Oh, and by the way, my system is not that complex or overloaded with rules.

I see, it's H2 on top of a filesystem? Yeah, I've heard that this isn't the most durable database. That would explain it.

Well, the database has to be stored somewhere, so while we don't know the implementation details, I'd imagine that it indeed lives on some sort of (Linux) filesystem. The other option with H2, to my limited knowledge, is in-memory, which would be immune to this corruption but also would not persist between reboots. :slight_smile:

ACID is in general a really nice set of properties. A bit more challenging to implement if you need really high performance but oh so nice when it comes to things like reasoning about what happens after a crash.
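
As a rough illustration of the atomicity part of ACID, here is a minimal Python sketch using the standard-library `sqlite3` module. SQLite is only a stand-in (the hub reportedly uses H2), and the `device` table and the `make_db`, `update_states`, and `states` helpers are made up for the example. A raised exception stands in for a failure halfway through a two-step update.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical devices."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE device (name TEXT PRIMARY KEY, state TEXT)")
    conn.executemany(
        "INSERT INTO device VALUES (?, ?)",
        [("lamp", "on"), ("fan", "off")],
    )
    conn.commit()
    return conn


def update_states(conn, fail_midway=False):
    """Flip both devices in one transaction; optionally fail between the writes."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute("UPDATE device SET state = 'off' WHERE name = 'lamp'")
            if fail_midway:
                raise RuntimeError("simulated failure between two writes")
            conn.execute("UPDATE device SET state = 'on' WHERE name = 'fan'")
    except RuntimeError:
        pass  # the half-finished update has already been rolled back


def states(conn):
    """Return the current device states as a dict."""
    return dict(conn.execute("SELECT name, state FROM device"))
```

After `update_states(conn, fail_midway=True)` both rows still hold their old values, which is the "reasoning about what happens after a crash" part. Note that this only shows atomicity inside one process; surviving a real power cut additionally depends on the engine flushing its log to stable storage.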

I read through the issue they highlight in the H2 docs.

...And yeah, the authors of H2 made an explicit tradeoff in order to gain performance. Unfortunately, their workaround with replication isn't feasible on an embedded system like Hubitat.
I guess I'll be more careful with my power cycling....

I respect that this issue is called out in the documentation:

This database does not guarantee that all committed transactions survive a power failure. Tests show that all databases sometimes lose transactions on power failure (for details, see below). Where losing transactions is not acceptable, a laptop or UPS (uninterruptible power supply) should be used. If durability is required for all possible cases of hardware failure, clustering should be used, such as the H2 clustering mode.

Their comment about "all databases" is a bit naive. Not all databases have durability problems, and volatile write caching fell out of favor with the 10K RPM disk drive and the introduction of TLC flash... :slight_smile: But yeah, 15 years ago when they released it they were probably right.

BTW, I put this in the lounge because I was genuinely curious and didn't want to treat it like a support request or something like that. Just some intellectual massaging.

I'm now satisfied.

The hub flushes the database every second. Shut down your hubs properly! :wink:

If you routinely pull a USB flash drive out of your PC without selecting "eject", you may not see problems right away, but eventually that flash drive is all but guaranteed to get corrupted and become unusable.

There is a reason there is a "normal" power-down process.

It's a fair point: if you use FAT32, which does not protect its metadata from power failure, you may have issues with torn metadata in the filesystem itself. I think that most filesystems used for databases use some form of logging.

If Hubitat is Linux based, I assume it's running on top of ext3/4 or something like that, which can tolerate power failures. FAT32 seems like a poor choice as the filesystem for the persistency layer (and I'm not saying that only for crash consistency reasons; I think it's a poor choice for performance too).

All modern operating systems have moved to journaled filesystems for this very reason. The days of hour-long fsck runs are well behind us, except for the FAT32 devices that still exist in the form of USB sticks.

I just experienced a very strange form of database corruption: I could turn on lights, but turning them off gave me an error.

When I looked at the log, there were many error events of functions failing. When I looked at one of the devices, a simple light switch, there was the "ON" function but no "OFF" function. I had never seen that before. A soft reset did not fix it. I restored a database from 4 days ago, and that fixed all the problems.
This is not good. The Hubitat Elevation computer/software should be more "hardened" against power failures. Sure, I added a UPS - but really??

What do you expect? Built-in RAID5 and Power Loss Data Protection hardware in a <$150 consumer-grade appliance?

:rofl:

That is not what is needed or would even help.
I am talking about relatively simple software hardening solutions (e.g., restore-point transaction protection when updating the database). The consequence: in the unlikely event that power is lost during a write/update operation on the database (all solid state, of course), only the incomplete update is undone/lost upon recovery, as if it never happened.
Not that hard to do and zero extra hardware or processing power required to implement.
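
To make the "only the incomplete update is lost" idea concrete, here is a minimal Python sketch of one common pattern: write the new state to a temporary file, flush it to storage, then atomically swap it into place. This is not how the hub or H2 does it; the JSON state file and the `atomic_save` and `load` helpers are hypothetical, and for full durability on POSIX you would typically also fsync the containing directory, which is omitted here.

```python
import json
import os
import tempfile


def atomic_save(path, state):
    """Write state to path so a reader sees the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new contents to storage first
        os.replace(tmp_path, path)  # atomic swap on POSIX filesystems
    except BaseException:
        # Any failure before the swap leaves the original file untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load(path):
    """Read the saved state back."""
    with open(path) as f:
        return json.load(f)
```

If the write fails partway (or power is lost before the swap), `load(path)` still returns the previous complete state; at worst a stray temporary file is left behind.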

Just take a backup and who cares if it gets corrupted.

I agree... no problem with that. However, it will fail in the weirdest ways and cause a loss of confidence in the use and operation until one figures out that the database is corrupt. Therein lies the beast!

How about making any database upgrade take a backup automatically before executing the update?

Seems like low-cost insurance, assuming database upgrades aren't a super frequent operation?
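
A minimal Python sketch of that idea, assuming the database is a single file that is not being written to while it is copied: take a timestamped copy first, run the update, and copy the backup back if the update raises. The `update_with_backup` helper and the `apply_update` callback are hypothetical names for illustration, not anything the hub exposes.

```python
import os
import shutil
import time


def update_with_backup(db_path, backup_dir, apply_update):
    """Copy db_path into backup_dir, then run apply_update(db_path).

    If the update raises, the backup is copied back over the database.
    Returns the path of the backup that was taken.
    """
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_path = os.path.join(
        backup_dir, f"{os.path.basename(db_path)}.{stamp}.bak"
    )
    shutil.copy2(db_path, backup_path)  # take the backup before touching anything
    try:
        apply_update(db_path)
    except Exception:
        shutil.copy2(backup_path, db_path)  # restore the pre-update copy
        raise
    return backup_path
```

The cost is one file copy per upgrade, so it only looks cheap if upgrades are rare and the database is small, which is the assumption in the question above.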