Database size, corruption, and cloud backup concerns

I've had the Hubitat for about a year, give or take. I love the product and think it works great. However, I do have concerns about the robustness and resilience of the internal components that make up the Hubitat, such as the database. For context, I have two hubs, a C5 and a C7, plus a SmartThings hub for oddball items.

I know there was a database sizing issue in the past, but there still seem to be database issues. One day the database on my C5 was corrupted for no apparent reason, after running for weeks or even months without an issue. I had to do a soft reset to get it working again. Then came the database sizing problem, where I again did a reset to reduce the size of the database, and later there was a firmware fix for it.

Today, my C7 went down completely. The database started to grow yesterday, or the day before, for no apparent reason, and reached approximately 435 MB. The settings were pretty low, three or five in both sections for the drivers, but the database kept growing. Then I received a message from the hub reporting database corruption and telling me to do a soft reset and restore. I performed those steps, but when I attempted to restore from the cloud I was unable to. A message briefly flashed up: decryption failed, restore failed.
Luckily, I had backed up the database locally in previous days. I also keep an offline backup on a storage device, in addition to the cloud backup, which should be the break-glass solution. However, the cloud backup failed badly.

Here are my concerns:

  1. A better way to manage the database size and its performance.
  2. A more resilient way for the database to recover.
  3. Cloud backups that are fairly bulletproof and need little or no touch.

Unfortunately, I do not find this to be the case for the Hubitat. Like I said, I love the device. I'm providing this feedback because I do, and it's why I'm not still on ST or any other vendor/solution.

I never had this issue on ST, though I'm sure things like this happen on that platform too. I've had ST for a few years and never saw it. I do understand it's a different platform and it's cloud-based, but the resiliency of the core components seems to be better. In addition, there need to be better recommendations and guidance around the database, since it is an integral part of the Hubitat solution.

The one big Achilles' heel I see for Hubitat, in my experience using the product for about a year, is the database.

Sorry for the very long message.

Sorry to hear you’re having database issues, something I’ve never experienced (just jinxed myself, right?)

Have you reached out to support about this, or tried disabling some of the devices/apps with checkered behavior (Xiaomi, Chromecast, etc.)?

It’d certainly help to know more about your setup.
Are you only Zigbee?
Or a mix of everything: Zigbee, Z-Wave, WiFi, etc.?

Set your device state history and events to 11.
Do a fresh backup
Soft reset your hub
Reload your database
After database loads, reboot your hub.

This has taken care of my DB issues.
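
If you'd rather script the "fresh backup" step than click through the UI, here is a minimal sketch. Big caveats: the `/hub/backupDB?fileName=latest` path and the `.lzf` extension are assumptions based on what is commonly shared in the community, not something confirmed in this thread, and the sketch assumes hub login security is turned off. The IP address, hub name, and destination folder are placeholders.

```python
from datetime import datetime
from pathlib import Path
from urllib.request import urlopen


def backup_url(hub_ip):
    """Build the download URL for the hub's most recent backup."""
    # ASSUMPTION: endpoint path as commonly shared by the community;
    # verify it against your own hub and firmware before relying on it.
    return f"http://{hub_ip}/hub/backupDB?fileName=latest"


def backup_filename(hub_name, when):
    """Timestamped file name so older local copies are never overwritten."""
    # ASSUMPTION: ".lzf" is the extension used for downloaded backups.
    return f"{hub_name}_{when:%Y-%m-%d_%H%M}.lzf"


def download_backup(hub_ip, hub_name, dest_dir):
    """Fetch the backup and save it under dest_dir; returns the saved path."""
    dest = Path(dest_dir) / backup_filename(hub_name, datetime.now())
    with urlopen(backup_url(hub_ip), timeout=120) as resp:
        dest.write_bytes(resp.read())
    return dest
```

Something like `download_backup("192.168.1.50", "C7", "/mnt/backups")` run before the soft reset would leave a local copy to restore from, independent of the cloud backup.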

Setting your history and events lower than 11 can create problems.

You'll also want to read through this to gain a better understanding of the database within Hubitat.

FYI: this was true for a few versions before 2.2.8, where the cleanup would happen as the events came in instead of during the hourly maintenance as it did with values >= 11, but as of now, it will (or should...) happen hourly at most, regardless of this value.

Thank you all for replying.
@njanda I haven't contacted support because they just step me through the process of doing a soft reset. Nothing is captured about why this has happened.
I don't believe I have any of those. The C5 runs all of the rules, internet applications, and IP-related items (NTP, Google, Alexa, etc.). I do use mesh between the C5 and the C7. Both hubs have Hub Info installed (on the C5 all is good; the C7 shows high events and I don't know why). The C7 had 5-6 Z-Wave devices and NO Zigbee. I have maybe 5 or 6 Zigbee devices on the C5. So the C5 has a lot going on, but the C7 has much less and it still had issues. Both, in my opinion, are underutilized, so this shouldn't be an issue.

@SigFan86 I agree with @bertabcd1234

I had those numbers, post-fix, down to 5 and 5 on one hub and 3 and 3 on the other hub. I just set them both back to 11 and 11. I am looking for more logging.

This all goes back to my original statement above.
Devices shouldn't be corrupting the DB either.
Finally, Cloud Backup needs to be perfect every time.

Here is another example.

Here we go today. I shut down both of my hubs, the C5 and the C7. The C5 came back up and its database went from 35 MB to 797 MB, for no apparent reason. I did a DB cleanup, with no effect. I rebooted, with no effect. I'm going to wait and see whether this clears up after 24 hours. There is a problem with DB stability. I'm on 2.2.8.156. There needs to be a better way to handle this. :rage:
As long as there isn't a root cause (RC) for this, it will never be fixed.

Sounds like the next step is to download a backup, do a soft reset, and then restore from the backup.

Yes, I just did that. However, this shouldn't be something you have to do every several months. If the database is that sensitive, then there need to be better tools to manage it, monitor it, and diagnose it. Once again, I have no idea what caused it or how to fix it other than restoring from backup. It shouldn't be that sensitive.
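
On the "monitor it" point, even a crude check on periodic size readings would catch a jump like the 35 MB to 797 MB one reported earlier in the thread. Below is a minimal sketch, not anything Hubitat provides: the function name and the 2x threshold are made up for illustration, and it assumes you already have a list of database size readings in MB collected by whatever means you prefer.

```python
def flag_rapid_growth(samples_mb, max_ratio=2.0):
    """Return the indices of readings that grew by more than max_ratio
    compared with the reading just before them."""
    alerts = []
    for i in range(1, len(samples_mb)):
        prev, cur = samples_mb[i - 1], samples_mb[i]
        # Skip zero or negative readings to avoid dividing by zero.
        if prev > 0 and cur / prev > max_ratio:
            alerts.append(i)
    return alerts


# Hypothetical hourly readings ending with a jump like the one in this thread.
readings = [35, 36, 35, 797]
print(flag_rapid_growth(readings))  # prints [3]
```

A ratio check is used rather than a fixed MB limit so that the same threshold works for a small database and a large one; pick whatever ratio matches normal hourly variation on your hub.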

@gopher.ny has indicated that 2.2.9 will have some safeguards against this happening. FWIW, it happened to my hubs just once in 2.5 years.

It may have no relationship, but the only time I had this issue I was also fighting a ghost node…

@aaiyar Thank you for the information. What am I doing to have such bad luck? :laughing:
Any chance of an ETA for 2.2.9? Q4 of this year? I don't need an exact date, just an estimate.

Updates are about every other month if you look at release notes. Release Notes - Hubitat

Depending upon how you look at the dates, the next update should be fairly soon. I haven't seen any beta testing mentioned yet, so I would say we are going to be closer to mid-October at the earliest.

The other clue is usually when the Hubitat team goes quiet on the forums. That is often just before the Beta is about to be released. They are all hunkered down writing code and testing things.

So yeah, there's a good chance you'll see something "soon", whatever that means.

Odd issue: after restoring my C5, I was working on my hardwired backup system. The sensor in one door was broken, so I was opening and closing that door, which has a Z-Wave sensor too. The strange thing is that the hub went out to lunch. I started noticing that certain tasks/rules weren't running, so I attempted to log into my C5 and got a 500 error. The hub was completely unresponsive, but all the lights looked good. I do have a few rules that run on that door's Z-Wave sensor, both RM and webCoRE rules. Is it possible that a rule being run multiple times consecutively can cause an issue? I've only seen this one other time, when the hub hadn't been rebooted in a month or two. Has anyone else seen this?

Thank you all for your help as usual.

I signed up for cloud backup not long ago and now I have the issue too.


Like the message says, try a reboot of the hub to see if that corrects it first (with that error you may need to use port 8081 - http://<yourHubIP>:8081). If you need to go the soft reset route, try to get a downloaded copy of the database and use it to restore from.

It threw the error at first, so I thought it had failed, but I let it sit there for a moment and it eventually came back up. Any ideas what might have caused it? I've been through my logs and disabled some logging from a driver which appeared to have it on by default for all devices:

I believe the current working theory is that under certain conditions (not sure what these are) a set of write attempts to the DB collide and cause the database to stop reusing space freed up by deletes (reads, writes, deletes and updates happen continuously). Once the DB stops reusing space, it begins to grow at a rapid rate until the hub is rebooted - rebooting performs a database cleanup.
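
To see why losing free-space reuse leads to runaway growth, here is a toy model. It is only an illustration of the theory described above, not how the hub's database actually works: "pages", the one-delete-one-write pattern, and all the numbers are invented for the sketch.

```python
def simulate(steps, reuse_free_space, live_rows=100):
    """Toy model: every step deletes one old row and writes one new row.
    Returns the number of pages allocated in the file at the end."""
    file_pages = live_rows  # pages currently allocated in the file
    free_pages = 0          # pages freed by deletes and available for reuse
    for _ in range(steps):
        free_pages += 1     # the delete frees one page
        if reuse_free_space and free_pages > 0:
            free_pages -= 1  # the write lands in a freed page
        else:
            file_pages += 1  # the write appends a brand-new page
    return file_pages


healthy = simulate(10_000, reuse_free_space=True)
stuck = simulate(10_000, reuse_free_space=False)
print(healthy, stuck)  # prints 100 10100
```

The amount of live data is identical in both runs; only the file size differs. That fits the description of a hub that is not storing more events than usual yet keeps growing until a reboot triggers a cleanup.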

Thank you for the information.

I had performed a reboot on the hub yesterday morning before going to work then when I arrived home it had already stopped working completely.

I do remember seeing the warning message last week as well, though. I rebooted it at that point, which doesn't appear to have helped.

I'll keep monitoring it over the next few days.

I hard wired some of my light switches so nobody else here could use them while I was at work & HE was down. What a nightmare! I'll have to expose some of my cloud devices to my Google Home app so that people can at least use voice commands to control them. I might also install their native app (LIFX, HUE) on the wall mounted tablet I have here running my dashboards.