Chaos Following Power Cut - Observations and Unexpected Behaviour

So we had a power cut (or rather, I turned off the power to test the effects of an unexpected power cut on some unrelated equipment that I'm installing for a customer next week. What a complete tool I am, as I didn't shut down Hubitat first).

Nothing was working: corrupt database. So I did a soft reset and restored last night's backup. Devices started working again but with some issues. I've managed to guess the cause of the first couple of issues.

1 - Doors that are open showing closed, curtains that are open showing closed. My guess is that this is because those doors and curtains were closed at the time of the backup, those states were captured and when the database was restored, those states were restored/assumed until I refreshed the states by closing and opening them again. Would that be the case?

2 - Similar to above - a motion lighting rule was bringing lights on during the day. I have a boolean that is false between sunset and sunrise or if the curtains are closed, true otherwise. The boolean should've been true but was false because the open curtains were showing closed. Closing/opening them flipped the boolean and restored normal operation.

3 - This one I'm not sure of - expected behaviour or bug? Two separate RM rules for some lights have a required expression of "between two times":

The first (Hall and Stairs) is correctly showing 'required expression false'. This is only because at the point I took the pic, I'd already opened the rule and clicked Done which fixed it. The second (Landing and Stairs) is not showing 'required expression false' in the RM child list, but is when I open the rule as below....

....it is showing false. At that point the RM rule was ignoring that required expression and bringing the lights on. Clicking Done fixed it.

So at the time the Hub rebooted when the database was restored, should it not have booted with those time related required expressions false?

Yes, this will not be "corrected" until a new event comes in from the sensor. If the device wakes for these commands, a refresh() from Hubitat would also work (but note that it would be treated as if the event happened just then, which may unexpectedly trigger apps; I personally just let this slide and don't worry about missed events, though a few users seem to prefer something else).

By "similar to the above," I assume you mean that whatever toggles this boolean was missed when the hub was shut down? If so, yes, in general, scheduled jobs that are missed when the hub is off will just be missed. Scheduled jobs that are pending but still at a future time when the hub is brought back up will still execute as scheduled.
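As a rough illustration of that rule, here is a toy model in Python. It is not Hubitat's actual scheduler; the function name, job names, and timestamps are all made up for the example. A job whose fire time falls inside the outage is simply dropped, while one due after the hub is back still runs:

```python
from datetime import datetime

def split_jobs(jobs, hub_off, hub_on):
    """Partition scheduled jobs into (missed, still_pending).

    jobs maps a hypothetical job name to its scheduled fire time.
    A job counts as missed if its fire time falls while the hub was off;
    jobs due before the outage are assumed to have run already.
    """
    missed, pending = [], []
    for name, fire_at in sorted(jobs.items(), key=lambda kv: kv[1]):
        if hub_off <= fire_at < hub_on:
            missed.append(name)
        elif fire_at >= hub_on:
            pending.append(name)
    return missed, pending

# Hypothetical schedule: one job due during the outage, one due afterwards.
jobs = {
    "curtain_boolean_update": datetime(2024, 1, 1, 14, 30),
    "lights_off": datetime(2024, 1, 1, 23, 0),
}
missed, pending = split_jobs(
    jobs,
    hub_off=datetime(2024, 1, 1, 14, 0),
    hub_on=datetime(2024, 1, 1, 15, 0),
)
print("missed:", missed)
print("still pending:", pending)
```

Nothing in this model replays the missed job on startup, which matches the behaviour described above.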

This might also be similar to the above--was the hub off when this time change happened? The required expression creates a scheduled event for the beginning or end time, and the same as other scheduled jobs from above applies here. This isn't constantly checked, presumably because it would be a waste of resources to do so constantly when the hub is functioning normally.

Hitting "Done" re-initializes the rule, including re-evaluating the required expression, so that is probably the explanation for the difference between the two.
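To make the difference concrete, here is a minimal sketch in Python of a "between two times" check with a cached state. This is only my guess at the general shape, not how Rule Machine is actually implemented; the class, the method names, and the times (20:00 to 06:30) are hypothetical. The cached value only changes when a scheduled transition fires, so if that transition is lost the value stays stale until something re-evaluates it:

```python
from datetime import time

def in_window(now, start, end):
    """True if now lies in [start, end), including windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end

class RequiredExpression:
    """Toy model: the state is cached and normally updated only by scheduled jobs."""

    def __init__(self, start, end, cached_state):
        self.start = start
        self.end = end
        self.state = cached_state  # value carried over from the restored database

    def reinitialize(self, now):
        # Roughly what clicking Done appears to do: recompute from the clock.
        self.state = in_window(now, self.start, self.end)
        return self.state

# Hypothetical window 20:00 -> 06:30; the restored state says "false".
expr = RequiredExpression(start=time(20, 0), end=time(6, 30), cached_state=False)
stale = expr.state                       # still false at 22:00 if the start job was lost
fresh = expr.reinitialize(time(22, 0))   # re-evaluated against the current time
print("stale:", stale, "fresh:", fresh)
```

The same re-evaluation could in principle be run for every rule at startup instead of by hand.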

Some of this can be worked around (or all of this if the automations are created slightly differently and you care enough to make these changes, I suppose). A hub reboot generates a systemStart event that you could subscribe to in a rule (or custom app), and you can do whatever you want there. Of course, a better idea is to keep the hub on a UPS so both this and your database corruption are far less likely to happen.

In addition to @bertabcd1234's thorough response, a couple of things that trip me up from time to time. When you restore from a backup, the hub (if you use modes) is in whatever mode it was in during the backup, and will not change on its own until the next "event." Since automated backups are usually at night, I find upon restore that my hub is in NIGHT mode. Also, when you do a soft reset it's often easier to make a backup first, save it locally, do the soft reset, and then restore. That way the backup is fresher than the previous night's. The actual process of doing the backup and restore is what (usually) gets rid of corruption in the database - it's not necessarily the restore from an earlier backup that was done before the power shut down.

I have also noticed when I have a power failure (not my hub, since that is on a UPS) that my meshed networks sometimes take a while to heal. Even though a lot of the end points may be battery operated, all those repeaters I put in to stabilize my mesh are not! Even a 30 second power outage while I wait for my standby power to kick in sometimes wreaks havoc.

And yeah a UPS is your friend!

This has got me more than once. Now I have an init routine that initializes a bunch of stuff after booting.

I also learned after a 30 minute outage the other day (hubs are on a UPS but I shut everything down b/c I expected the outage to be longer) that if your hubs come up before your router, it’s a crapshoot what IP they assume. Lol. I just assumed they would keep on trying. I thought @gopher.ny posted something about that. Anyway, no. One out of three got the proper IP and was online properly. The second one thought it was on a 169… IP. Weird. And the third one seemed to have the right IP but did not connect properly via hub mesh. Maybe b/c I am using UDP. Dunno. Just weird.

The APIPA address (169) is the behavior I would expect. But I bet if any IP device powers up before your router (or probably more specifically your DHCP server) the results may be... unpredictable. I actually power my HE with a PoE splitter from a switch - I know for sure the switch will be up first.
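If anyone wants to check for this in a script, here is a small Python sketch using the standard `ipaddress` module. The helper name `is_apipa` and the sample addresses are made up for illustration; 169.254.0.0/16 is the IPv4 link-local (APIPA) range a device falls back to when it gets no DHCP lease:

```python
import ipaddress

# IPv4 link-local (APIPA) range used when no DHCP server answers.
APIPA_NET = ipaddress.ip_network("169.254.0.0/16")

def is_apipa(addr):
    """Return True if addr is a self-assigned 169.254.x.x address."""
    return ipaddress.ip_address(addr) in APIPA_NET

# Hypothetical addresses: one self-assigned, one from a normal DHCP lease.
for addr in ("169.254.10.20", "192.168.1.50"):
    print(addr, "APIPA" if is_apipa(addr) else "looks like a DHCP/static address")
```

A hub reporting an address in that range most likely booted before the DHCP server was reachable.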

Yeah, I just wasn’t thinking, and a couple of hours later, when the mode change should have trickled through the mesh, I realized my house wasn’t as smart as it should be. User error. I saw the blue light on two of the hubs as soon as I turned on the UPS and I thought, ah, they’ll be fine. Lol

The one that came up properly, now that I think of it, is because it’s on a second UPS with my access point upstairs and of course I waited to power that one on.

No, the required expression is between sunset -15 and sunrise +30. The required expression was true at the time of the automated backup; however, the power cut was at 2 PM (ish), and the backup was also carried out well before the required expression was due to become true again, which is why I queried it.

I've another rule I forgot about that is in a similar incorrect state currently. I've left it untouched for now. So the power cut and restore happened while the required expression was false and it should have switched to true at 20:12 but has remained false. I guess that following the restore, the required expression wasn't scheduled. It's now 22:28 here in the UK:

There are pending scheduled jobs for the off events, but the expression is showing false and the lights never came on:

I understand the reasons for the other issues I had following the restore, but I expected that, as all of that happened before the required expression time, those rules would be unaffected.