[BETA] Hub Failover - AKA an opportunity to protect or destroy your environment

thebearmay · October 26, 2022, 11:48pm

** The solution described below is not something that most HE owners will need or that I would recommend for the average or casual HE user, but for those who are comfortable working in complex technologies and understand the risks, this may provide a method to add a higher level of availability (as opposed to High Availability) to your environment…or it could take it completely offline. **

Hubitat® Elevation Higher Availability (HEha)

For homes where automation of the environment has become the expected condition, it is highly disruptive when a key component fails. In the Hubitat® environment, the loss of the hub is significant. While the Hub Protection service provides a replacement hub and permits restoration of the hub’s data and radios, shipping delays the restoration. In an ideal world, a secondary hub would be on hot standby and would automatically assume control of the environment if it detected a failure (loss of communication with the primary hub). This post details one solution.

Components

From Hubitat

Hub Protection Cloud Backups
Spare C8 Hub

Community Supplied Apps

1. Hub Failover Manager Application
2. Hub File Manager Sync (Optional)
   a. For application data files, etc.
   b. Requires the Local File Methods library
3. Hub Variable Sync (Optional)
   a. Only needed if using variables that affect real-time processing

Setup and Flow

Initial

1. Power up and register the spare hub.
2. Install and configure the Failover Manager app (requires enabling OAuth) and, if desired, the optional File Manager Sync and Variable Sync apps on the production hub.
3. Create a Cloud Backup of the production hub.
4. Turn off both radios on the production hub.
5. Restore the Cloud Backup to the spare hub, selecting both radios and the local file system.
6. Check the configuration of the Failover Manager app, and then:
   a. Press the button to disable all apps on the spare hub.
   b. Toggle the "Turn off all radios and start monitoring heartbeat" switch to the on position.
7. Turn on both radios on the production hub.
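
The ordering of the radio steps above matters because only one hub holding the production radio database should be transmitting at a time. As a rough illustration only (not part of the Failover Manager app), the Python sketch below replays the radio-related steps and checks that rule after each one. The `Hub` class and `run_initial_setup` function are hypothetical names, and the sketch assumes the worst case that the spare's radios are on immediately after the restore.

```python
from dataclasses import dataclass


@dataclass
class Hub:
    name: str
    radios_on: bool
    has_production_radio_db: bool  # True once the production backup is restored


def run_initial_setup():
    """Replay the radio-related setup steps, checking the one-active-hub rule."""
    prod = Hub("production", radios_on=True, has_production_radio_db=True)
    spare = Hub("spare", radios_on=True, has_production_radio_db=False)
    log = []

    def check(step):
        # Conflict: both hubs transmitting while sharing the same radio database.
        conflict = (prod.radios_on and spare.radios_on
                    and spare.has_production_radio_db)
        assert not conflict, f"both hubs are transmitting after: {step}"
        log.append((step, prod.radios_on, spare.radios_on))

    check("start")
    prod.radios_on = False                  # turn off production radios
    check("turn off production radios")
    spare.has_production_radio_db = True    # restore the Cloud Backup
    spare.radios_on = True                  # assumption: radios are on after restore
    check("restore cloud backup to spare")
    spare.radios_on = False                 # radios off, start monitoring heartbeat
    check("spare radios off, monitoring")
    prod.radios_on = True                   # turn production radios back on
    check("turn on production radios")
    return log


for step, prod_on, spare_on in run_initial_setup():
    print(f"{step}: production={prod_on} spare={spare_on}")
```

Swapping any two of the radio steps in the sketch (for example, turning production back on before the spare is switched off) trips the assertion, which is the situation the setup order is meant to avoid.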

Ongoing Maintenance

Periodically take a backup from production and restore it to the spare to capture any rule or application changes.
After adding a new device to production, do a Cloud Backup of production and restore it to the spare.
Re-initialize the Hub Failover app on the spare by repeating steps 6(a) and 6(b) of the Initial setup above.

In Operation

• The Failover app on the spare hub periodically pings (at a configurable interval) its instance on the production hub.
• If the production hub fails to respond X (configurable) times:
  • The Failover app sends a shutdown command to production (in case it is up but can’t respond).
  • The Failover app turns on the Zigbee and Z-Wave radios on the spare hub.
  • The Failover app re-enables all disabled apps.
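
The flow above can be sketched roughly as follows. This is an illustrative Python model, not the Failover Manager app's actual code: `monitor`, `ping`, `max_missed`, and `max_polls` are hypothetical names, the wait between polls is left out, and it assumes that the X missed responses must be consecutive, which the description above does not state.

```python
from typing import Callable, List


def monitor(ping: Callable[[], bool], max_missed: int, max_polls: int) -> List[str]:
    """Poll the production hub; return the failover actions taken, if any."""
    missed = 0
    for _ in range(max_polls):
        if ping():
            missed = 0  # assumption: a good reply resets the miss count
        else:
            missed += 1
            if missed >= max_missed:
                # Same order as the bullets above.
                return [
                    "send shutdown command to production",
                    "turn on Zigbee and Z-Wave radios on spare",
                    "re-enable disabled apps on spare",
                ]
    return []  # production kept answering; nothing to do


# Usage: one isolated miss is tolerated, three in a row trigger failover.
replies = iter([True, False, True, False, False, False])
for action in monitor(lambda: next(replies), max_missed=3, max_polls=6):
    print(action)
```

Counting only consecutive misses keeps a single dropped ping from taking production offline; a real deployment would tune the interval and threshold together.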

BorrisTheCat · October 27, 2022, 6:14am

How does this work? Zigbee can only be connected to one hub at a time.

tmcdonald · October 27, 2022, 8:46am

Very interesting. I do not understand how Zigbee works with two hubs, though.
I would be inclined to set this up on at least 7 of mine. I already have the backup hubs and I already have backups restored on them. I am ready to give it a go. This would increase my use of hub protect across many more hubs.

rocketwiz · October 27, 2022, 10:06am

If we don't have any z-wave devices (like me) is there any reason why a hub protect cloud backup would be required instead of a standard backup?

Also what a strange coincidence - I've been researching running a high availability HA setup over the last week!

thebearmay · October 27, 2022, 11:45am

If the "production" hub doesn't have its radios on then, unlike Z-Wave, you can re-pair the Zigbee device with a second "failover" hub. Once you do that, though, it is critical to have only one paired radio on, or you risk disrupting your mesh. That is the reason for the radio off/on, on/off dance above.

thebearmay · October 27, 2022, 11:46am

The Hub Protect backup is used with this solution because of the ZWave radio database (it also cuts out the download to a third device, upload to the hub exercise); if you don't have any ZWave then the regular backup should work.

bobbyD · October 27, 2022, 11:53am

Hub Protect is a great tool for contingency planning when a hub suddenly dies. Not only does it provide a free replacement hub, but also you will have a cloud backup ready to restore onto the new hub, to recover apps, settings and devices, if a local backup was not previously saved or not readily available.

brad5 · October 27, 2022, 12:23pm

Hmmm this is intriguing and quite ingenious. Thanks for your work on this, @thebearmay. I assume it would work just fine with Hue and Lutron devices paired to their respective hubs? I am not sure I want to go through the hassle of dual-pairing all my zigbee devices but having Lutron and Hue come back up would at least not leave me literally in the dark!

rocketwiz · October 27, 2022, 12:35pm

That's why I have 2 hubs. No need to wait for it to arrive from the US.

rlithgow1 · October 27, 2022, 12:55pm

Yes, because the integration is in the local database and simply points to those devices (they should be on IP reservations or static IPs for reliability).

rlithgow1 · October 27, 2022, 1:00pm

I will also note something when I recently did my own hub restore to a new hub. As I removed a zigbee device from the old hub, I could then hit "Start Zigbee Pairing" on my new hub and it would pick up the device and slot it into the new database properly. This worked on almost all of my devices. I had maybe 3 or 4 I had to factory reset and re pair. This was awesome because I was not looking forward to resetting 36 window contact sensors. So I assume there is some sort of command going back to the device to go back to pairing mode when you click remove on Hubitat.

thebearmay · October 27, 2022, 1:05pm

For this to work correctly, though, you don't want to remove devices from the production hub; you just want to add them to the failover hub.

thebearmay · October 27, 2022, 1:08pm

Another thing to consider, and I may add this up above too, is that the failover hub should be in close proximity to the production hub if you want to minimize the number of route changes when the failover occurs.

jlv · October 27, 2022, 2:14pm

When I did a manual Hub-Protect failover to my 2nd hub, I had to reset many of my Zigbee devices to get them to pair to the 2nd hub. I'm still confused as to why they would still talk to my 1st hub had I turned its radio back on (after turning off the radio in the 2nd hub).

One other thing I had during the failover: two apps using cloud endpoints didn't work until I regenerated those endpoints. I don't know why that happened.

thebearmay · October 27, 2022, 2:33pm

The restore brings back the Zigbee database, so the devices should slot in after pairing. As for why both hubs could attempt to control a device, think of the pairing as an exchange of keys. As long as I don't tell the original hub that someone else has a key, it will go along and use the one it has, and because of the restore the device sees the same keys regardless of which hub it is talking to.

Not sure how the hub access code hash is calculated, but if it is using information from the physical layer it is possible the code may need to be regen'ed and updated in the endpoint.

kkossev · October 27, 2022, 2:34pm

Can you add a link in the first post to the thebearmay.localFileMethods library?

thebearmay · October 27, 2022, 2:39pm

Good catch, I'll put in the first post also, but:

https://raw.githubusercontent.com/thebearmay/hubitat/main/libraries/localFileMethods.groovy

Tony · October 27, 2022, 2:50pm

Don't the network keys change periodically (under control of the trust center, which is the hub in this case) on a running network? Only when a device is in join mode will it receive a new (current) unencrypted network key.

Not sure if this is implemented in HE, but as I understand it, per the spec the trust center can change the network key 'periodically or as required' and distribute it across the network (encrypted with the old network key). Devices continue to retain the old network key for a short time thereafter (allowing time for all devices to switch to the new network key). They also use internal frame counters which increment on every message (to prevent replay attacks); these frame counters get reset to zero when a new network key is received.

I've played with this before (trying to get a previously joined Zigbee device to 'slot in' transparently on a different hub) and it didn't work... I thought I convinced myself why (as per above) but maybe I'm not understanding what is different about this approach.
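
To make the frame-counter point concrete, here is a deliberately simplified Python sketch of counter-based replay rejection as described above. It is not taken from any Zigbee stack: `ReplayGuard`, `accept`, and `rotate_key` are hypothetical names, keys are represented by plain labels rather than real key material, only a single sender is tracked, and the short period during which the old key is retained is not modelled.

```python
class ReplayGuard:
    """Accept a frame only if its counter is newer than the last one seen."""

    def __init__(self, key_id):
        self.key_id = key_id
        self.last_counter = -1  # nothing received yet under this key

    def accept(self, key_id, counter):
        if key_id != self.key_id:
            return False  # frame secured with a key we no longer use
        if counter <= self.last_counter:
            return False  # stale or repeated counter: treat as a replay
        self.last_counter = counter
        return True

    def rotate_key(self, new_key_id):
        # As described above, counters restart when a new key arrives.
        self.key_id = new_key_id
        self.last_counter = -1


guard = ReplayGuard("key-1")
print(guard.accept("key-1", 0))  # first frame
print(guard.accept("key-1", 0))  # same counter again
guard.rotate_key("key-2")
print(guard.accept("key-2", 0))  # counter restarts under the new key
```

In this model, a hub that comes back with older counter values than the devices last saw would have its frames rejected, which is one possible reading of the behaviour described above.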

kkossev · October 27, 2022, 2:51pm

Thank you @thebearmay !
This is something I have been looking for for quite a long time... A lot of us have already invested so much in smart gadgets, so achieving Hubitat Elevation Higher Availability at the cost of one spare hub (+ Hub Protect) is definitely an excellent solution.

I already know what will be my Christmas gift this year! : )

kkossev · October 27, 2022, 2:55pm

HE does not change the Zigbee network encryption key, either periodically or on request. It is the well-known default key.