Catastrophic ZWave failure overnight: 41/47 devices failed

Woke up this morning to find 41 of my 47 zwave devices are no longer responding and are ‘lost’.

8 zwave extenders deployed around the house.

All the radiator valves, thermostats, temp sensors, motion sensors are unresponsive.

Looking in the logs it appears Hubitat dumped most of the zwave devices very early this morning.

Hubitat had also lost connection with the other Hubitat, both of which are connected to the same network switch.

Rebooted both.

Hubitats now talking to each other.

Both devices are on a UPS, and had been rebooted the week before.

Will continue to try zwave repairs, but if Hubitat continues to fail I will have to revert radiators, hot water and boiler control to manual, as areas of the house are currently too cold to occupy.

I had hoped that by using a Hubitat C7 just to control heating, hot water and boiler on zwave I would not be overtaxing its capabilities.

I am moving to a new house in a month or so. In light of this widespread failure I will have to investigate an alternate controller.

Sounds like one for support; @bobbyD or others in the HE team can hopefully sort this out for you.

In terms of your move: as concerning as this situation is, it does seem to be uncommon, at least from what I have personally seen on this forum, though I am not someone who uses Z-Wave myself. I'd urge you to reserve judgement, not just because I, like many others here, am a fan of HE, but because it is easy to see one set of symptoms and assume one root cause when something else could be at play, and I would not want to see you change your use of HE because some other system or service failed. That sounds a bit too wordy and indulgent now that I read it again.... Just wait to find out the root cause, that's the crux of what I am suggesting; don't leap to any conclusions just yet....

A power down, unplug, wait 5 mins, plug back in and voila: all zwave devices are suddenly found again.

Odd that the zwave controller died while keeping 6 zwave devices running, and that a reboot did not fix it; a full power-off was required.

Troubling

I'd still give the support staff and Community members an opportunity to respond...

The power down would have reset the zwave radio, so that is where to start looking for answers, but as @sburke781 said it would be best to bring in support to look at the engineering logs on this one. I’d go ahead and send them an email with the hub model and MAC and ask them to take a look.

Most certainly will.

If they can share a way to detect, from my other C7 hub, when the zwave controller has failed, then I can force a shutdown of the zwave C7 hub, use a zigbee socket to power the hub off for 5 mins, and then power it back up.

Until then will just get the zwave hub powered down and back up during the night, every night.

I can do that until I find either a reliable way of detecting when the zwave controller is failing and forcing a power-off reboot, or an alternate zwave controller.
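The power-off reboot described above could be laid out as a simple timed plan for the second hub to follow. This is only a sketch in plain Groovy: the action names (`shutdownHub`, `socketOff`, `socketOn`) and the 60-second grace period for the orderly shutdown are assumptions for illustration, not anything confirmed in this thread. On a Hubitat hub each step would presumably be scheduled with `runIn` using the computed delay.

```groovy
// Sketch only: builds the timeline for a hard power cycle of the zwave hub.
// Action names and the 60 s shutdown grace period are assumptions for illustration.
List<Map> buildPowerCyclePlan(int shutdownGraceSecs = 60, int powerOffSecs = 300) {
    return [
        [atSecs: 0, action: 'shutdownHub'],                               // ask the hub to shut down cleanly
        [atSecs: shutdownGraceSecs, action: 'socketOff'],                 // then cut power at the zigbee socket
        [atSecs: shutdownGraceSecs + powerOffSecs, action: 'socketOn']    // restore power after the off period
    ]
}

// Print the default plan (5 minutes unpowered)
buildPowerCyclePlan().each { step ->
    println "t+${step.atSecs}s: ${step.action}"
}
```

Keeping the timings as parameters makes it easy to try a longer or shorter off period without touching the sequence itself.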

Not going to get into it until after I move to the new house in about a month. In the meantime, a hard power-off reboot of the zwave Hubitat should help stabilise it some.

So good outcome, got things working again, have a workaround to help keep it stable, and a way forward for the new house.

I know I am pushing the boundaries of Hubitat, and pushing it further than I should, by getting it to manage a multi zone heating system. But when I look at COTS multi zone heating systems, their forums are loaded with the same kind of issues I am experiencing; the only difference is that their systems are locked down and all users can do is pray the developers resolve it.

With this homebrew Hubitat solution at least I can develop workarounds to issues, like a TRV reporting its valve is open when it is not and vice versa, an issue I run into daily and resolve by ‘flexing’, i.e. to open the valve fully, send a valve level of 99, then a valve level of 50, then a valve level of 99.

You can’t do that kind of workaround with commercial off the shelf products.
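The ‘flexing’ sequence above can be sketched in plain Groovy. This is a minimal illustration, not the actual rule or app in use: `setLevel` is a hypothetical closure standing in for whatever command the TRV driver exposes, and the helper names are made up for the example.

```groovy
// Sketch only: the "flexing" sequence described above (99, then 50, then 99).
// 'setLevel' is a hypothetical stand-in for the real valve command.
List<Integer> flexLevels(int target = 99, int intermediate = 50) {
    return [target, intermediate, target]
}

void flexValve(Closure setLevel, int target = 99, int intermediate = 50) {
    // Send each level in order: target, intermediate, target again
    flexLevels(target, intermediate).each { level -> setLevel(level) }
}

// Usage: record the commands instead of sending them to a real valve
List<Integer> sent = []
flexValve({ int level -> sent << level })
println sent   // prints [99, 50, 99]
```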

If the hub is generating a zwaveCrashed event you could use RM (or a quick custom app) to kick off your process.

Is there a custom app that can detect that @thebearmay ? Or that may in the future? :wink:

I may have some code laying around that could be adapted fairly quickly… :sunglasses:

Oh... that's not what I had in mind at all....:grin:

Thinking that the flow would go something like:

  • capture crash event
  • send push notification
  • call external hub endpoint (RM or ??) - which should have a 3-5 minute delay built in - to cut / restore power
  • send shutdown command to hub
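The steps above can be sketched as one handler that runs them in order. This is plain Groovy under stated assumptions: the three closures are hypothetical stand-ins for the push notification, the call to the external hub and the hub shutdown command, and none of them is a real Hubitat call.

```groovy
// Sketch only: invoked when the crash event is captured, then runs the remaining steps in order.
// All three closures are hypothetical placeholders supplied by the caller.
void handleZwaveCrash(Closure sendPush, Closure callExternalHub, Closure shutdownHub) {
    sendPush('zwave radio crashed')   // send push notification
    callExternalHub()                 // external hub applies its own 3-5 minute delay before cutting / restoring power
    shutdownHub()                     // orderly shutdown before the power is cut
}

// Usage: log each step instead of performing it
List<String> steps = []
handleZwaveCrash(
    { String msg -> steps << "push: ${msg}".toString() },
    { steps << 'external hub called' },
    { steps << 'shutdown sent' }
)
steps.each { println it }
```

The external hub is called before the shutdown command because the hub cannot make the call once it is down; the delay built into the external side gives the shutdown time to finish.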

Or just flick a switch in the first instance, others can deal with it how they wish, or a separate app (that you could develop if you want) can deal with the switch turning on however it chooses.

That’s even easier, as the switch could be the trigger to both the hub shutdown and be monitored by the external hub to initiate its actions.

Edit: Code to set switch assuming it’s a location event:

Code
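/*
 * Zwave Crashed Switch
 * Turns on the selected switch when a zwaveCrashed location event is received.
 */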

static String version()	{  return '0.0.0'  }

definition (
	name: 			"Zwave Crashed Switch", 
	namespace: 		"thebearmay", 
	author: 		"Jean P. May, Jr.",
	description: 	"Logic Check .",
	category: 		"Utility",
	importUrl: "https://raw.githubusercontent.com/thebearmay/hubitat/main/apps/xxxxx.groovy",
	oauth: 			false,
    iconUrl:        "",
    iconX2Url:      ""
) 

preferences {
   page name: "mainPage"
}

def installed() {
//	log.trace "installed()"
    state?.isInstalled = true
    initialize()
}

def updated(){
//	log.trace "updated()"
    if(!state?.isInstalled) { state?.isInstalled = true }
	if(debugEnable) runIn(1800,logsOff)
    initialize()
}

// (Re)create the subscription whenever the app is installed or its settings are saved,
// rather than while the settings page is being rendered
def initialize(){
    unsubscribe()
    if (qryDevice != null) {
        subscribe(location,"zwaveCrashed", "setSwitch")
    }
}

void logsOff(){
     app.updateSetting("debugEnable",[value:"false",type:"bool"])
}

def mainPage(){
    dynamicPage (name: "mainPage", title: "", install: true, uninstall: true) {
      	if (app.getInstallationState() == 'COMPLETE') {   
	    	section("Main")
		    {
                input "qryDevice", "capability.switch", title: "Switch to use:", multiple: false, required: true, submitOnChange: true
		    }

	    } else {
		    section("") {
			    paragraph title: "Click Done", "Please click Done to install app before continuing"
		    }
	    }
    }
}

def setSwitch(evt){
    qryDevice.on()
}

def appButtonHandler(btn) {
    switch(btn) {
          default: 
              log.error "Undefined button $btn pushed"
              break
      }
}


How much of a coincidence is it that we have at least 3 Z-wave crashes within a few days? Read here: Z-Wave Lock Stopped Talking to Hub Suddenly



There are known bugs in the Silicon Labs Z-wave firmware/SDK that have been plaguing all 700 series controllers (not just Hubitat). SiLabs recently (last few days) released a new version that is rumored to address many of these issues. Hopefully a fix will be forthcoming, but probably not in the next Hubitat release as SiLabs just released the new version. The Hubitat team will want to do a lot of testing, I am sure.

It quite likely is a coincidence. Z-wave being a local wireless protocol and all.

I’ve visited this community every single day for ~3 years. Zwave “crashes” are extremely commonplace. They almost invariably result from poorly constructed messages, chatty devices, or ghost/stranded nodes.

Having 3 such incidents within a few days is not coincidental at all. As @ogiewon pointed out earlier, one underlying issue for 700-series controllers is buggy firmware from SiLabs.

Indeed, also seen my fair share of issues. Regarding the buggy FW, that is exactly what I was thinking, that perhaps there was a built-in timed bug causing the issues. Perhaps a bit far fetched :wink:

In any case, hope the new SiLabs FW will finally solve the issues for good.

Unfortunately the zwave controller crashed again last night.

The idea of rebooting the Hubitat each night failed miserably: despite an orderly shutdown and a one minute power off, each time the Hubitat came back up it ‘lost’ Hubitat Mesh, losing the temp sensors running on the zigbee Hubitat.

I am about to move to a new home. I will use another home automation hub to control zwave, keep Hubitat for zigbee, and put Node Red over the Hubitat and the new zwave controller.