[Solved] Z-Wave Failed - DEAD

Was just getting ready for bed when I realized several lighting automations were failing. I tried to control the devices manually without any luck. I tried numerous devices all through the house and none of them worked. A reboot of the hub fixed it.

This is the second hub reboot of the night. Earlier I noticed that both of my Aeon minimotes were not controlling their automations, nor were they registering button presses. A reboot of the hub fixed that issue.

This is highly concerning to me.

I'm not going to bother opening any more support tickets as I've not received a single response to any of my open tickets in a week now.

Didn’t even make it 10 minutes and Z-Wave is now unresponsive again. Rebooting the hub for a third time. Will attempt a repair once it’s back up.

This system is way more fragile than SmartThings. It’s not a shot at anyone on the HE team. Just my own personal experience with the system.

Can you list any custom apps or drivers you are using? When your Z-Wave devices stop responding, can you report whether the status of the Z-Wave radio is showing active, and whether your devices are listed on the Z-Wave settings page?

Constantly rebooting the hub is probably adding to the problem. Let's see if we can see what is causing the symptoms.

Lastly, which version of the HE hub do you have: single USB adapter or multiple (US/AU/UK)?

@bobbyD @bravenel @Cobra
I have not yet received my hub, but I have gone through the vast majority of the threads here in the forums and have not seen as many reports as I have in the last couple of days. It seems these reports started after the last update.

However, this is only my perception from reading the forums and is not backed up by any analytics you might have on your end, so please take it at face value.

But I do understand @srwhite's concerns, as I'm now debating with myself whether I should transition from ST straight away as soon as I receive the hub, or wait for the issues reported in the last couple of days to be investigated.

There are benefits and consequences of allowing users to run custom apps and drivers on the HE hub locally. Unfortunately, poorly coded apps or drivers can cause the hub to become very sluggish or even crash. The questions I asked @srwhite are to try to pinpoint whether it's likely a driver/app issue, a bad or "ghost" Z-Wave node, or even something just spamming his Z-Wave network.

We're all here to help, just need more info.

Edit: IMHO the latest versions of the HE firmware have been VERY stable and any "bugs" are usually squashed by incredibly fast hotfix releases

Let's hope that's the case and that it's just rogue apps or drivers.

Regarding the FW, I would agree with you. But other users like @april.brandt have also reported strange hub behaviours in the last couple of days.

That being said, I do agree that the norm is that each FW release is positively astonishing: really stable, much more so than the other players in the market.

If it's a bug in fw I'm sure that the team is hard at work fixing it and a fix might be very close. That's how good they are.
I also know that if it is a rogue app/driver, the users in question will let us know.

In conclusion, the timing of the issues and the type of issues (devices dropping, hub reboots, ghost alerts firing, etc.) are currently making me anxious about whether I should risk the move to HE this week. Just that.

Z-Wave is stone cold dead this morning. The Z-Wave repair didn't complete. It still showed as running, but no repair events were reported for the nearly one hour I monitored.

Exactly zero out of over 100 devices are working. It's not a driver issue, nor is it a ghost device issue. Those things will cause groups of devices to experience delays and/or random failures, but I've never seen either take down an entire network. Even with driver debug logging on, there are no messages being output that give even a hint of a problem.

This started happening a little after 10pm EDT last night, and cascaded into a complete failure. No Z-Wave messages in or out seem to be passing now. That half of the system is now completely dead.

EDIT: All Z-Wave devices are still showing up in Z-Wave radio settings.

I opened a ticket this morning. I have about 3 weeks left to return the hub to Amazon, although the family has told me this morning that things were much better with SmartThings and asked me to move back. I'm holding them off for a few more days, but am starting to rapidly lose hope over the stability of the platform.

I feel your pain. I have been having some weird issues as well since migrating everything over from SmartThings. I hated that platform and all the outages and being tied to the internet. But at the moment it was definitely more stable than my first week or so with HE. My family has also asked me to revert back to SmartThings, but I have held them off in the hopes I can get the "bugs" worked out.

Some of the weird ones I have seen: Mode Manager gets stuck in a mode and doesn't change. HSM will not 100% change state based on modes. HSM will not always arm/disarm my DSC Alarm system through the Envisalink integration. Sometimes Rule Machine rules don't fire the first time. I also had an issue with an override being stuck on my Motion Lighting App rules that I had to engage support about.

While my HE has been very stable so far, one of the things that concerns me is the multiple posts on missing events and weird behavior.

As far as I know (?) we can't see CPU/memory/storage usage on the system, so I don't know how we would have any way of knowing how close we are to limits as we add more and more apps/devices... That makes me a little nervous.

Nor is there any way (that I know of) to restart in a 'safe mode' if you run out of resources or have an errant app...

Some patience would be nice here. It's understandable that there's concern. There's no need to get all jumpy. They're on it, but they're a small team. I've handed info, logs, screenshots, etc. over to the HE team about the issues that I'm having. They'll look into it. I've even told them to do what they need to do in my hub to fix it. Offered to beta a fix if they need. Hell, I said I didn't care if they needed to start triggering lights poltergeist style in my house. They're on it. Some problems aren't just the dot of an I or a cross of a T. So, let's slow down on the "Oh I'm so scared of what might happen" statements. It's just kind of toxic. And as a community, there should be more supportive responses here. Let's not be like that.

Just had some Z-Wave/Hub weirdness for me this morning. I would love to see more advanced diagnostics like "ghost" nodes - I think it would help me troubleshoot things better. Is there some sort of resource usage report we can get?

My other concern is that the larger the system grows, the more complex it becomes, and troubleshooting and management become a lot more difficult. At a certain point it becomes very hard or impossible to reset everything and start over, so things need to be handled in an ongoing fashion.

I know there are very smart people working on this as well as in the community so things will continue to improve. Just never done when you want it to be! :grinning:

I need to disagree with you here.
Customers showing their concerns is NOT toxic. It's feedback, and positive, constructive feedback for that matter.
If customers didn't give the developers this feedback, it could be inferred that the issues are not a big deal for the customers. The fact is that they are, and they will need to be properly investigated.
In fact, the more instances of feedback that exist the merrier, as it will give HE more data points to analyse and allow them to reach a higher degree of certainty about the root cause.

Nobody is slashing at HE and everyone understands that they are a small team.
I haven't heard anyone asking for an ETA or anything close to it.

Customers showcasing the importance of their concerns and describing their issues in detail also allows HE to evaluate risk by weighing each item on their to-do list and prioritizing things internally as a team.

Let's give them time. But let's not downgrade the issues by asking the community to be more restrained in sharing feedback.

The community is here to help, and you can see this on almost all posts. You are an example, as you described.
But others have also offered to help diagnose and troubleshoot as the need arises.

Also I would add that a customer would not be complaining if they weren't interested / somewhat passionate about the platform to begin with. Maybe there are better ways to express things but usually it seems everyone wants this to succeed - there is so much potential here.

I do think when posting a complaint one should exercise some caution/humility - a lot of times (for me anyway) it's been user errors/lack of understanding rather than issues with HE. There is a lot going on under the hood. Would still like to see more advanced diagnostic reporting though.

We also should be aware of cultural background. Some cultures are more direct, others are more easily offended if more finesse is not used in language, and others love what I call foreplay of language.

We are a diverse, multicultural, global community, so some consideration needs to occur as well. We should be looking at what is being said (facts, descriptive elements, etc.) rather than how it is said.

I experienced the same and reported it. It seems to have settled down, but still misses once in a while.

I think there is a difference in how HE initializes Zigbee devices and manages their sleep behavior. Could be the long/short polling intervals, but who knows. Just about every motion and contact sensor occasionally fails to report an event. Not that it's the devices' fault. The LEDs always flash indicating activity, but the hub never gets the message.

I completely agree, a read-only view of processes and resource utilization would be extremely helpful. There are those of us with extremely deep technical backgrounds that are capable of doing advanced diagnostics before involving support. It's just information, why hide it? Two thumbs up for that request!

I trust that wasn't directed at me. Speaking only for myself, I've exhibited an extremely high level of patience. How many customers would wipe and reconnect hundreds of devices to try to solve a problem that support had not even acknowledged? The reality is that most would throw the hub back in the box and return it.

I think the feedback that I and others have provided to the HE team should be useful, and welcomed by their dev team. The platform has bugs. There's no need for you to suppress that knowledge, nor should the developers hide from it. There isn't a coder alive that has ever written a bug-free application, and there probably never will be. That kind of feedback, even if it may be hard to swallow, is the only way an application or platform can ever truly mature.

I have hundreds of hours in doing just that at this point. Having a large network brings its own issues and a certain implied commitment to increased troubleshooting time and effort. But there comes a point where one needs to reevaluate the value proposition that a particular platform brings.

Completely agree. The community here has been great and very supportive. There's no need to suppress any feedback, good, bad, or indifferent.

That too is correct. When I make a decision or agree to do something, I do it with 100% effort and dedication. The move from SmartThings to Hubitat is no different. There however remains a breaking point in every endeavor, where the value proposition of a project diminishes to the point where the project is no longer feasible. For me, that point will be reached when the end of the hub's return period draws near. At that point it will have been about 7 weeks of trying to get it running stably, a more than reasonable attempt IMO. In my attempts to get this working, I've purchased 3 XBees as well as several motion and contact sensors to track down issues and replace suspect devices. None of those efforts have paid off at all.

Anyhow, I received a confirmation that support was looking into the issue. I'm encouraged that the devices are still showing in the radio device list. I trust (assume) that means they're still in the stick's Z-Wave routing table.

Are you using custom drivers or built-in drivers? Which built-in drivers are involved in the devices that do not work properly?

Also, are any of your Z-Wave devices joined securely? This is known to drag down your Z-Wave mesh if used for devices other than locks. Every command involves 3 times as much traffic. Spread that around, and you'd have issues.
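To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. It assumes only the "3 times as much traffic" figure from the post above; the function name and the command counts are made up for illustration and are not measurements from any real mesh.

```python
# Rough estimate only: the factor of 3 comes from the post above,
# everything else here is a hypothetical illustration.
SECURE_TRAFFIC_FACTOR = 3


def relative_traffic(plain_commands, secure_commands, factor=SECURE_TRAFFIC_FACTOR):
    """Return traffic in 'plain command' units for a mix of plain and secure commands."""
    return plain_commands + factor * secure_commands


# 100 commands, none of them to securely joined devices.
all_plain = relative_traffic(100, 0)

# The same 100 commands, but 40 of them go to securely joined devices.
mixed = relative_traffic(60, 40)

print(all_plain, mixed)
```

With those made-up counts, moving 40% of the commands to secure devices nearly doubles the load, which is why secure joining is usually kept for locks.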

Iris Z-Wave repeaters are known to malfunction in a Z-Wave mesh. If you have any of these, they should be removed as Z-Wave devices.

These are the sort of issues we've seen in other customers' systems with problems like the ones you are describing: Force-removed Z-Wave devices, secure Z-Wave joining for non-locks, Iris repeaters, and custom drivers.

Can you see this on the device page? It occurred to me that one of my Aeotecs has a different colored LED, so I want to check this out in my system. The multi allows for secure pairing, but I wouldn't intentionally pair it that way.

Yes, under "Data" field. See this link for more details:

None of my Z-Wave devices are working. I've got around 40 GE Dimmers/Switches/Fan Controls all using stock drivers. I have the Z-Wave repeater side of the Iris plugs using a custom driver I wrote for SmartThings and ported to HE. All that does is a periodic health check to spot any that fall offline.
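For anyone wondering what a periodic health check like that boils down to, here is a minimal sketch in Python. It is not the actual driver (which runs on the hub); the device names, timestamps, and the one-hour threshold are all hypothetical, and it only shows the idea of flagging devices that have been silent for too long.

```python
from datetime import datetime, timedelta


def find_silent_devices(last_seen, now, max_silence):
    """Return the names of devices with no activity within max_silence, sorted by name."""
    return sorted(name for name, seen in last_seen.items() if now - seen > max_silence)


# Hypothetical example data: an arbitrary "now" and two made-up plugs.
now = datetime(2020, 1, 1, 8, 0)
last_seen = {
    "plug_a": now - timedelta(minutes=10),  # reported recently
    "plug_b": now - timedelta(hours=5),     # silent for hours
}

print(find_silent_devices(last_seen, now, timedelta(hours=1)))
```

If every device on the network lands in that list at once, as described here, the problem is more likely at the radio or hub than in any single device.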

The securely joined devices include 5 locks, a Zooz 4-in-1 sensor, an Aeon 6-in-1 sensor, and an Aeon Energy Meter G5.

I disagree with that assertion. Those that have the upgraded firmware 0x20082010 are rock solid as repeaters. The issue that Hubitat and SmartThings have is that during pairing, the hub fails to set Association Group #1 to the ID of the hub. This causes the Z-Wave module to believe it's fallen off the mesh. Once Group 1 is set, you can pull reports, MSR, etc. from the plug, and they remain connected and stable.
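For reference, setting Group 1 is a single Association Set command. The sketch below, in Python, only builds the raw command-class payload (Association command class 0x85, Association Set command 0x01, then the group and the node IDs). It assumes the hub is node 1, which is typical for a primary controller but should be checked on your own system, and the helper name is made up. It does not send anything to a radio.

```python
COMMAND_CLASS_ASSOCIATION = 0x85  # Z-Wave Association command class
ASSOCIATION_SET = 0x01            # Association Set command


def association_set_payload(group_id, node_ids):
    """Build the payload bytes for an Association Set command (hypothetical helper)."""
    if not 1 <= group_id <= 255:
        raise ValueError("group_id must be in 1..255")
    for node_id in node_ids:
        # Classic Z-Wave node IDs run from 1 to 232.
        if not 1 <= node_id <= 232:
            raise ValueError("node IDs must be in 1..232")
    return bytes([COMMAND_CLASS_ASSOCIATION, ASSOCIATION_SET, group_id, *node_ids])


# Assumption: the hub is node 1. Put it in Association Group 1.
HUB_NODE_ID = 1
payload = association_set_payload(1, [HUB_NODE_ID])
print(payload.hex())
```

On the hub itself this would be issued from the device driver rather than built by hand; the bytes are shown only to make clear how small the missing step is.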

I won't rule out force-removed devices as being an issue. Hubitat is unable to pair my Schlage BE469 deadbolts. This led to a few "phantom" devices that had to be removed during my numerous attempts. But that was a while ago. It looks like the Z-Wave stack failed around 2:37am last night. Looking at the logs, that appears to be the last message received from any device. That was well after everyone was in bed.