Elevation C7: Possible faulty z-wave radio?

What you're describing is a little different, though I have seen that behavior as well. When my hub is working, all devices are quick and responsive, even when a device is listed as failed.

The behavior I'm seeing appears to be directly related to the ability of the hub to send commands (either through the dashboard, automations, or Alexa integration); in my experience, sensors continue to relay information into the hub without issue.

I have the same issue as you, where I try to turn on a light (the switch is next to the hub) and it won't respond. I changed some settings on a couple of Zooz multi-sensors and relays, as well as on Inovelli switches, and the changes didn't sync to the devices (that was yesterday and it still says pending).

It's like the whole network freezes and after a while comes back to life, or comes back partially. For example, temperature would update on the sensor but motion wouldn't. I've used HASS as well as OpenHab and HomeSeer and never had this kind of issue. My sprinklers were running for about 4 hours (I'm using Zooz Z-Wave relays for that) until I went outside and noticed it! Zigbee devices are working without any issue.

I think there is a firmware issue. I've seen some logs that said something like "network not responsive", or something of that nature.

It's just unusable at the moment. I guess I'll just go back to my C5.

Almost exactly the same here.
I have only 35 devices, all Z-Wave plus, added with mixed levels of security, mostly mains-powered.
According to the "best practice" I took my sweet time while including them.
I had only occasional lag before, but things got really severe after adding my TRVs, which are all FLiRS devices.

Simple motion-triggered lights-on automations stopped working, or fired with a delay of a minute or more.
While the hub is in this state, voice commands are ignored as well.
Once one of my TRVs stops responding, at the latest, it's time for a safe power-off. I need to do this almost once a day.

Did some testing using the Hub Watchdog-App with the following results:

[Image: watchdog]
As you can see, the response time for my tested device is up to 45s.
The tested device in this case is an Aeotec SmartSwitch 7 included with S2 security, within direct range of HE.
The 4 "faster" results above are Oomi bulbs without any security.

Another finding I'd like to share is that I always find the same mains-powered, S2-enabled devices reported as "not responding" or, more rarely, as "failed".
What they have in common is that each of them is routed directly to HE.
Interestingly, 4 of them are Aeotec SmartSwitch 7s, and the only one I have not been able to catch with such issues is the one routing through another device.

Anything in common?

Exactly the same. What's interesting is that all of my switches are Inovelli. Some were successfully included with S2 (those are the ones having communication issues, and they are next to the hub) and others are without S2 (none for auth). I tried multiple times to exclude and re-include them without any luck. I will try to nuke the hub and start over; if that doesn't work, I'll just go back to my C5 hub, which worked flawlessly. Maybe @bravenel can point us in the right direction?

I've been chatting with Bobby from support (who has been awesome, btw) about this issue. They have me disabling non-essential application integrations, but believe that I'm overloading the z-wave radio at times, which is what is causing this. The z-wave logs that I have don't give me a true sense for just how chatty my 90 devices are, but anecdotally, failures have happened when my family has been away from the house, as well as at night when everyone is asleep -- so whatever load may be causing this doesn't appear to be driven by basic use.
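For anyone who wants a rough sense of how chatty their devices are, here is a minimal sketch that tallies messages per device and per hour. It assumes you have somehow exported log entries as `(timestamp, device_name)` pairs; that format and the function names are my own invention for illustration, not anything the hub produces directly.

```python
from collections import Counter
from datetime import datetime


def messages_per_device_per_hour(entries):
    """Count messages in each (device, hour) bucket.

    `entries` is an iterable of (datetime, device_name) pairs.
    This input format is an assumption, not a Hubitat export format.
    """
    counts = Counter()
    for timestamp, device in entries:
        # Truncate the timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(device, hour)] += 1
    return counts


def chattiest(entries, top=3):
    """Return the `top` devices by total message count."""
    totals = Counter(device for _, device in entries)
    return totals.most_common(top)
```

Looking at the hourly buckets for the hours when nobody is home or everyone is asleep would at least show whether some device is noisy regardless of household activity.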

If you guys are having issues at ~35 devices and are experiencing the same failure rate, I'm inclined to believe network size has less to do with it.

The S2 angle is very very interesting, but I'll be the first to admit I don't understand its underpinnings and how it would affect the radio like this. Thanks @Mr.Olsen for the reference to the Hub Watchdog-App; I'll see if there is anything interesting there for my hub.

Bobby mentioned that they appear to understand the issue enough to commit that this will be fixed in their next release. Unfortunately, I do not have a C5 to fall back on, and I DEFINITELY do not want to go back to Vera.

Have you seen this thread?

I hear you. I only had it for about 2 days a while back and then switched to Home Assistant.

As far as overloading/storming the Z-Wave network goes, I personally disable all of the power usage reports and things like that unless I'm actually using them for an automation. I also only have about 3 rules running at the moment, as I didn't want to invest my time until I figure out what the hell is going on with the Z-Wave network.

I hadn't seen that post. Thanks for the link, I think! It doesn't solve my problem, but I feel a little sick to my stomach after having read it. I don't have the equipment, energy or motivation to sniff the Z-Wave traffic from the devices that I've moved over, nor do I have direct evidence of this being caused by a Z-Wave storm, but the symptoms at least partially align. All of my current devices worked when paired with the Vera before I moved over, so the idea that this is a Hubitat/C7-specific bug appears to have some credibility. It makes the support line "you just have a mesh network problem" that much more difficult to swallow, given that the only real difference has been the hub itself (an oversimplification).

I'm very much looking forward to the new release that addresses this fragility. I hope it doesn't take long to get here, as the only way forward I can see with the current setup is to disable all non-essential automations and live in a dumb house that doesn't unexpectedly turn the lights off when you're in the shower. If I disable these things now, I can maybe make the case to enable them at a later date... but, as a lover of home automation, it feels really disappointing.

I've noticed in my C-7 logs that some devices definitely show repeating events plus extra traffic. I'm not sure why exactly, or whether the events are actually repeated or just logged that way. The SiLabs Z-Wave 700 Series still seems like a work in progress, so we are experiencing "growing pains" from HE and SiLabs.
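If you want to check for repeats more systematically than by eyeballing the log, something like the sketch below could flag events that duplicate an identical earlier event within a short window. The `(timestamp, device, name, value)` tuple format and the 2-second window are assumptions chosen for illustration. It also cannot tell a genuinely repeated radio message from one that was merely logged twice.

```python
from datetime import datetime, timedelta


def find_repeats(events, window_seconds=2.0):
    """Return events that repeat an identical event within the window.

    `events` is a list of (datetime, device, name, value) tuples sorted
    by time. Both the tuple layout and the default window are
    hypothetical, not taken from the hub's log format.
    """
    window = timedelta(seconds=window_seconds)
    last_seen = {}  # (device, name, value) -> time of most recent event
    repeats = []
    for timestamp, device, name, value in events:
        key = (device, name, value)
        previous = last_seen.get(key)
        if previous is not None and timestamp - previous <= window:
            repeats.append((timestamp, device, name, value))
        last_seen[key] = timestamp
    return repeats
```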

Fortunately both firmwares are updateable which means there will likely be a future resolution to most of these issues.

After walking into a dark kitchen and a cold living room this morning, I had a deep look at the device events. Here's what I found:


This is happening for each of my 3 mains-powered Aeotec Multisensor 6, included without any security.
Will revert to the standard drivers and see what happens.

I've had trouble with my USB-powered MS6s in general. They make terrible repeaters for some reason. I replaced most of them with Inovelli 4-in-1s. I only have 3 left, all on the fringe of my network (basement/garage/exterior back of house), and they seem to be okay so far. I have not migrated those to the C-7 yet.

Edit: I am using an older @csteele driver right now (v1.6.13) and have also upgraded the MS6 firmware to v1.13.

I reset my hub and all of the devices to factory defaults yesterday and re-included some of the hardwired devices (14 devices in total) last night. This morning I included a couple of battery-powered devices, and here is what my log looks like at the moment. I rebooted and ran the Z-Wave network repair, and nothing works.

Try shutting the hub down via the UI menus.
When the hub is shut down, remove the power for 1 minute and then power back up again.
This will reboot the z-wave stick and may cure your problem.

Mine are the backbone of my house, all running perfectly at 100k.

I'm using the built-in driver.

That's cool, I wish I could say the same... :neutral_face:

My MS6s originally surrounded my C-4 main hub, so everything was repeating through them. I was getting a ton of 9.6k connections to various devices. I tried the trick of pairing on battery and then switching over to USB power afterwards in order to stop the repeating, but sadly that did not work with the updated firmware. The MS6s on the edge of my network seem to be working great though, even the one I use for lux detection, which has been outdoors in a clear weatherproof box for a year or so.

Unfortunately, you're not really going to know if you are overloading the network unless you go down the sniffer route. I went down that route full tilt.

I've built/re-built my C7 network 8 times from the ground up. 120 devices. As a general rule, each time I got around 80-90 devices, the network would start having stability issues. Didn't seem to matter what the 80-90 devices were. I have very little automation that creates Z-Wave messages. Definitely not overloading the network with automation.

I've learned a bit along the way. In my last go-round, I laid down a bunch of repeaters as the first devices, and then very slowly added devices to the network over a week. With this approach, I was able to build the whole network out; however, it ended up losing stability about a day or two after completion.

What happens (in my network) is that a random device pops a routing error, usually in response to a hail. This in turn causes the hub and the device to try and establish a new route. Given the number of devices I have there are many routes to choose from, so things can bounce around a bit. This process generally hangs transmit on the hub for a short period of time--up to 120 seconds but usually less. This is bad, but in and of itself is not so much of a problem. It just means that further operations are delayed. Annoyance rather than End Of The World.

However, if this happens right before a supervised device sends a message that requires application level confirmation (garage door openers and thermostats in my network) it can grow into a serious issue. If these devices don't get serviced within a relatively short period of time, they fall into discovery mode. 50-100 messages in a brief moment. And if a second supervised device happens to need servicing while this is underway? It results in a swarm of discovery messages involving multiple devices. Bad news for the network. Usually the hub recovers after two minutes, but sometimes requires a reboot to right itself.
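To make the timing argument concrete, here is a toy model of it, not a simulation of the actual Z-Wave stack. It assumes we know the windows during which the hub cannot transmit and the times at which supervised devices send a message needing confirmation. The `deadline_s` value and all names are hypothetical; I don't know the real timeout the devices use.

```python
def devices_entering_discovery(hangs, supervised_msgs, deadline_s):
    """Return devices whose confirmation would arrive too late.

    hangs: list of (start_s, duration_s) windows in which the hub
        cannot transmit (e.g. while it renegotiates a route).
    supervised_msgs: list of (time_s, device) messages that need an
        application-level confirmation from the hub.
    deadline_s: assumed time a device waits before it gives up and
        falls into discovery mode (a made-up parameter).
    """
    affected = []
    for sent_at, device in supervised_msgs:
        for start, duration in hangs:
            end = start + duration
            if start <= sent_at < end:
                # The hub can only answer once the hang is over.
                wait = end - sent_at
                if wait > deadline_s:
                    affected.append(device)
                break
    return affected
```

With a single 120-second hang, any supervised message arriving early in that window misses its deadline. Two or more entries in the returned list correspond to the multi-device swarm case described above.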

For me, the solution has been to move a dozen of the further devices to a second hub. Even though these devices were within 3 hops of the main hub, and worked fine when there were only 50 devices, when the network was fully built out they seemed to initiate a lot of route changes that destabilized the network. Having moved those devices, I've been stable on both hubs for over a week. I hope it stays that way because my wife's tolerance dropped to absolute zero.

As I've said before, I don't think these issues can simply be laid at the feet of Hubitat. The issues appear to be down below, either in the SDK or the firmware of the 700 series chip. Either way, the issues are out of reach of the Hubitat folk. I wish it were different, but it is what it is. I'm sure it's even more frustrating to the Hubitat folk than it is to their users.

Overall, I think SiLabs still have some significant issues to work out with the SDK and the 700 series chips. I hope they do it soon.

Very good analysis. I can see that a lot of work went into discovering the issues.

I hope you shared that with Hubitat staff. I am not sure if they can do much about it (see below), but maybe that information will lead to a fix.

That is my understanding of the situation. Hubitat's hands are basically tied on some of this. They are relying on the underlying SiLabs firmware to work correctly, which it mostly does. But there are apparently some half-baked and flawed parts of the firmware that have not been fixed.

:+1:

I don't have a 700 series hub (yet), but thanks for taking the time to diagnose this. I hope this hard work benefits others.

@dennypage Very detailed diagnostic data you have there. It may make it possible to create a work-around. Thanks for your post.

There are actually quite a few work-arounds in the code for issues we discovered in the SDK.
