A few devices work so poorly now

I had five or six ghost devices. I was able to remove a couple using the HE remove tools, but no matter how many times I tried, I could not remove the others. Eventually I fired up Windows in Parallels on my Mac, dug out an old Aeotec Gen 5 stick I had, followed the PDF instructions, and got the ghosts removed. It took several tries to remove one of them (SPECIFIC_TYPE_SOUND_SWITCH; I don't even know what that belonged to). It was really hard to figure out which powered device the ghosts belonged to.

I gotta say, this is terrible. SiLabs really needs to improve this aspect of their Z-wave stack.

Now, I'm an expert macOS developer, and have done a fair bit of embedded work. I even have a SiLabs 8 series dev kit here because I want to make a special LED driver. I barely know the Z-wave SDK and stack, but if someone else who knows a lot more would like to work with me, I’d be willing to work on a macOS tool to help with things like this ghost removal. I don’t want to try to re-create the whole PC Controller app in Simplicity Studio, but maybe we can cobble together something that’s good for removing ghosts.

Oh I got ahead of myself. I was waiting for my hub to reboot after putting it back where it belongs.

I still can't reliably use many of my devices. I have no more ghost nodes, but nothing has improved.

I updated to the latest HE software today, and for the first time ever I got an app CPU load alert. I checked it out and saw that my Node-RED integration was using 100%. I thought such heavy CPU load might be preventing my rules from working properly, so I disabled that integration, and the load came down.

But it made no difference.

Watching the logs, I can see that a button remote press gets recorded most of the time, but not always. However, when it doesn't work, the corresponding rule and lights don't get log entries; on the rare occasion that it does work, there are log entries for both the rule and the light.

Siri commands are almost always reliable. I can ask Siri to turn lights on or off, and within a couple of seconds, those lights respond. This leads me to believe there’s something wrong with my rule processing.

There is no need to power off any other devices; it does not do anything useful to help with removal. It is a long-standing myth.

Go to Logs > App Stats / Device Stats
Turn on all the columns in display info.
With the default sorting, get a screenshot of the full page, including the stats at the top. Just the top section that fits on one screen is fine; I just want to see the top 10-15 or so. Do that for both App and Device stats.

Also might be good to see a new screenshot of the zwave details after the hub has been running for at least 12 hours since last reboot (please specify uptime when you post).

Someone ought to update that otherwise lovely ghost-removal PDF, and maybe some of the related posts :slight_smile:

Here are the requested screenshots. (It would be nice if there were a way to get a reasonably formatted text dump of all this information, just for the purpose of sharing it.)



Looks like your hub was up for 13 hours but hardly did anything in that time, does that sound right? Seems like a very idle hub. Does not make sense that rules would not process. I have seen rules failing on hubs with poorly coded drivers and apps hammering away at disconnected IP connections, but otherwise the rules seem very stable.

Are you sure that some of the times the rule failed you actually got the button event on the hub? Or is it possible every time it fails it is because the event never makes it to the hub?

Some of your devices with no routes showing look like seasonal devices? Warning: some people here may freak out and say you must exclude those. I am not one of those; I often have 2-3 unplugged devices as well. I think that, especially when the neighbor count is 1 (hub only), no other devices should ever try to route through it, so there is no real harm.

Yeah, it doesn't do much if I don't press buttons.

In the logs I can see, for example, the button press event show up, but nothing else gets written, and the light doesn't turn on (or off).

When it does work, I see the button press event, and the light device gets an entry, and so does the rule (at least, the ones for which logging is enabled).
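One way to make this failure pattern easier to spot is to check, for each button event, whether the rule or the light logged anything shortly afterward. The sketch below assumes the log lines have already been parsed into simple `(time, source, message)` records; it does not read Hubitat's actual log format, and the `LogEntry` type, the `unanswered_presses` function, the device names, and the 5-second window are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    time: float   # seconds since some reference point (hypothetical)
    source: str   # device or app name as shown in the log (hypothetical)
    message: str


def unanswered_presses(entries, button, followers, window=5.0):
    """Return events from `button` that no entry from `followers`
    (e.g. the rule and the light) follows within `window` seconds."""
    missed = []
    for e in entries:
        if e.source != button:
            continue
        # A press counts as "answered" if any follower logged something
        # at or after the press, within the window.
        answered = any(
            f.source in followers and 0 <= f.time - e.time <= window
            for f in entries
        )
        if not answered:
            missed.append(e)
    return missed


# Hypothetical sample: the first press is followed by rule and light
# entries, the second press is logged but nothing follows it.
entries = [
    LogEntry(0.0, "Remote", "button 1 pushed"),
    LogEntry(0.4, "Lamp Rule", "triggered"),
    LogEntry(0.6, "Lamp", "switch on"),
    LogEntry(30.0, "Remote", "button 1 pushed"),
]
missed = unanswered_presses(entries, "Remote", {"Lamp Rule", "Lamp"})
for e in missed:
    print(f"{e.time:.1f}s: {e.message} (no rule/light activity)")
```

If presses show up in the log but are consistently unanswered, that points at rule processing rather than radio reception; if the presses themselves are missing, the event never reached the hub.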

To be clear: I do also often have issues where my button devices (both battery and house-powered) fail to show up in the logs. Sometimes they'll work for a while, other times they won't, even when I'm within a few feet of the hub.

Generally, Siri commands always work, which means the hub has no problem communicating with the various switches and dimmers all around the house.

What are the walls in the house made out of, and what are your electrical boxes made out of? Just looking at the z-wave details, it seems like a lot of devices may be struggling with the connection, but they really should not be unless your house has a massive footprint.

Take a look at this post; it has some rules of thumb.

The older part of the house is lath and plaster. The addition is wood and drywall.

Keep in mind, this used to work quite well. It's only in the past month or two that it has started giving me so many problems. The only change (prior to the recent ghost cleanup) was that I shut down some of my front-yard Halloween devices. These are in the front yard, whereas the hub is near the back of the house, where the addition is.

The remotes that fail to work are line-of-sight, 1 to 2 meters away from the hub. The wall switch is also less than 2 meters away, just inside the door jamb from the hub. Its box is plastic.

If the shut down devices are wall plugs, plug them in somewhere, scattered around if possible. You could even leave them where they were but I am thinking maybe put them in good places to be a potential repeater for other devices.

Then run a repair on them.
Once that passes you could also try a full mesh repair if you would like (no more than once).

It is possible with a weaker mesh, maybe devices keep trying to route through those dead nodes. I have a lot of nodes and everything has a lot of neighbors so that is possibly why unplugging a device does not have much impact for me.

The button device that fails was directly connected to the hub, and would never have connected to those removed devices.

I did disable them in the devices list, does that not end up rerouting around them? What does it mean to disable a device?

Disabling it just makes it so the driver won't process any incoming messages.
The only way to remove it from the z-wave mesh is to exclude it, but that also takes out the device in the HE devices list, breaking automations. That is why I avoid excluding seasonal devices and suggested to just relocate them for now.

I can try to do that, but are they really the reason a directly-attached device isn't getting heard?

No idea, process of elimination.
I am going off what you said that around when you unplugged those devices is when the problems started. If plugging them back in does not help within 24-48 hours then we move on to something else.

Also, have you tried brand new batteries in the remotes to be sure? I think one of your problem devices is hard wired ZEN32 though, so obviously not a power issue for that one.

I’m sorry if this was mentioned, but have you tried doing a reboot with the rebuild database option checked? Just in case it is a software corruption causing these issues.

I haven't tried that, no. I'll keep it in mind.

Yesterday I decided to change the setup of one lamp in my living room, which entailed putting it on a different Z-wave outlet. I went to rename the old and new devices in HE, and fix up my rules, when I noticed something. That device was connected to the hub via like 5 other devices. The button devices I was having the most trouble with were connected in turn to that.

I tried rebuilding it, hoping it would end up with a more direct connection, but had no luck. So I tried excluding it and deleting it. That created a ghost, which I needed to break out the PC stick to fix. But after that, everything seems to be working much better. I'm cautiously optimistic that it will hold.

I still have one device with far too many hops, but none of the others depends on it.

Routes

I think I've asked this question before, but I don't really understand the route descriptions in the z-wave details. I'll have something like:

Device Route
0x0A (outlet) 01 <- 02 <- 03 <- 04 <- 0A
0x11 (button) 01 <- 0A <- 11

I would expect 0x11 to look like 01 <- 02 <- 03 <- 04 <- 0A <- 11 instead.

The mesh view shows practically everything connected to everything, so that's extra confusing.

That means you did not actually exclude it. When exclusion works the node is removed. Do not force remove / factory reset a device until you confirm exclusion worked and node is removed.

The route shown is just the last route the device reported back to the hub. It may not be current; routes can change quickly if needed. Routes are independent of one another, so in your example 0A was hopping through 3 other nodes to get to the hub when sending messages for itself. BUT when 11 sends a message to 0A, it tells 0A the next hop that it wants, so 0A would just fire it back out, sending directly to 01.

The mesh view shows all possible neighbors, not routes.
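To make the "routes are independent" point concrete, here is a minimal sketch that reads route strings in the `01 <- 02 <- 03 <- 04 <- 0A` form shown above (hub first, hex node IDs) and reports hop counts and repeaters per node. The function names are hypothetical, and the parser only assumes the text format from the example, not any Hubitat API.

```python
def parse_route(route: str) -> list[int]:
    """Parse '01 <- 02 <- 0A' into node IDs, hub first, device last."""
    return [int(part.strip(), 16) for part in route.split("<-")]


def hop_count(route: str) -> int:
    # Number of radio links between the device and the hub.
    return len(parse_route(route)) - 1


def repeaters(route: str) -> list[int]:
    # Nodes between the hub and the device (excludes both ends).
    return parse_route(route)[1:-1]


# Each node reports its own last route; 0x11's route does not
# inherit 0x0A's three-repeater path to the hub.
routes = {
    0x0A: "01 <- 02 <- 03 <- 04 <- 0A",
    0x11: "01 <- 0A <- 11",
}
for node, route in routes.items():
    via = [f"0x{r:02X}" for r in repeaters(route)]
    print(f"0x{node:02X}: {hop_count(route)} hops via {via}")
```

Read this way, 0x0A reports 4 hops through 0x02, 0x03 and 0x04, while 0x11 reports only 2 hops through 0x0A. That is consistent with the explanation that 0x0A forwards 0x11's traffic straight to 01 instead of reusing its own longer path.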

I've been fooled before, but this time I think everything is finally fixed. After many attempts at removing unused devices and removing ghosts using a PC and USB stick, what finally seems to have fixed everything is a brief power outage.

If that is the case, be on the lookout for a z-wave device going bad. I have had a couple go bad and spam the mesh to where nothing else would be able to communicate. Power cycling the problem device would “fix” it for a short time.

What's weird is that most devices work fine, and it's always the same set of devices that fail. In any case, if it happens again, I'll try some form of process of elimination to see who's the culprit.
