Z-Wave event processed by wrong device

Rarely (once or twice a day), I notice a device handler processing an event that was intended for another device. For example, a GE light dimmer tried to process a SensorMultilevelReport event (lux measurement) that obviously came from an Aeotec Multi-Sensor.

I don't think the hub is at fault. Rather, I think something went wrong in transit between the multi-sensor and the hub, and the hub ended up with an event carrying the wrong Node ID.

Any idea on how to troubleshoot such issues?

Thanks,

This would be the first time I've even heard of such an error. The Z-protocols have checksums to catch simple transmission errors, which is why we see LOTS of people saying their mesh is slow: that's what CRC errors look like (frames that fail the check are dropped and have to be resent).

A checksum is not going to catch every possible scrambling of bits; it's really good at single-bit errors, for example. So if you suspect so many bit errors that a packet actually turns into another, apparently valid packet, I'd suggest doing a deep dive into your mesh.

Equally hard to imagine is a DB corruption that would cross-wire one type of report to a different device, especially if it's intermittent. However, that would be a pretty easy thing to rule out: do a Soft Reset by following these instructions:

https://docs.hubitat.com/index.php?title=Soft_Reset

I see this too. I just ignore it, since it doesn't bug me enough and it's not breaking anything, but it has been happening for a long time (more than a year). I only notice it because my driver logs an error with details whenever it receives a Z-Wave event it doesn't know, and that happens a couple of times a day.

I also have no idea how I would troubleshoot this.
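The driver behavior described above (handle the commands you expect, log everything else with details) is what makes these misrouted events visible at all. Below is a minimal Python sketch of that catch-all pattern. It is only an illustration: real Hubitat drivers are Groovy, and `ZWaveCommand`, `DimmerDriver`, and the handler names are hypothetical, not Hubitat's API.

```python
import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("driver")


@dataclass
class ZWaveCommand:
    # Hypothetical stand-in for a parsed Z-Wave command (not a Hubitat class).
    name: str
    args: dict = field(default_factory=dict)


class DimmerDriver:
    """Sketch of a driver that handles a few commands and flags everything else."""

    def __init__(self, device_name):
        self.device_name = device_name
        self.level = None
        self.unhandled = []  # every command this driver did not expect
        # Only the command types a dimmer should ever receive.
        self.handlers = {
            "SwitchMultilevelReport": self.on_level,
            "BasicReport": self.on_level,
        }

    def on_level(self, cmd):
        self.level = cmd.args.get("value")

    def zwave_event(self, cmd):
        handler = self.handlers.get(cmd.name)
        if handler is None:
            # Unknown command: keep it and log the details so a misrouted
            # event (e.g. a lux report arriving at a dimmer) stands out.
            self.unhandled.append(cmd)
            log.warning("[%s] Unhandled cmd: %s %s",
                        self.device_name, cmd.name, cmd.args)
            return
        handler(cmd)
```

Feeding it a `SensorMultilevelReport` logs a warning and leaves the dimmer's level untouched; counting `unhandled` per device over a few days gives a rough rate of how often each device receives something it shouldn't.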

Like @csteele, I have never seen this issue before, so I am very interested in digging further into it. When it happens again, would you mind sending me a PM with a screenshot of the logs showing the error, along with your hub ID, so we can look into it? Theoretically speaking, this should not be able to happen with Z-Wave devices.

Are those devices running at 9.6 or 40kbps? Those data rates only use an 8-bit frame check sequence and have much poorer error detection than 100kbps links.

100kbps Z-Wave links use 16-bit CRCs (polynomial) for the frame check sequence; 9.6 and 40kbps links that aren't using S0/S2 security only use an 8-bit XOR checksum (unless the driver is using CRC-16 encapsulation commands).

A simple checksum will detect 100% of single (or odd-number) bit errors, but studies show that with random data, undetected error rates could exceed 12% (luckily, Z-Wave frames contain a lot of non-random data fields). 100kbps links with 16-bit CRCs provide orders of magnitude better error detection.
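A minimal Python sketch of the difference between the two checks. It assumes the 8-bit checksum is 0xFF XORed with every byte of the frame and that the 100kbps CRC uses polynomial 0x1021 with an initial value of 0x1D0F; treat both as assumptions and check the Z-Wave PHY/MAC spec (ITU-T G.9959) for exact framing. The frame bytes are arbitrary, not a real capture.

```python
"""Compare an 8-bit XOR checksum with a 16-bit CRC on simulated bit errors."""


def xor_checksum(frame: bytes) -> int:
    # Assumed 8-bit checksum: start at 0xFF and XOR in every byte.
    cs = 0xFF
    for b in frame:
        cs ^= b
    return cs


def crc16_ccitt(frame: bytes, init: int = 0x1D0F) -> int:
    # Bitwise CRC-16, polynomial 0x1021 (x^16 + x^12 + x^5 + 1), MSB first.
    # The 0x1D0F default initial value is an assumption for 100kbps links.
    crc = init
    for b in frame:
        crc ^= b << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def flip(frame: bytes, *bits: int) -> bytes:
    # Copy of frame with the given bit positions inverted (bit 0 = MSB of byte 0).
    out = bytearray(frame)
    for bit in bits:
        out[bit // 8] ^= 0x80 >> (bit % 8)
    return bytes(out)


def detected(check, good: bytes, bad: bytes) -> bool:
    # A corrupted frame is caught only if its check value changes.
    return check(good) != check(bad)


if __name__ == "__main__":
    frame = bytes.fromhex("0102030405060708090a0b0c")  # arbitrary bytes
    single = flip(frame, 3)       # one bit flipped
    double = flip(frame, 3, 11)   # same bit position flipped in bytes 0 and 1
    for name, bad in (("single-bit", single), ("two-bit", double)):
        print(name,
              "XOR:", detected(xor_checksum, frame, bad),
              "CRC-16:", detected(crc16_ccitt, frame, bad))
```

Both checks catch the single-bit error, but flipping the same bit position in two different bytes cancels out in the XOR, so that corrupted frame passes the 8-bit checksum while the CRC-16 value still changes.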

Every few months, my Aeotec Gen 5 Doorbells (one on HE, one still on ST) will play an incorrect mp3 track ("Kitchen Leak Detected" instead of "Mail has been delivered" gets my attention right away). My C-3 doesn't report the Z-Wave link speed; I've been meaning to put it on the C-7 and see what speed it connects at.

I don't see any pattern in terms of link speed. The last 4 occurrences involved devices at 9.6, 40, and 100kbps. Like @gavincampbell, I've had this happening for at least a year. But I don't see any adverse effects on my mesh; I have 64 Z-Wave devices and everything runs smoothly.

What's new here is that I could tell the data was 100% legit: the value the dimmer received (4813 lux) was in line with the multi-sensor's current reading. A real value, processed by the wrong device.

Over the past 3 days, I only got 4 occurrences:

  • SensorMultilevelReport (lux) processed by a dimmer
  • SwitchMultilevelReport (dimmer level) processed by a thermostat (2x)
  • SwitchMultilevelReport (dimmer level) processed by a light switch

It's random and, out of thousands of events, definitely rare.

@bobbyD I'll try to get better logs for you, this weekend.
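The cross-check used above (matching the 4813 lux value the dimmer received against the multi-sensor's current reading) can be automated when collecting logs. A minimal Python sketch, assuming you can export each device's most recent reading per report type; `Reading`, `likely_sources`, and the 5% tolerance are hypothetical choices, not part of any Hubitat tool.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    # Hypothetical record of one device's latest value for one report type.
    device: str
    report: str   # e.g. "SensorMultilevelReport"
    value: float


def likely_sources(misrouted, recent, rel_tol=0.05):
    """Devices whose latest reading of the same report type is close to the misrouted value."""
    matches = []
    for r in recent:
        # Skip the device that wrongly received the report, and other report types.
        if r.device == misrouted.device or r.report != misrouted.report:
            continue
        # Relative tolerance; max(..., 1.0) keeps tiny values from giving a zero window.
        if abs(r.value - misrouted.value) <= rel_tol * max(abs(r.value), 1.0):
            matches.append(r.device)
    return matches
```

For example, with made-up readings of 4790 lux from the multi-sensor and 120 lux from a second sensor, a `SensorMultilevelReport` of 4813 logged against the dimmer points back to the multi-sensor only. A match narrows down the real sender; it doesn't prove how the report got misrouted.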

@csteele Thanks, I didn't know about the Soft Reset. Maybe that's the actual fix; I'll try it as a last resort.

A Soft Reset is a handy tool. It is as harmless as such things can be, and it's a skill you want to have on the morning you really need it. :slight_smile: Conceptually, it's ultra simple. You export a backup, which in the process cleans out any junk in the DB or associated files. Then you blow away the existing DB and associated files, making the hub look a lot like the hub you had an hour after opening the box for the first time. Final step: you restore the backup you just made and voilà, your hub is just like it was 10 mins ago... but with a clean DB.

I use this a LOT. As a developer, I often want to return to a specific state, i.e. the hub as it was before I installed a package, so that I can test the package. So I keep a "library" of pre-configurations: I soft reset and restore something from the library. 10 mins later, that test is done, and I soft reset to yet another backup from the library :smiley:

The point I'm trying to make is that it's a very reliable tool, but the first time through there's a lot of reading to see what comes next; in other words, getting used to the screens. You don't want to see them for the first time when you have a corrupt DB warning.

Hey Bobby. Just curious if anything was discovered. I see this multiple times a week: my devices receive the wrong event.

For example, going through my logs (these are my own drivers), I just saw the following, which was sent to a switch:

[zwaveEvent] Unhandled cmd: SecurityPanelModeGet()