Repair

I am seeing some node repairs fail in an unusual way.

What I normally see in a successful repair is this:
> Find Nodes In Range
< ack
< Command Complete
> ack
> Get Nodes In Range
< ack
< Node Range Info
> ack
> Assign SUC Return Route
< ack
> Assign SUC Return Route
< ack
> Assign SUC Return Route
< ack
> Assign SUC Return Route
< ack

Sometimes the hub does not send the last two Assign SUC Return Route messages. When this happens, the log reports "Repair failed node neighbor discovery (timeout)".
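For anyone who wants to check a capture automatically, here is a minimal sketch of the comparison described above. It assumes the trace has been written out as plain text lines in the same "> message" / "< message" form used in this post; that format, and the function name `classify_repair`, are illustrative assumptions and not the sniffer's actual export format or API.

```python
# Number of Assign SUC Return Route messages seen in a successful repair,
# taken from the trace listed in this post.
EXPECTED_ASSIGN_COUNT = 4


def classify_repair(trace):
    """Classify a repair trace given as lines like '> Find Nodes In Range'.

    Lines starting with '>' are treated as sent by the hub and lines
    starting with '<' as received (hypothetical text format).
    """
    # Keep only hub-sent lines and drop the direction marker.
    sent = [line[1:].strip() for line in trace if line.startswith(">")]
    # Ignore the hub's own acks; only commands matter here.
    commands = [msg for msg in sent if msg.lower() != "ack"]

    if "Find Nodes In Range" not in commands or "Get Nodes In Range" not in commands:
        return "incomplete: neighbor discovery steps missing"

    assigns = commands.count("Assign SUC Return Route")
    if assigns < EXPECTED_ASSIGN_COUNT:
        return (
            f"incomplete: only {assigns} of {EXPECTED_ASSIGN_COUNT} "
            "Assign SUC Return Route sent"
        )
    return "complete"
```

Feeding it the successful trace above should give "complete", while a trace that stops after the second Assign SUC Return Route should be flagged as incomplete.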

Most of the time, I can re-run repair on the same node, all four Assign SUC Return Route messages are sent, and the repair succeeds. However, I also have a couple of nodes for which the repair persistently fails in this manner.

I've included a couple of sniffer examples below. Any thoughts on this?

Thanks


I'm seeing the same thing in my logs (for a couple of nodes). Also, I'm having some type of Z-Wave storm and can't figure out what's causing it. Everything works perfectly for several hours or days and then "bam", chaos, at which point I can only reboot and repair.

I don't have the sniffer you are using. Can I ask what you are using (i.e., your setup)? I'm very interested in having this level of detail. As a network engineer, I feel like I'm "flying blind" in the Z-Wave world. Thanks in advance.

There is a beta running for a change that could be the cause of your storms; however, I think there is another issue that needs to be worked out before release.

On the sniffing front, I totally understand you. Without tcpdump and the like, the world is a complete mystery. There is a really good sniffing guide available here. You will need to buy a USB stick and sign up with Silicon Labs to get the software.

I'm seeing this more often. Nodes that repaired successfully yesterday exhibit this behavior today. Sometimes repeated repair attempts eventually succeed, sometimes they don't. @bcopeland, any insight on this you can provide? Thanks.

FWIW, additional testing has shown that when this occurs, the radio hangs (will not transmit) following the last Assign SUC Return Route message. It recovers after ~2 minutes, which coincides with the repair failed message in the log.
