[2.4.2.164 C8P] Command Retry Question

While the "Command Retry" feature is an amazing and awesome feature, I'm seeing a weird side effect and had a question.

Imagine (for simplicity) having a closet light that's turned on (dim to 100 over 2 seconds) by the door opening and off (dim to 0 over 2 seconds) by the door closing.

If the door is opened then closed 4-5 seconds later, you'd expect the light to turn on then turn off--and stay off.

But, what if the hub never got the "dim to 100" response back--even though the light actually did that? The first "dim to 100" command would be repeated up to 5 times--which means the light might successfully turn off, only to pop back on again afterwards.

I think I'm seeing such a situation, where the second command succeeds while the first command keeps retrying, which produces unexpected results.
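To make the suspected race concrete, here is a minimal Python sketch. It is only a toy model, not the hub's actual logic: the `Command` class, the `final_level` function, the 10-second retry interval, and the reading of "up to 5 times" as 5 total sends are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Command:
    sent_at: int    # seconds at which the hub first sends the command
    level: int      # target dim level (0-100)
    ack_lost: bool  # True if the hub never hears the device's confirmation


def final_level(commands, retry_interval=10, max_tries=5):
    """Replay every send and retry in time order; return the light's last level.

    Assumption: a command whose confirmation is lost is resent every
    `retry_interval` seconds until `max_tries` sends have been made.
    """
    sends = []
    for cmd in commands:
        tries = max_tries if cmd.ack_lost else 1
        for attempt in range(tries):
            sends.append((cmd.sent_at + attempt * retry_interval, cmd.level))

    level = 0
    for _, target in sorted(sends):
        level = target  # the device obeys every command it receives
    return level


door_open = Command(sent_at=0, level=100, ack_lost=True)
door_close = Command(sent_at=5, level=0, ack_lost=False)
print(final_level([door_open, door_close]))  # prints 100
```

In this model the light ends up on even though the "off" command was the last one the user issued, because retries of the older command are still sent after it.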

My real situation involves going to sleep and turning all the lights in the house off with a Room Lighting Scene. That puts a fair bit of traffic on the Z-Wave and Zigbee networks. I have fewer Zigbee devices and they have never failed to work as expected, but the Z-Wave network regularly loses commands, which is why the retry feature has been such a huge improvement.

First, I dim the bedroom lights slowly to 40% over 60 seconds. Then, I turn them off after that (using a room lighting scene for the final "off").

I'm seeing the light dim properly, then go off properly.

Then, it pops back on to 40%.

The devices are all actually being properly set now (thanks to the command retry logic), but it appears the hub is not receiving the "I've done it" message back for some devices, so it keeps retrying. In this case, the older command is seemingly being retried after the later command succeeds.

Note that I see some "failed after 5 tries" messages in the logs, even though every device HAS successfully been set (the retry feature has really helped with this because, before, that was NOT the case).

It would be great if the "command retry" feature would cancel pending retries that conflict with newer ones, to prevent this sort of thing.
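A rough Python sketch of what such a "newest command wins" policy could look like is below. This is a hypothetical design, not how the hub works: the class `SupersedingRetryQueue` and its methods are invented for illustration, and the limit of 5 sends is taken from the "failed after 5 tries" log message.

```python
class SupersedingRetryQueue:
    """Hypothetical retry queue where a newer command replaces an older one."""

    def __init__(self, max_tries=5):
        self.max_tries = max_tries
        self.pending = {}  # device id -> [command, number of sends so far]

    def submit(self, device, command):
        """Track a new command, dropping any older pending one for the device."""
        self.pending[device] = [command, 1]

    def confirm(self, device, command):
        """The device reported the requested state; stop retrying if it matches."""
        entry = self.pending.get(device)
        if entry and entry[0] == command:
            del self.pending[device]

    def next_retry(self, device):
        """Return the command to resend, or None if nothing is pending."""
        entry = self.pending.get(device)
        if entry is None:
            return None
        if entry[1] >= self.max_tries:
            del self.pending[device]  # give up after max_tries sends
            return None
        entry[1] += 1
        return entry[0]


queue = SupersedingRetryQueue()
queue.submit("bedroom", "dim to 40%")
queue.submit("bedroom", "off")      # supersedes the pending dim
print(queue.next_retry("bedroom"))  # prints off
```

With this policy the stale "dim to 40%" can never be resent after "off" has been issued, at the cost of giving up on the older command even if it was never confirmed.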

Thoughts?

Thanks.

Logs:

A) An initial "turn on" as I walked into the basement (worked fine)
B) A second (faster) "turn on", due to walking past the motion sensor (worked fine)
C) All worked fine.
D) When I shut the room door, it dimmed the lights to 40% slowly
E) 44 seconds later, I triggered the "asleep-turn MOST things off" RL App
F) The light turned off, as expected
G) These are mostly unrelated commands to control the LED Strip on the Inovelli dimmer
H) These are retries happening over a minute after the initial command--and 23 seconds after the "off" command.
Between H-I) This is the final "asleep-turn EVERYTHING off" command
I) Another, final retry of the Dim to 40% command about 2.5 minutes after the initial command
J) More unrelated Inovelli LED Strip commands
K) The light came on and, almost 4 minutes later, reported it was at 40%

You'll need to look at the "Events" tab on the device detail page for this device, which is the only way to see for sure what is going on. If you keep descriptionText logging enabled, as it is by default, you should also see this in Logs (along with some command retry information and possibly what commands are actually sent to the device if you have debug logging enabled, though the latter depends on the driver you're using) -- but "Events" is the authoritative source.

Without that, one can only guess at what's going on in your particular case, so this is something you can do to see for sure.

If you do have devices that don't accurately report back on their own, command retry will be a poor fit for them, but there may be other things you can do that would work around specific problems once you know what they are.

I was adding logging that shows things as you replied.

This seems less like one device not reporting and more like "device reports" being regularly lost when the network is busy.

In this case, it's one specific rule dimming the lights to 40%. Then, a bit later, another rule shutting all the lights in the house off.

As you can see in the logs, the "Off" command succeeded and the device response got back in 3 seconds--but, it appears the switch didn't get the ACK of that, so it tried again with another "I'm off" 3 seconds later. So, the device IS responding.

And, after the final set-40%, you can see that the "I'm at 40%" did get sent back.

So, it seems that it's just some network congestion (something that's apparently an annoying Z-Wave bug that isn't fixed even in the latest ZWJS). In this case, I'm seeing this one with the legacy Z-Wave stack, as I reverted due to some other issues that are still pending (can't include a range extender & when I tried again with the legacy ZW, it tried to overwrite an existing device!).

My initial workaround is that I'm going to do that "dim to 40%" over 5 seconds, not 60. That should let it get done in a more timely manner. :slight_smile:

But, the main question is: do "retries" from earlier, conflicting commands get cancelled or not? It seems not. It might be helpful to kill retries for earlier dims/on/off when a later dim/on/off happens???

Here's the event log (note that I never actually performed any "physical" actions with that switch--it appears those are related to the "device status update" messages).

They aren't.
However, only one command for a device can be active at a time.
If a device is in the middle of an unresolved retry, further commands sent to the device will be ignored by command retry.
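A minimal Python model of that rule, as I read it, is sketched here. The class `SingleSlotRetryTracker` and its method names are hypothetical and are not taken from the hub's code. The sketch also assumes that an "ignored" command is still sent to the device once and is simply not tracked for retry; the reply above does not say this explicitly.

```python
class SingleSlotRetryTracker:
    """Toy model: at most one command per device is tracked for retry."""

    def __init__(self):
        self.active = {}  # device id -> command currently being retried

    def track(self, device, command):
        """Return True if the command is tracked for retry, False if ignored."""
        if device in self.active:
            return False  # an unresolved retry already occupies the slot
        self.active[device] = command
        return True

    def resolve(self, device):
        """Free the slot once the retry succeeds or gives up."""
        self.active.pop(device, None)


tracker = SingleSlotRetryTracker()
print(tracker.track("bedroom", "dim to 40%"))  # prints True
print(tracker.track("bedroom", "off"))         # prints False
print(tracker.active["bedroom"])               # prints dim to 40%
```

Under this model the older "dim to 40%" keeps its retry slot while the later "off" is never retried, which would match the light turning off and then coming back on at 40%.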

If one were to use Command Retry to the max, as in all devices that support it, could the hub get overloaded?

What are the negatives?

In testing, I ran it against 50-some-odd Zigbee devices and didn't have anything blow up...
The overhead to add and remove commands from the retry queue is extremely small.

How about Z-Wave?

Command retry doesn't care about the transport protocol.
My point was that if the initial command succeeds for any given device, there isn't any appreciable overhead on the hub.
Now, if all your devices are offline and some rule tries to change their state, sure, it's going to add quite a bit of traffic to any given mesh, but I might suggest that in this case you've got other, much bigger issues to deal with...

As an FYI, I have it enabled on all 5 of my production hubs for all devices, 78 devices in total

Thanks for the information! The retry feature is the most positive, impactful feature yet.

It's helpful to know a bit about how it works so I understand what's happening.

Thanks so much!! :slight_smile:
