I'm back. Unfortunately, my effort to get a firmware update from Ledvance for my problematic LED strips was unsuccessful, and they continue to randomly go unresponsive. I've developed a pretty reliable resolution path, which is to spam the device with about 50 on/off commands in rapid succession, and then reboot the HE. Usually that gets them back. Today I wasn't so lucky. I used the opportunity to collect some more data and have some findings to share.
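For anyone who wants to try the same command spam without tapping the button 50 times, here is a rough sketch of how it could be scripted. It assumes the hub's Maker API app is installed and the strip is exposed through it. The hub IP, app ID, device ID, and access token are placeholders, and the function names are made up for illustration. The 50 commands match what I described above, while the 0.1 second delay is a guess, not a tested value.

```python
import time
import urllib.request


def build_command_urls(hub_ip, app_id, device_id, token, count=50):
    """Build alternating on/off Maker API URLs (all arguments are placeholders)."""
    base = f"http://{hub_ip}/apps/api/{app_id}/devices/{device_id}"
    return [
        f"{base}/{'on' if i % 2 == 0 else 'off'}?access_token={token}"
        for i in range(count)
    ]


def spam_commands(urls, send=None, delay=0.1):
    """Send each URL in order and return how many requests failed.

    `send` is injectable so the loop can be dry-run without a hub.
    """
    if send is None:
        # Default sender: a plain HTTP GET against the hub.
        def send(url):
            urllib.request.urlopen(url, timeout=5).close()

    failures = 0
    for url in urls:
        try:
            send(url)
        except OSError:
            # urllib's URLError and socket timeouts are OSError subclasses.
            failures += 1
        time.sleep(delay)
    return failures
```

Calling `spam_commands(build_command_urls(...))` with real values would fire the burst. The hub reboot afterwards is still a manual step in this sketch.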
The last time I had an issue with a device going poof, I discovered null entries in the Zigbee neighbor table:
That screenshot's from about a month ago. Tonight, more null entries. The name of the missing device tonight is "Backsplash". It doesn't appear in the neighbor table, but this null entry does:
After doing my command spam, Backsplash did make an appearance in the neighbor table, but in a state I'd never seen before: "In Discovery". (I'd never refreshed the table immediately after issuing the rapid-fire commands before.)
It remained there for only a brief moment before disappearing out of the neighbor table, and unfortunately, it still didn't respond to commands. I cycled through this a couple times with no success.
Ultimately, I had to do the very annoying factory reset procedure for the LED strip and put it in pairing mode, then re-run discovery. Once I did that, HE picked it right back up and I had it responsive again. It appears in the neighbor table routed...through itself, apparently, but it is working:
Whenever I have these issues, I've always got debug logging turned on for the problem device. Usually, when the device comes back alive after an HE reboot, the log records an address change for the device once I send it a successful command. It did so again tonight:
Something caught my eye there tonight. The "old" address was 8D08, just like it was a month ago (for a different unresponsive device, not this same one). Yet, when the device was non-responsive and "in discovery," it showed a different address in the route table.
In fact, I can recall at least one other instance of this happening where I distinctly remember the old, unresponsive address being 8D08.
From this, I can draw a couple conclusions, though they don't yet point to a root cause:
1. Whenever I have "null" in the route table, something either is or is about to be wrong.
2. These particular LED strips don't agree with address 8D08. I can't say for certain that having that address always makes them malfunction, but every time they malfunction, I've seen that address involved.
From there, I speculate that the problem device tried to move on the mesh, and its own understanding of which address it ended up with disagreed with HE's understanding. HE was trying to send commands out to Backsplash at BF4B, but there was no Backsplash at BF4B, because Backsplash was really at 8D08.
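Both warning signs, a null entry and the address 8D08, are things that could be checked for automatically. Below is a minimal sketch that scans table text pasted from the hub's Zigbee table page. It assumes entries are printed as bracketed "[name, address]" pairs with a four-digit hex address. That layout is my assumption about the output, so the pattern may need adjusting, and the function name is made up for illustration.

```python
import re

# Assumed entry layout: "[Device Name, ABCD]" (name, then 4 hex digits).
ENTRY = re.compile(r"\[\s*([^,\]]*?)\s*,\s*([0-9A-Fa-f]{4})\s*\]")


def scan_table(text, watch_addresses=("8D08",)):
    """Return (line_number, reason, address) for each suspicious entry.

    Flags entries whose name is "null" and entries whose address is in
    `watch_addresses`. An entry can be flagged for both reasons.
    """
    watch = {addr.upper() for addr in watch_addresses}
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, addr in ENTRY.findall(line):
            addr = addr.upper()
            if name.lower() == "null":
                findings.append((line_no, "null-name", addr))
            if addr in watch:
                findings.append((line_no, "watched-address", addr))
    return findings
```

An empty result means neither warning sign showed up in the pasted text. It says nothing about whether the mesh is actually healthy.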
One log entry I didn't get tonight when recovering the device via re-pairing was a Zigbee device announcement. When my command spam + reboot method works, the address change log entry is often near a log entry with the device announcing itself.
Questions I have from this are why null keeps appearing in my route table as a portent of doom on the mesh and why the address 8D08 appears to exist as an event horizon for my devices: lights enter, but cannot escape.
I'd be happy to provide any additional information for developers as may be useful, including log access.
Thanks in advance!