Curious Zigbee Problems with Outlets

Several days ago, on two different production HE hubs, I simultaneously stopped getting events from a bunch of battery-powered zigbee devices. At first I thought it might be the greatest battery coincidence in the history of the planet, but I quickly ruled that out. FWIW the affected devices are mostly either motion sensors (a mix of Hue motions, Iris v2's, and Smartthings/Aeotec) or contact sensors (Visonic MCT340's and Smartthings/Aeotec). Both sets have been rock solid for years.

Then I realized some of my zigbee devices were doing fine. Upon further troubleshooting, I noticed that all the problem devices seemed to have one thing in common: they were all using my zigbee outlets (Jasco Enbrighten 43102s or Smartthings Plugs) as repeaters, which form the primary ring around the hub to establish good mesh for the outer reaches. End devices that were connected directly to the hub seemed to be behaving fine. When I then zeroed in on the outlets, it turned out they were misbehaving. I can turn them on/off physically, but not through the device details page. They ignore all inputs.

  • Nothing remarkable in the logs.
  • My mesh is solid--one hub has 49 zigbee devices, 6 of which repeat; the other only has 13 zigbee devices, 3 of which repeat.
  • That said, only 1 repeater on the first hub and no repeaters on the second hub show up in the neighbor table when I look at the getChildAndRouteInfo page.
  • I have tried switching zigbee channels and waiting. No luck.
  • I have confirmed my 2.4g wifi network is down at channel 1, whereas my hubs are on zigbee channels in the low-to-mid 20's, so no conflict there.
  • I have no bulbs on these hubs, so that's not it.
  • I rebooted the hub. No help.
  • I did a soft reset, and then reloaded a backup. Still same problem after waiting ~10 hours.
  • I even successfully re-discovered a number of the zigbee outlets. No change.
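
For anyone wanting to double-check the wifi-vs-zigbee channel claim in the list above, here is a rough sketch of the arithmetic. The center-frequency formulas are the standard 2.4 GHz ones; the channel widths (22 MHz for wifi, 2 MHz for zigbee) are nominal values assumed for illustration, and the function names are made up. Real-world interference also depends on transmit power and distance, so treat this as a sanity check only.

```python
def zigbee_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz zigbee (802.15.4) channel, 11..26."""
    if not 11 <= channel <= 26:
        raise ValueError("zigbee 2.4 GHz channels run from 11 to 26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz wifi channel, 1..13."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only handles wifi channels 1 to 13")
    return 2407 + 5 * channel


def overlaps(wifi_channel: int, zigbee_channel: int,
             wifi_width_mhz: float = 22.0,      # assumed nominal width
             zigbee_width_mhz: float = 2.0) -> bool:  # assumed nominal width
    """True if the two nominal channel bands overlap at all."""
    gap = abs(wifi_center_mhz(wifi_channel) - zigbee_center_mhz(zigbee_channel))
    return gap < (wifi_width_mhz + zigbee_width_mhz) / 2


if __name__ == "__main__":
    # wifi on channel 1 against zigbee channels in the low-to-mid 20's
    for zb in range(20, 26):
        print(zb, zigbee_center_mhz(zb), overlaps(1, zb))
```

With wifi on channel 1 (2412 MHz) and zigbee at channel 20 or above (2450 MHz and up), the nominal bands are well separated, which is consistent with ruling out a wifi conflict.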

I'm out of ideas. It would seem that zigbee outlets are the culprit since it happened on two different hubs simultaneously and they're the common denominator. But I don't really know. All of the affected devices use Generic Zigbee Outlet as the driver. Could they have received OTA firmware upgrades that killed their ability to repeat? I didn't think HE supported OTA updates for Zigbee.

Edit: on one of the hubs, when I look at the zigbee log (the one with 13 zigbee devices, 3 of which repeat), I get a device “0000” and profileID:0x0, clusterId:0x8032, sourceEndpoint:0, destinationEndpoint:0, groupID:0, lastHoplqi:255, lastHopRssi:0. Seems suspiciously like a ghost device or an offline zigbee network, but neither appears to be the case. Stumped.
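
For anyone decoding a log line like the one above: in the Zigbee spec, profile 0x0000 is the ZDO (device management) profile, cluster 0x8032 is the Mgmt_Rtg_rsp (routing table response), and short address 0000 is the coordinator. So an entry like this may simply be the hub answering a routing-table query rather than a ghost device; that reading is an assumption, not something confirmed for HE. A small sketch of the decoding follows. The function names are hypothetical and the lookup table only covers a few management clusters.

```python
import re

# A few ZDO management clusters (request IDs); responses set the 0x8000 bit.
ZDO_MGMT_CLUSTERS = {
    0x0031: "Mgmt_Lqi",    # neighbor table query
    0x0032: "Mgmt_Rtg",    # routing table query
    0x0033: "Mgmt_Bind",   # binding table query
    0x0034: "Mgmt_Leave",  # ask a device to leave
}


def parse_fields(line: str) -> dict:
    """Pull key:value pairs (hex or decimal values) out of a log line."""
    return dict(re.findall(r"(\w+):\s*(0x[0-9A-Fa-f]+|-?\d+)", line))


def describe(line: str) -> str:
    """Name the ZDO message in a log line, if it is one we recognize."""
    fields = parse_fields(line)
    profile = int(fields["profileID"], 0)
    cluster = int(fields["clusterId"], 0)
    if profile != 0x0000:
        return f"not a ZDO message (profile 0x{profile:04X})"
    name = ZDO_MGMT_CLUSTERS.get(cluster & 0x7FFF)
    if name is None:
        return f"ZDO cluster 0x{cluster:04X} (not in this sketch's table)"
    suffix = "_rsp" if cluster & 0x8000 else "_req"
    return f"ZDO {name}{suffix}"


if __name__ == "__main__":
    log_line = ("profileID:0x0, clusterId:0x8032, sourceEndpoint:0, "
                "destinationEndpoint:0, groupID:0, lastHoplqi:255, lastHopRssi:0")
    print(describe(log_line))
```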

Anyone else having a similar problem? Or any ideas before WAF is irretrievably lost?

I am having very similar issues. However, I have not had issues with my plugs; they seem to work. But I have had issues with my Hue motions and 2 Iris contacts. After a reboot or power cycle my Hue motions stop reporting and generally come back within 2 to 6 hours. Last night, for the first time, I had trouble with my 2 Iris contacts: when I pulled the battery on each one, they both reset themselves and started to blink blue. It took me about 10 tries to get them both connected and working again. I was not having these issues when I was on .228. They seemed to possibly start in .229 and got really bad in .30.

Check your routing table. Might be they are not working correctly.

I did check my routing table, but not sure how to interpret or what to do about it.

I, too, had to reset a Smartthings contact sensor last night by pulling the battery. I have also been seeing odd things: all my zwave devices went down to 9.6K speed until I powered off the hub for 30 seconds, and then they started going back up. The other night the button (pressing down twice) on a switch didn't work until I air gapped it. Hopefully the reboot fixes whatever is going on. The battery pull on the zigbee contact made it start talking again; I can't remember what else I have had to reboot lately.

This sounds like you have a bad device in your mesh...

Do you mean something like this?

Zigbee Routing Table - #2 by JasonJoel

Unfortunately, that shows a highly incomplete view of the Zigbee routing table (and a partially confusing one; e.g., it can show devices routing through themselves).

Am I missing something? How would you propose they use the data from that table?

Zigbee's getChildAndRouteInfo actually gives you a pretty good indication of how your mesh is performing, but only for the 'first hop'. Not all repeaters in the mesh are considered neighbors, even when they are within radio range. Load up your mesh with redundant repeaters, fine; the hub may not use them, nor list them in this table. Note that doesn't mean they are useless, as they may function as parents to sleepy end devices. But the neighbor designation is reserved for those the hub is currently tracking, and which the routing algorithm will rank (using cost values for each path segment) for use as next hop destinations (the 'vias') for routed messages. A device 'routing through itself' is just a neighbor repeater that is also the destination of a message directed to it.

The intent of getChildAndRouteInfo is not to show the routing table; that is not how this info is portrayed by SiLabs. I forget the actual quote from their docs (it's in an old post I made), but it's intended to show how the most important RF links are performing-- those that are within a 1-hop radius of the hub. So it gives key information on which repeaters-- if any-- the hub is actually going to use when sending direct (to child devices or neighbors) or multi-hop (routed) messages. Specifically, how well the hub can hear its neighbors (the LQI) and how well those neighbors can hear the hub (which a neighbor reports as the outCost for a path, derived from that neighbor's LQI on the link).

Since the hub only tracks the best 16 repeaters for a routed message-- that's the max that the neighbor table can hold-- if RF conditions are stable in both directions on each link (keeping in mind that the neighbor repeaters also maintain similar neighbor tables) the repeaters in the table will stay the same. If neighbors stop checking in (unlike Z-Wave, the Zigbee mesh is never idle and always carries link status traffic), you'll know it because either they will be evicted from a full table, or they'll show high age counts/zero costs (meaning missed check-ins) or high non-zero costs (relatively high error rates).
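
To make the rules of thumb above concrete, here is a minimal sketch that parses neighbor entries and flags them. It assumes each entry looks like `[Name, ABCD], LQI:254, age:3, inCost:1, outCost:1`; that line format, the sample device names, and the thresholds (`stale_age`, `high_cost`) are all assumptions for illustration, so adjust them to what your own hub actually prints.

```python
import re
from dataclasses import dataclass

# Assumed layout of one neighbor line; adjust if your hub prints it differently.
NEIGHBOR_RE = re.compile(
    r"\[(?P<name>[^,\]]+),\s*(?P<addr>[0-9A-Fa-f]{4})\],\s*"
    r"LQI:(?P<lqi>\d+),\s*age:(?P<age>\d+),\s*"
    r"inCost:(?P<in_cost>\d+),\s*outCost:(?P<out_cost>\d+)"
)


@dataclass
class Neighbor:
    name: str
    addr: str
    lqi: int
    age: int
    in_cost: int
    out_cost: int


def parse_neighbors(text: str) -> list:
    """Return every neighbor entry found in the pasted page text."""
    return [
        Neighbor(m["name"].strip(), m["addr"].upper(), int(m["lqi"]),
                 int(m["age"]), int(m["in_cost"]), int(m["out_cost"]))
        for m in NEIGHBOR_RE.finditer(text)
    ]


def classify(n: Neighbor, stale_age: int = 7, high_cost: int = 5) -> str:
    """Apply the rules of thumb; both thresholds are hypothetical."""
    if n.in_cost == 0 or n.out_cost == 0 or n.age >= stale_age:
        return "missed check-ins"
    if max(n.in_cost, n.out_cost) >= high_cost:
        return "high error rate"
    return "ok"


if __name__ == "__main__":
    # Made-up sample text, not taken from a real hub.
    sample = """
    [Hall Plug, A1B2], LQI:254, age:3, inCost:1, outCost:1
    [Garage Plug, C3D4], LQI:180, age:7, inCost:3, outCost:0
    [Attic Plug, E5F6], LQI:120, age:3, inCost:7, outCost:5
    """
    for n in parse_neighbors(sample):
        print(n.name, n.addr, classify(n))
```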

If my Zigbee devices start misbehaving, the first thing I would check is the neighbor table. It's good to know what devices normally populate the neighbor table when your mesh is working well; keep a list of them and compare it to the current state if things go south.
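
A minimal sketch of that "keep a list and compare" habit, assuming you jot down the neighbor names (or short addresses) by hand while the mesh is healthy. The function name and the sample device names are hypothetical.

```python
def compare_neighbors(baseline, current) -> dict:
    """Diff a known-good neighbor list against what the table shows now."""
    baseline, current = set(baseline), set(current)
    return {
        "missing": sorted(baseline - current),    # were neighbors, now gone
        "new": sorted(current - baseline),        # not seen in the good state
        "unchanged": sorted(baseline & current),
    }


if __name__ == "__main__":
    # Made-up names: the list saved while the mesh was working well...
    known_good = ["Hall Plug", "Garage Plug", "Attic Plug"]
    # ...and what the neighbor table shows today.
    today = ["Hall Plug"]
    print(compare_neighbors(known_good, today))
```

Anything under "missing" is a repeater worth checking first.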

If the neighbor table's empty, and you actually have repeaters in your network, that's really bad-- no routing is happening. Only child devices of the hub are reachable. If it contains only one or two neighbors, those are the only devices the hub considers viable repeaters at the moment. This in itself may not indicate a problem, just how your mesh is configured. Repeaters not shown anywhere in this table may still be forwarding messages downstream in the mesh. But if devices that used to be working are no longer functional, you may conclude that a repeater that was once a key neighbor is now missing in action.
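
The reasoning above boils down to a small decision rule, sketched here under the assumption that you know how many repeaters you have installed and how many show up as neighbors. The function name and the wording of the results are made up; this is a triage aid, not a diagnosis.

```python
def assess_neighbor_table(repeaters_installed: int,
                          neighbors_listed: int,
                          devices_failing: bool) -> str:
    """Rough triage based only on counts from the neighbor table."""
    if repeaters_installed > 0 and neighbors_listed == 0:
        # Empty table despite having repeaters: nothing is being routed.
        return "no routing: only direct children of the hub are reachable"
    if devices_failing and neighbors_listed < repeaters_installed:
        # A short table is normal by itself, but not alongside dead devices.
        return "suspect a key repeater that dropped out of the neighbor table"
    return "no problem indicated by the neighbor table alone"


if __name__ == "__main__":
    # The two hubs from the first post: 6 repeaters with 1 neighbor listed,
    # and 3 repeaters with none listed, both with failing devices.
    print(assess_neighbor_table(6, 1, True))
    print(assess_neighbor_table(3, 0, True))
```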

FWIW, this ended up being the root cause: Release 2.3.0 Available - #10 by gopher.ny. Solved in the most recent firmware release.