Zigbee PAN Identifier Conflict

markus · July 25, 2020, 12:25am

8 days have passed, the last 4 days my mesh has been stable.

What I want here is to follow up on what I did to stabilize it and why it worked. Hopefully this can help others to diagnose similar issues.

When analyzing the traffic in Wireshark I was seeing a lot of Beacon packets, upon further analysis I found that there were multiple other Zigbee meshes around me on the same channel (25). This led to that once every 12 to 24 hours ONE incorrect beacon packet (out of about 80 000, 17.2% of all packets, per 24 hours correct ones) was sent and caused HE to instantly (see the first post for the details) change PAN ID. This change is due to how Hubitat has chosen to implement how to react to these packets, it could be done differently and still be within the specs.

Each time the PAN ID changed, most of my end devices dropped off for 3-12 hours or so (Xiaomi/Aqara devices). Some didn't return at all (1 or 2 every time), some returned after just 1 or 2 hours (other standard Zigbee devices, such as Sonoff sensors).

Since the excessive amount of Beacon packets were due to the many controllers around me on the channel, I switched to channel 26. I chose this channel since I know no gateway sold to Chinese consumers will select that channel.

4 days later my logs show a total of 702 beacon packets in FOUR days (0.3% of total packets) - All correct and none of the type that will cause a PAN ID change. My mesh is now stable.

With this said, if the PAN id had not been changing, my mesh would have been stable the way it was, sure it wasn't an ideal environment, but all my automations fired instantly and I had 0 issues except when the PAN id changed.

How to use this information to troubleshoot mass dropout of devices you may then ask? Look for a PAN ID change. Since the PAN ID will not update in the UI until after a reboot of the hub, if you have lots of Zigbee devices drop all at once, write down your current 16-bit (2 bytes, 4 characters) PAN ID (NOT the 64-bit EXTENDED PAN ID). Then reboot your hub and look for a change in the PAN ID, if it has changed you have your culprit for the drops. If this happens more than once, change your Zigbee channel (there are whole threads discussing which ones to use, you probably don't want to use the one I chose). When changing your Zigbee channel you probably will have to re pair many of your devices, but there is no need to delete them from HE, just pair them again.