Zigbee mesh is going crazy after years of stability - help

Help, my Zigbee mesh appears to be freaking out, and I've tried everything I know to find and resolve the issue. I'm out of ideas.

This is a C-5 hub.

Issue:
Something is causing my non-repeating Zigbee devices to spam my hub with messages, starting at some point in the last week or two. I can't trace it back to any specific event or device install.

Devices like contact, motion, and temperature sensors are typically quiet unless triggered, but now they seem to be going crazy sending messages to the hub.

When did this start:
About a week ago, I noticed a Zigbee light bulb here or there would stop responding occasionally. Turning the light switch off for 5 seconds and then back on would resolve it. That was my first hint something was wrong with my mesh.

Tonight, 3 light bulbs in my house stopped responding. Again, manually turning off/on the switch resolved the issue, but I decided to look at my Zigbee-specific logs and saw them getting spammed with messages.

My Zigbee logs were pretty quiet a few weeks ago, so this is not typical behavior.

Things I've tried, in order:

  1. Examined all logs (regular, Zigbee-specific, and ChildAndRouteInfo) looking for patterns. The only thing I noticed is that most of the traffic comes from non-repeating devices like sensors and light bulbs, though repeaters certainly show up in the logs occasionally.
  2. Updated the hub to the latest firmware (v2.3.3.123).
  3. Rebooted the hub.
  4. Did a soft reset/restore from backup.
  5. Shut down and unplugged the hub for a few minutes, then turned it back on.
  6. Turned on debug logging for every (known) Zigbee repeater and then monitored the regular logs. None of them seemed to be causing the spam.
  7. Turned on debug logging for a few (known) Zigbee NON-repeaters, including ones that are sending (or being prompted to send) the spam. The regular logs then filled up with spam from those devices, which come from a range of manufacturers. I don't understand how my battery-operated devices, like the sensors throughout my house, can be this chatty when they usually only report when something triggers them (motion, temperature change, etc.), but I think it points to a problem with my mesh.
  8. Unplugged the Sonoff S40 Zigbee plugs I had installed a month ago, since I thought I saw someone on the forum mention Sonoff plugs causing problems. Unplugging all of them had no impact; even with the plugs unplugged, the spam kept flowing in the Zigbee logs.
  9. Examined all logs again, but still couldn't find any patterns to debug further.
  10. Searched the forum for anything I hadn't yet tried.
  11. Posted this thread.

I'm open to ideas on how to resolve this without having to throw away 3 years of tweaking everything to get my hub and house set up perfectly...

Screenshots/Info:

  1. Zigbee ChildRoutingInfo: every time I refresh the browser, the age values or the listed devices change, even when I refresh once per second. Maybe that's normal; I don't spend a lot of time on this page.
  2. Zigbee logs: again, this is just flowing with messages. See the timestamps for how often they occur.
  3. List of all Zigbee devices (the Generic sensors are a mix of SmartThings, Lightify, and Aeotec)

[Screenshot: ChildRoutingInfo]

[Screenshot: Zigbee Logs sample (note the timestamps: multiple devices per second, non-stop)]

List of Zigbee devices:

  1. Studio Motion and Temp (Generic Zigbee Motion Sensor)
  2. Basement Drain Water Sensor (Generic Zigbee Moisture Sensor)
  3. Bathroom Light Left (Sengled Element Classic)
  4. Ceiling Office (Hampton Bay Zigbee Fan Controller)
  5. VR Lighthouse Closet (Centralite 4200-C Zigbee Outlet)
  6. Turntables (Sengled Element Color Plus)
  7. Dresser (Sengled Element Color Plus)
  8. Studio Window - South (Ecolink 4655BC0-R Zigbee Contact Sensor)
  9. Kitchen Sink (72569 Sylvania LIGHTIFY Edge-Lit Under Cabinet Adjustable White)
  10. Basement Storage (Sengled Element Classic)
  11. Cans (Ikea TRADFRI Control Outlet)
  12. Side Entry (Sengled Element Classic)
  13. Shade - Living Room Left (Ikea Fyrtur Shade)
  14. QSC Left Speaker (Sonoff S40 S40ZBTPB)
  15. Upstairs North 1 (Sengled Element Classic)
  16. Basement Shelf (Sengled Element Classic)
  17. Fireplace L (Sylvania / Osram Zigbee RGBW Bulb)
  18. Basement Bathroom - water sensor (Generic Zigbee Moisture Sensor)
  19. Patio Lights (Sylvania Gardenspot RGB)
  20. Studio Stairs Motion (Centralite Micro Motion Sensor)
  21. Basement Red Room Water Sensor (Generic Zigbee Moisture Sensor)
  22. QSC Right Speaker (Sonoff S40 S40ZBTPB)
  23. Ceiling Bedroom (Hampton Bay Zigbee Fan Controller)
  24. Fridge (Sengled Element Classic)
  25. Bathroom - Bath Shower Utility Closet (Generic Zigbee Moisture Sensor)
  26. Studio Window - North (Ecolink 4655BC0-R Zigbee Contact Sensor)
  27. Spin Fan (Centralite 4200-C Zigbee Outlet)
  28. Shade - Living Room Right (Ikea Fyrtur Shade)
  29. Dining Room Zigbee Repeater (Ikea TRADFRI Signal Repeater)
  30. Bathroom Motion & Temp (Sylvania Lightify Smart Sensor)
  31. Movie Room (GE Zigbee In-wall Smart dimmer 45857GE ZB3001)
  32. Living Room Motion & Temp (Generic Zigbee Motion Sensor)
  33. Living Room Corner (SYLVANIA SMART+ ZigBee Bulb A19)
  34. Porch Zigbee Repeater (Ikea TRADFRI Signal Repeater)
  35. Counter (Advanced Zigbee RGBW Bulb)
  36. Craig Nightstand (Sengled Element Classic)
  37. Basement Gas Sensor (Heiman Zigbee Gas Detector)
  38. Basement Bathroom Heater (Ikea TRADFRI Control Outlet)
  39. Easel (Sonoff S40 S40ZBTPB)
  40. Wyze Cam - Living room (Sonoff S40 S40ZBTPB)
  41. Basement JBL Sub (Sonoff S40 S40ZBTPB)
  42. Upstairs North 2 (Sengled Element Classic)
  43. DJ (Sonoff S40 S40ZBTPB)
  44. Basement Mixing Board (Sonoff S40 S40ZBTPB)
  45. Laundry (Sengled Element Classic)
  46. Surface Book (Ikea TRADFRI Control Outlet)
  47. Fireplace R (Sylvania / Osram Zigbee RGBW Bulb)
  48. Air Conditioner Condensate Pump Water Sensor (Generic Zigbee Moisture Sensor)
  49. Office Motion and Temp (Generic Zigbee Motion Sensor)
  50. Bathroom Light Right (Sengled Element Classic)
  51. Wyze Cam - Kitchen (Centralite 4200-C Zigbee Outlet)
  52. Bedroom Zigbee Repeater (IKEA Tradfri repeater)
  53. Stove (Sengled Element Classic)
  54. Studio Light (Sengled Element Color Plus)
  55. Red Room Zigbee Repeater (Ikea TRADFRI Signal Repeater)
  56. Upstairs South 2 (Sengled Element Classic)
  57. Hot Water Heater Water Sensor (Hot Water Heater Water Sensor)
  58. VR Lighthouse Speaker (Centralite 4200-C Zigbee Outlet)
  59. Nook (Sengled Element Color Plus)
  60. Kitchen Floor Water Sensor (Generic Zigbee Moisture Sensor)
  61. Kitchen Motion and Temp sensor (Generic Zigbee Motion Sensor)
  62. Hallway (Sengled Element Classic)
  63. Theater (Centralite Zigbee Plug)
  64. Retro Pie (Sonoff S40 S40ZBTPB)
  65. Shade - Porch (Ikea Fyrtur Shade)
  66. Kitchen Sink Water Sensor (Generic Zigbee Moisture Sensor)
  67. Front Door Light (Sengled Element Classic)
  68. Upstairs South 1 (Sengled Element Classic)
  69. Studio Window - East (Ecolink 4655BC0-R Zigbee Contact Sensor)

Do any of you have ideas of other things for me to try to get my Zigbee mesh to calm down so my system can return to the level of stability I've enjoyed for a long time? Thanks!

You might have a repeater that has gone bad, but it could also be interference or another RF-related issue. Have you tried powering off/restarting any of the TRADFRI repeaters?

@Tony has a few posts in the community explaining the data in ChildRoutingInfo that you might find helpful, such as this one:

Crazy Idea: Have you added any new electronic devices or changed/moved any cables within 3m of the hub recently? (Sometimes unexpected things like a USB 3.0 HDD can be the cause of RF interference.)

Thank you for the thoughts and your reply. Yes, I tried powering off the TRADFRI repeaters. Even when they were off (unplugged), the spam continued, so I plugged them back in.

Regarding interference, anything is possible. However, I have not added any new Wi-Fi devices or USB 3 cables/devices in 5-6 months, and all PCs are at least 7-8 m from the hub.

Please keep the ideas coming. WAF had also been stable for years and is at risk of waning if I can’t sort this out. I’m open to buying a C-7 and migrating devices if need be, but with the large number of Zigbee and Z-Wave devices and my lack of free time at the moment, that sounds daunting…

Without having read the entire thread... have you compared the Zigbee channel(s) and Wi-Fi channel(s) involved? (Ignore that, the experts have arrived... :wink: )

I've got a couple of Sylvania Zigbee bulbs that act as repeaters, and bulbs that repeat have always made my mesh less stable. This was true with a Wink hub and with SmartThings.

Great, detailed post! It really helps others, staff included, learn about your issue, the steps you've taken on your own to remedy it, and what might be causing problems within your mesh. Kudos to you for taking the time!

A few thoughts:

A soft reset/restore doesn't help when the mesh itself is in trouble.

To force Zigbee to rebuild its routing tables, powering off the hub and keeping it off for 30-45 minutes might help; a quick power cycle is much less effective for Zigbee.

Unplugging working devices can only make things worse: if your mesh was already struggling, it now also has to deal with more unresponsive devices, which sends the radio into overtime.

Very good suggestion, and yes. My Wi-Fi setup is a Netgear RBR/RBS 750 with 2 satellites (one with wired backhaul). The channel is set to auto, and the main unit is currently using channel 9 on the 2.4 GHz band; it has peacefully coexisted with the hub for a few years. The C-5 Zigbee channel is 20, and I have never changed it since purchase.
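Since both channel numbers are known, it's easy to check whether they actually overlap using the standard 2.4 GHz channel plans (802.15.4 channels 11-26 are centered at 2405 + 5(k - 11) MHz; 802.11 channels 1-13 at 2407 + 5n MHz). A minimal sketch: the function names are my own, and it treats each channel as a flat band (about 2 MHz for Zigbee, an assumed 20 MHz for Wi-Fi), which is a simplification of real spectral masks.

```python
# Center frequencies from the standard 2.4 GHz channel plans:
#   IEEE 802.15.4 (Zigbee), channels 11-26: 2405 + 5*(k - 11) MHz
#   IEEE 802.11 (Wi-Fi), channels 1-13:     2407 + 5*n MHz (channel 14: 2484 MHz)


def zigbee_center_mhz(channel: int) -> int:
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz Zigbee channels are 11-26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    if 1 <= channel <= 13:
        return 2407 + 5 * channel
    if channel == 14:
        return 2484
    raise ValueError("2.4 GHz Wi-Fi channels are 1-14")


def overlaps(zigbee_ch: int, wifi_ch: int,
             wifi_width_mhz: float = 20.0, zigbee_width_mhz: float = 2.0) -> bool:
    """True if the two channels' nominal bands intersect.

    Assumption: flat rectangular bands. Real spectral masks have skirts,
    so False near the edge does not guarantee zero interference.
    """
    gap = abs(zigbee_center_mhz(zigbee_ch) - wifi_center_mhz(wifi_ch))
    return gap < (wifi_width_mhz + zigbee_width_mhz) / 2


# The setup described in this post: Zigbee channel 20, Wi-Fi channel 9.
print(zigbee_center_mhz(20), wifi_center_mhz(9), overlaps(20, 9))
# -> 2450 2452 True
print([w for w in range(1, 14) if not overlaps(20, w)])
# -> [1, 2, 3, 4, 5, 6, 11, 12, 13]
```

If that arithmetic holds, Zigbee channel 20 (2450 MHz) sits inside Wi-Fi channel 9 (roughly 2442-2462 MHz), while Wi-Fi channels 1-6 and 11-13 clear it. Because the router's channel is set to auto, it may be worth checking how long it has actually been on 9.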

Indeed, the experts are here! I see @bobbyD typing, and that gives me great hope! He helped me get set up over 3 years ago, and I am a fan. :slight_smile:

I’m about to head to work so I might not be able to reply or try more suggestions for a few hours. Thank you, again!

Thank you for the idea, and your reply! I will try this and report back.

If this doesn’t solve the issue, how else might I track down the route (pun) cause? I won’t unplug any more devices until I better understand how to identify the problem device(s), if that turns out to be the cause, since I don’t want to make things worse.

Just spit balling here, but have you tried going back to a 2.3.2 platform version?

I will take any and all suggestions from you for sure!

I have not tried rolling back yet. When I noticed a few lights not responding, I waited a few days and then decided to upgrade from 2.3.2 to 2.3.3. I did the upgrade Monday afternoon (2 days ago). It was when a few more lights needed to be power cycled that I checked the Zigbee logs.

I will try the following today after work and report back, in order:

  1. Pull batteries from devices spamming the logs with the specific messages stated by Mike below.
  2. Power down and unplug the C-5 for 45 minutes. Then check logs and see if things have settled after a few hours.
  3. Roll back to 2.3.2 and then check logs after a few hours to see if things have settled.

Note: my initial post has steps 1 and 2 inverted. My first step after noticing a few lights not responding was to upgrade to 2.3.3, and I began checking logs yesterday after more bulbs went offline. I’m sorry for any confusion caused by inverting that order in my steps above.

Ok, so this issue started whilst running 2.3.2, then?
If that's the case, then there's really no value in rolling back to 2.3.2.
Some of the messages in the Zigbee logs could be firmware update requests (cluster 0x19) from the devices, which is interesting in that many of these devices do not support online firmware updates.
Personally, I would pull power/batteries on each and every device in the Zigbee logs that's producing a profile 0x104, cluster 0x19 message. Continue to monitor the logs to verify things have quieted down, then power these devices back up one at a time and see how it goes.
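Building that device list by eye from a fast-scrolling log is tedious. If the Zigbee log page is copied into a text file, a short script can tally the profile 0x104 / cluster 0x19 senders instead. A minimal sketch, assuming the copied lines look like the examples quoted later in this thread; `ota_inventory` and the regex are hypothetical helpers, not a hub feature.

```python
import re
from collections import Counter

# Assumption: lines copied as text from the Zigbee log page in the format quoted
# in this thread. The pattern also tolerates the device name running straight
# into the timestamp, which can happen when copying from the page.
LINE_RE = re.compile(
    r"^(?P<device>.*?)\s*"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"profileId:(?P<profile>0x[0-9a-fA-F]+),\s*"
    r"clusterId:(?P<cluster>0x[0-9a-fA-F]+)"
)


def ota_inventory(lines):
    """Count profile 0x104 / cluster 0x19 messages per device."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a log entry in the expected format
        if int(m["profile"], 16) == 0x104 and int(m["cluster"], 16) == 0x19:
            counts[m["device"]] += 1
    return counts


# Trimmed lines from the Wyze Cam - Kitchen example in this thread.
SAMPLE = [
    "Wyze Cam - Kitchen 2022-09-28 13:36:18.767 profileId:0x0, clusterId:0x6",
    "Wyze Cam - Kitchen 2022-09-28 13:36:20.884 profileId:0x104, clusterId:0x19",
    "Wyze Cam - Kitchen 2022-09-28 13:36:23.806 profileId:0x104, clusterId:0x19",
]

for device, n in ota_inventory(SAMPLE).most_common():
    print(f"{n:5d}  {device}")
```

Run over an hour of copied log text, the `most_common()` ordering puts the noisiest devices at the top of the list to power down first.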

Noted, I'll make that the first item on my list, which I'll edit above. This is extremely helpful. It will be a little while, but I will report back. Thank you!

FWIW, the hub's neighbor table looks fine to me; most devices look like they have strong bidirectional paths, and that's what you want to see. It's normal to see the age counters change when you refresh the page; it's also not unusual to see a few neighbors that don't have strong enough links to be next-hop routers. The neighbors themselves should not change very frequently, however; that usually happens when something has disrupted the mesh.

As Mike noted, the clusterId 0x19 messages in the logs look unusually frequent; I think the clusterId 0x6 messages (Match Descriptor Request ZDO command) may indicate that the device is looking for an OTA update server.
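For anyone decoding these log fields: the profileId/clusterId pair identifies the message type, and the same cluster number means different things under different profiles (0x6 is a ZDO Match Descriptor Request under profile 0x0, but the On/Off cluster under Home Automation profile 0x104). A small lookup table, limited to IDs from the Zigbee and ZCL specifications; the names and structure are illustrative.

```python
# Well-known IDs from the Zigbee specification (ZDO) and the Zigbee Cluster Library.
# The same cluster number means different things under different profiles,
# so the lookup is keyed on the (profile, cluster) pair.
PROFILES = {
    0x0000: "ZDO (Zigbee Device Object)",
    0x0104: "Home Automation",
}

CLUSTERS = {
    (0x0000, 0x0006): "Match_Desc_req (match descriptor request)",
    (0x0000, 0x8006): "Match_Desc_rsp (match descriptor response)",
    (0x0104, 0x0006): "On/Off",
    (0x0104, 0x0019): "OTA Upgrade",
}


def describe(profile_id: int, cluster_id: int) -> str:
    """Human-readable label for a profileId/clusterId pair from the Zigbee log."""
    profile = PROFILES.get(profile_id, f"profile 0x{profile_id:04x}")
    cluster = CLUSTERS.get((profile_id, cluster_id), f"cluster 0x{cluster_id:04x}")
    return f"{profile}: {cluster}"


for pair in [(0x0, 0x6), (0x104, 0x19), (0x104, 0x6)]:
    print(f"0x{pair[0]:x}/0x{pair[1]:x} -> {describe(*pair)}")
```

Read this way, each burst in the logs is a Match_Desc_req followed by OTA Upgrade cluster traffic, which is consistent with the OTA-server-discovery reading above.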

AFAIK the OTA server can set an 'OTA server discovery broadcast' frequency on the target device; not sure what the default would be in the absence of an explicit setting. On my C-3 running 2.3.2.141 I see a few of these in my logs from various devices but they aren't very frequent (usually several minutes apart) and not repetitive.

Alright, I'm about to start testing using your method, focusing on devices producing a profile 0x104, cluster 0x19 message. However, a few of those devices are mains powered.

Before getting started, I left the Zigbee logs open for an hour and took an inventory of the devices that matched that profile.

As I compiled this list, I noticed a pattern, though I don't know if it's relevant. Each burst of several messages from a single device with "profileId:0x104, clusterId:0x19, sourceEndpoint:1, destinationEndpoint:1" was always preceded in the logs by a single message from the same device showing "profileId:0x0, clusterId:0x6, sourceEndpoint:0, destinationEndpoint:0".

Example:
Wyze Cam - Kitchen 2022-09-28 13:36:18.767 profileId:0x0, clusterId:0x6, sourceEndpoint:0, destinationEndpoint:0 , groupId:0, lastHopLqi:255, lastHopRssi:-75
Wyze Cam - Kitchen 2022-09-28 13:36:20.884 profileId:0x104, clusterId:0x19, sourceEndpoint:1, destinationEndpoint:1 , groupId:0, lastHopLqi:255, lastHopRssi:-76
Wyze Cam - Kitchen 2022-09-28 13:36:23.806 profileId:0x104, clusterId:0x19, sourceEndpoint:1, destinationEndpoint:1 , groupId:0, lastHopLqi:255, lastHopRssi:-76
Wyze Cam - Kitchen 2022-09-28 13:36:32.763 profileId:0x104, clusterId:0x19, sourceEndpoint:1, destinationEndpoint:1 , groupId:0, lastHopLqi:255, lastHopRssi:-76
Wyze Cam - Kitchen 2022-09-28 13:36:29.743 profileId:0x104, clusterId:0x19, sourceEndpoint:1, destinationEndpoint:1 , groupId:0, lastHopLqi:255, lastHopRssi:-76
Wyze Cam - Kitchen 2022-09-28 13:36:26.826 profileId:0x104, clusterId:0x19, sourceEndpoint:1, destinationEndpoint:1 , groupId:0, lastHopLqi:255, lastHopRssi:-76
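The example lines above are not in chronological order (13:36:32.763 is listed before 13:36:29.743 and 13:36:26.826), so it helps to sort them before reading the intervals. A minimal sketch, assuming the same copied-text log format; `timeline` is a hypothetical helper, not a hub feature.

```python
import re
from datetime import datetime

# Assumption: copied-text log lines as quoted above; fields after clusterId are
# ignored, and the device name may or may not be separated from the timestamp.
LINE_RE = re.compile(
    r"^(?P<device>.*?)\s*"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"profileId:(?P<profile>0x[0-9a-fA-F]+),\s*"
    r"clusterId:(?P<cluster>0x[0-9a-fA-F]+)"
)
TS_FMT = "%Y-%m-%d %H:%M:%S.%f"


def timeline(lines):
    """Return (time, device, profile, cluster, gap) tuples sorted by time, where gap
    is seconds since that device's previous message (None for its first one)."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is not None:
            events.append((datetime.strptime(m["ts"], TS_FMT), m["device"],
                           int(m["profile"], 16), int(m["cluster"], 16)))
    events.sort()  # pasted lines are not guaranteed to be in order
    last_seen = {}
    result = []
    for ts, device, profile, cluster in events:
        prev = last_seen.get(device)
        gap = None if prev is None else (ts - prev).total_seconds()
        last_seen[device] = ts
        result.append((ts, device, profile, cluster, gap))
    return result


# The six Wyze Cam - Kitchen lines from the example above, trimmed, in pasted order.
EXAMPLE = [
    "Wyze Cam - Kitchen 2022-09-28 13:36:18.767 profileId:0x0, clusterId:0x6",
    "Wyze Cam - Kitchen 2022-09-28 13:36:20.884 profileId:0x104, clusterId:0x19",
    "Wyze Cam - Kitchen 2022-09-28 13:36:23.806 profileId:0x104, clusterId:0x19",
    "Wyze Cam - Kitchen 2022-09-28 13:36:32.763 profileId:0x104, clusterId:0x19",
    "Wyze Cam - Kitchen 2022-09-28 13:36:29.743 profileId:0x104, clusterId:0x19",
    "Wyze Cam - Kitchen 2022-09-28 13:36:26.826 profileId:0x104, clusterId:0x19",
]

for ts, device, profile, cluster, gap in timeline(EXAMPLE):
    gap_txt = "first" if gap is None else f"+{gap:.3f} s"
    print(f"{ts:%H:%M:%S.%f}"[:-3], f"0x{profile:x}/0x{cluster:x}", gap_txt)
```

Sorted, this example shows one 0x0/0x6 message followed by five 0x104/0x19 messages at gaps of +2.117, +2.922, +3.020, +2.917, and +3.020 s, i.e. roughly every 3 seconds, compared with the several-minutes-apart cadence Tony describes on his C-3.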

Here's my inventory of the devices I'm going to start pulling batteries from. I'm curious whether @mike.maxwell or @bobbyD have any suggestions on what to do with my mains-powered devices, as I don't want to make things worse. I'll report back after pulling batteries either way.

I've removed the batteries from every battery-powered device that was sending profile 0x104, cluster 0x19 messages. The issue did not stop: all of the mains-powered devices that were spamming profile 0x104, cluster 0x19 messages kept going, and a few more showed up in the logs (new screenshot of my devices below).

Worth noting: with fewer devices, it's easier to spot patterns in the logs.

Pattern:
There's always a profileId:0x0, clusterId:0x6 message that precedes exactly 5 profileId:0x104, clusterId:0x19 messages from the same device (see screenshot).
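To confirm that "exactly 5" holds for every device and every burst, not just the ones visible in a screenshot, the bursts can be counted from copied log text. A minimal sketch, assuming the log is copied as text in the format quoted in this thread; `burst_sizes` is a hypothetical helper, and a burst is counted from a device's 0x0/0x6 message up to that same device's next 0x0/0x6.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: copied-text log lines in the format quoted in this thread.
LINE_RE = re.compile(
    r"^(?P<device>.*?)\s*"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"profileId:(?P<profile>0x[0-9a-fA-F]+),\s*"
    r"clusterId:(?P<cluster>0x[0-9a-fA-F]+)"
)
TS_FMT = "%Y-%m-%d %H:%M:%S.%f"


def burst_sizes(lines):
    """For each device, count the profile 0x104 / cluster 0x19 messages that follow
    each profile 0x0 / cluster 0x6 message, until that device's next 0x0/0x6."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is not None:
            events.append((datetime.strptime(m["ts"], TS_FMT), m["device"],
                           int(m["profile"], 16), int(m["cluster"], 16)))
    events.sort()  # work in chronological order
    bursts = defaultdict(list)
    for _, device, profile, cluster in events:
        if (profile, cluster) == (0x0, 0x6):
            bursts[device].append(0)  # a new burst starts
        elif (profile, cluster) == (0x104, 0x19) and device in bursts:
            bursts[device][-1] += 1  # count toward the device's current burst
    return dict(bursts)


# Timestamps of the Wyze Cam - Kitchen example quoted earlier in the thread.
dev = "Wyze Cam - Kitchen"
sample = [f"{dev} 2022-09-28 13:36:18.767 profileId:0x0, clusterId:0x6"] + [
    f"{dev} 2022-09-28 13:36:{t} profileId:0x104, clusterId:0x19"
    for t in ("20.884", "23.806", "32.763", "29.743", "26.826")
]
print(burst_sizes(sample))  # -> {'Wyze Cam - Kitchen': [5]}
```

Any device whose list contains something other than 5 would be an exception to the pattern worth looking at more closely.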

So, it's not solved, but perhaps that pattern means something to you, @mike.maxwell, or gives you more ideas for me to try?

[Screenshot showing the pattern]

[Screenshot: updated list of impacted devices]

WAF is dropping because all the sensors are deactivated. It's been about 45 minutes, and the logs are still flowing with (for lack of a better term) spam, just without the battery-powered devices. I'm going to put all the batteries back in and deal with manually turning the occasional light off and on every few hours.

Tomorrow I'll try powering down and unplugging the C-5 for 45 minutes, rather than the 10 minutes I tried yesterday. I'll keep reporting back, and I'm certainly open to any other ideas, since I haven't been able to resolve this yet.

Thanks!

EDIT: forgot to tag @bobbyD in case any of the info above is useful to you or anyone else as I keep trying things to resolve this.

Quick note: after I put the batteries back in, the devices resumed spamming the logs. The pattern described above also held for the battery-powered devices: profile 0x0, cluster 0x6, followed by 5 rapid profile 0x104, cluster 0x19 messages.

I am getting the exact same thing, but my mesh is working. I'm not sure it's abnormal, and if it is, might it be related to the firmware updates?

I’m sad to report that this did not resolve the issue.

I powered off the hub for 60 minutes. When I turned it back on, the spam resumed within 1 minute of booting.

Next step is researching how to roll back to 2.3.2 firmware to rule out 2.3.3 being a root cause.

:disappointed: Not much research needed. Go to Diagnostic Tool and click Restore Previous Version, then select 2.3.2.141.
