2.3.5.152 upgrade and virtually everything is broken

I'll be honest. While the upgrade may somehow have triggered the initial fiasco, the work by jtp10181, rlithgow1, aaiyar, and kkossev uncovered a pile of problems holding me back. These included, but were not limited to:

  • My TP-Link Deco XE75 decided to change channels, completely hammering my Zigbee. It's gone. I went back to Omada, where I have full control over channel selection and width. Zigbee now has the Wi-Fi channel 11 region to itself (channel 25 on the HE), while my two APs are on Wi-Fi channels 1 and 6.

  • I found that my VIZIO sound bar, in the office right next door, was broadcasting a massive Wi-Fi signal. It even spanned my Zigbee channel, and I couldn't control it. It's gone.

  • I ran backup/repair multiple times, trying multiple versions. I really don't think the build is the main issue; I feel it was a culmination of things that the upgrade perhaps exposed. There were comments that it contained a lot of Zigbee fixes.

  • To your point, and thanks to aaiyar/KK: I removed one SONOFF, re-added it with KK's driver, and my network started to build again. It was the weirdest thing. Before that, I spent hours with a broken network. This was really the turning point in the recovery...

  • It did take me FOREVER to re-pair several Zigbee devices. I don't know why this has become so difficult post-upgrade, after years of doing this across multiple houses. I have a combination of DOGAIN and SONOFF devices that were all well positioned and working prior to the upgrade. I suspect the XE75 rescanning made the entire mesh puke.

  • I purchased new DOGAIN devices to replace all the SONOFFs, since there was some question about their reliability. NONE would pair with the HE. I gave up, sent them back, and went back to SONOFF, which worked before anyway.

Current state:

  • I eventually got to the point where 95% of my Zigbee devices are connected. This took 10+ tries on some of the Aqaras. A handful still will not connect even less than 10 ft from a repeater (a plug). I don't understand why, as these have been reliable for years, as mentioned. There is no interference on any channel that WiFi Analyzer can see.

  • I am still using OLL drivers, not because I want to, but because switching is a complete b****h to do. It was hard enough re-pairing without also trying to move from OLL to KK's drivers. I gave up; I'll investigate again in the future.

Summary
Don't give up, and don't rule out ANYTHING that may have changed... even your neighbor spamming your network channel.

Thanks for posting; I've been holding off on upgrading my C7. Do you have any idea whether your Zigbee channel changed on its own, or whether your Zigbee PAN ID changed?

The ZigBee channels auto-change on the XE75, and there's no way to change them back to anything you want. Once that happened, I started running into serious issues again.

I meant the Zigbee channel on the HE. My experience, echoed by others, is that the HE will occasionally change the Zigbee channel on its own when Zigbee crashes or is stressed. The HE's PAN ID can also change. It's a good idea to make a note of both of these Zigbee network details.

I understand the Deco XE75 will auto-select the Wi-Fi channel; some other routers do that too.

Just adding to this thread, because this is EXACTLY what was happening to my C7. I was running fine on 2.3.5.138 for a while, and then all of a sudden things started acting strange. I thought my HE was bogged down (delaying messages, etc.) and needed a reboot.

Name         Description                             Value       Date
systemStart  System startup with build: 2.3.5.152    2.3.5.152   2023-08-19 07:16:04.560 PM EDT
systemStart  System startup with build: 2.3.5.152    2.3.5.152   2023-08-19 12:57:52.599 PM EDT
systemStart  System startup with build: 2.3.5.152    2.3.5.152   2023-08-19 10:06:44.892 AM EDT
systemStart  System startup with build: 2.3.5.152    2.3.5.152   2023-08-18 05:10:20.866 PM EDT
update       Updated with build: 2.3.5.152           2.3.5.152   2023-08-18 05:08:12.459 PM EDT
systemStart  System startup with build: 2.3.5.138    2.3.5.138   2023-08-18 03:17:38.554 PM EDT
systemStart  System startup with build: 2.3.5.138    2.3.5.138   2023-08-17 09:16:05.922 PM EDT

After the first reboot, odd things started happening: two Sonoff motion sensors using OLL drivers and one Hue motion sensor stopped reporting to the hub. So I thought I'd upgrade to the latest build to try to resolve it, but once upgraded, things got worse.

After hours of troubleshooting (changing drivers, repairing the Zigbee mesh, re-pairing sensors that would work for a few moments and then drop off again, basically running up and down the stairs testing sensors), I finally found this thread and thought the latest firmware was the problem.

Everything seemed similar to what I was experiencing (OLL drivers, Zigbee mesh dropping), and then I realized that I ALSO have a Deco X60 mesh, on which I had run a Network Optimization just before the first reboot that started the issues. Then it all clicked: my Zigbee problems were caused by Wi-Fi interference.

My 2.4 GHz Wi-Fi was on channel 9 and my 5 GHz on channel 44, while my Zigbee channel was 20, which overlaps Wi-Fi channel 9. So I changed my Zigbee channel to 25, rebooted, and let the Zigbee network rebuild. I had to re-pair the Sonoff sensors and buttons (thankfully only three sensors and one button). Everything started working and was responsive again.
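The channel arithmetic behind this can be sketched in Python. It assumes the standard 2.4 GHz center frequencies (Zigbee channel k at 2405 + 5 * (k - 11) MHz, Wi-Fi channel n at 2407 + 5 * n MHz) and treats a Wi-Fi channel as roughly 22 MHz wide and a Zigbee channel as 2 MHz wide. Real interference also depends on transmit power and distance, so "no overlap" here is only a rough guide; the function names are illustrative.

```python
# Sketch: flag Zigbee channels whose 2 MHz signal falls inside a 2.4 GHz
# Wi-Fi channel's ~22 MHz footprint. Assumes IEEE 802.15.4 channel
# numbering (11-26) and 2.4 GHz Wi-Fi channels 1-13.

ZIGBEE_WIDTH_MHZ = 2
WIFI_WIDTH_MHZ = 22  # conservative 802.11b-style width; OFDM is closer to 20


def zigbee_center_mhz(channel: int) -> int:
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels are 11-26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers Wi-Fi channels 1-13")
    return 2407 + 5 * channel


def overlaps(zigbee_channel: int, wifi_channel: int) -> bool:
    # Two signals overlap if their centers are closer than half the sum of widths.
    gap = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return gap < (ZIGBEE_WIDTH_MHZ + WIFI_WIDTH_MHZ) / 2


def clear_zigbee_channels(wifi_channels):
    # Zigbee channels that avoid every listed Wi-Fi channel's footprint.
    return [z for z in range(11, 27)
            if not any(overlaps(z, w) for w in wifi_channels)]


if __name__ == "__main__":
    print(overlaps(20, 9))                # True: 2450 MHz sits inside Wi-Fi 9
    print(overlaps(25, 9))                # False
    print(clear_zigbee_channels([1, 6]))  # [15, 20, 21, 22, 23, 24, 25, 26]
```

Channels 15 and 20 only just clear Wi-Fi 1 and 6 on paper, which is why the higher channels such as 25 tend to be the safer pick when the APs stay on 1 and 6.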

If anyone else is experiencing this problem and has a TP-Link Deco mesh, take a look at changing your Zigbee channel. I'm going to move away from the Deco system in the future to something that lets me choose Wi-Fi channels, since Deco doesn't allow such basic functionality.

I hope this helps others avoid pulling out their hair for hours trying to figure out what went wrong.

I'll be honest, I'm not 100% sure. All I know is that after the upgrade things started bogging down, and then I had to do some detective work. Given that you and the person below seem to have similar symptoms, now I'm questioning things.

One thing is certain: we all have Deco mesh in common... I always knew it was super dumbed-down networking, but I didn't realize until AFTER I bought it that it's REAL dumb, lol.

No control over your Wi-Fi channels is the last straw for me. I'm going to look into piecing together on-sale Asus AiMesh components, or maybe UniFi.

I used to have an Asus AiMesh system, using a pair of RT-AC86U routers with an Ethernet backhaul. I really liked the Asus routers, but I still had to occasionally reboot them to get everything working smoothly again on my LAN. I also really enjoyed using Merlin's firmware early on for Asus routers.

I switched over to a full Ubiquiti UniFi network and have been extremely happy ever since. Nobody ever asks me to troubleshoot the home network anymore. It just works. (UDM SE + 4 access points + PoE switches + 5 Protect cameras)

Picking a Zigbee channel that is isolated from Wi-Fi networks is only part of the solution. The other part is to have really good Zigbee repeaters, and a sufficiently large number of them. The Wi-Fi spectrum around my house is incredibly noisy, yet I have two large Zigbee networks that have been stable for years. One of them is on channel 20 and the other is on channel 25.

On the channel 20 mesh, I have 2 repeaters for every end device. This mesh has ~40 devices. The channel 25 mesh has 82 devices, with one repeater for every 3.5 devices. So my experience suggests that good repeaters, and plenty of them, are critical to maintaining a stable Zigbee mesh even in the face of a heavily polluted Wi-Fi spectrum.

If you look at the business case, a couple of TP-Link Omada APs cost about the same as, or less than, their expensive XE75s. That's why I switched back. And they're completely manageable. I have two EAP670s on order to replace the 653s, which are too weak. In my last house I had EAP245s and they were fantastic. I should never have switched the lease. Like you said, they're dumbed down too much.

I don't have a large number of Zigbee devices. I figured I'd have enough repeaters for the number of sensors throughout the house to prevent what happened to me (Wi-Fi interference).

BTW, HE team: the Zigbee Network Graph (Beta) is a great feature that I just found while troubleshooting! Very helpful, as I've never had a good visual of what was connected to what.

That map is useful - but it is important to bear in mind that it represents both current and historical data.

And yes, you definitely have a good number of repeaters. The ones I have found work best (in my environment) are Tuya USB Zigbee repeaters, and Sonoff Dongle-Ps that I have flashed to function as routers (they come flashed to work as Zigbee coordinators).

I am aware of the wealth of opinions on Aqara/HE interop, but since I fixed this single specific issue (many months ago), not one of my many Aqara devices has dropped off (on a C7).

Most of this discussion is over my head.

My problems with my HE Zigbee devices started when I upgraded from xx146 to xx152. I believe the source of the issue was entirely that the plugged-in, mains-powered devices would not reconnect to the Zigbee network until I unplugged them and plugged them back in. I'm a bit concerned about what this means for future unintended power cycling when the power grid goes down, especially if I am not home to manually power-cycle my Zigbee devices. This might be what pushes me to get a UPS for the HE.

My Zigbee uses channel 20.
My 2.4 GHz Wi-Fi is on channel 7 (I think; not sure).

I'm using a newer and an older Asus router to create a local Wi-Fi mesh with wired backhaul (which sounds like I know what I'm talking about, but I learned about that after hours of reading!). With this setup my Wi-Fi is the best it has ever been.

So, I wanted to update this thread and raise one remaining question (at least for now...) about an issue I still have. First, a summary.

I replaced my channel-changing PITA TP-Link XE75 with 2x TP-Link Omada EAP670s and couldn't be happier. The channel control, signal strength, and monitoring are fantastic. I went back to ceiling mounts like I had at my last house, and it's nice having fewer wires. 2.4 GHz is strong, with no overloads reported. I have also split my 2.4 and 5 GHz networks and moved mobile devices and TVs to 5 GHz, leaving 2.4 GHz almost exclusively for IoT.
With control over my channels, the HE is now rock solid. One or two of my 49 Aqara devices still drop periodically, so I am going to add two wall-plug extenders to try to boost the mesh. My recommendation, after following some of the help from folks here, is to make sure you have one of these in almost every room for a rock-solid Zigbee mesh. BTW, I tried the new DOGAIN and could not pair it at all; I gave up and sent it back. I'm going to order the SONOFF S31 to see how they do.
Motion, temperature, and humidity triggers are working perfectly between the Aqara devices and the Kasa Wi-Fi HS200, HS210, and HS220 switches. But here is my remaining big problem:

Based on my reading, I thought you could use Kasa devices locally ONLY and avoid outbound traffic to the cloud. I have the Kasa devices working perfectly on the HE and have blocked their outbound traffic by IP in pfSense. The problem is that every few days, the Kasa devices start flashing, attempting to connect to (I assume) the cloud. I'm talking about all of them. When I disable the outbound rule, it seems to go away. While the flashing happens, rules stop working because of the brief disruption to connectivity.
Am I missing something here?

This might be one @djgutheinz has some experience with. I can't say I have completely disabled external comms on my Kasa devices, but I have certainly used the local-only binding option, so I can't say whether they still need or want to talk to the mothership.

I think you mean 2.4.

I hope so... another frequency would not be helpful....

corrected!

I have turned off:

  • Use Kasa Cloud for device control
  • Kasa Cloud Binding

Polling is left at 30 minutes.

Am I missing an option on the KASA devices?
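One way to confirm the switches still answer locally while the outbound rule is active is to query one directly from a machine on the LAN. This sketch assumes the legacy Kasa local protocol as documented by community projects such as python-kasa: JSON commands over TCP port 9999, prefixed with a 4-byte big-endian length and obfuscated with an XOR "autokey" cipher whose initial key is 171. Newer Kasa hardware and firmware may use a different, authenticated protocol and will not answer this. The function names are illustrative, not part of any official API.

```python
# Sketch of the legacy Kasa LAN protocol (TCP 9999, XOR autokey, key 171).
# Assumption: the device still speaks this older protocol; newer firmware may not.
import json
import socket
import struct

INITIAL_KEY = 171
PORT = 9999


def encrypt(plaintext: bytes) -> bytes:
    """4-byte big-endian length, then each byte XORed with the previous
    ciphertext byte (starting from INITIAL_KEY)."""
    key = INITIAL_KEY
    out = bytearray(struct.pack(">I", len(plaintext)))
    for b in plaintext:
        key ^= b
        out.append(key)
    return bytes(out)


def decrypt(ciphertext: bytes) -> bytes:
    """Inverse of encrypt() for the payload, with the length prefix removed."""
    key = INITIAL_KEY
    out = bytearray()
    for c in ciphertext:
        out.append(key ^ c)
        key = c
    return bytes(out)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    # TCP may deliver the reply in pieces, so keep reading until n bytes arrive.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("device closed the connection early")
        buf += chunk
    return buf


def get_sysinfo(host: str, timeout: float = 5.0) -> dict:
    """Ask one switch on the LAN for its sysinfo; host is that device's IP."""
    request = json.dumps({"system": {"get_sysinfo": {}}}).encode()
    with socket.create_connection((host, PORT), timeout=timeout) as sock:
        sock.sendall(encrypt(request))
        (length,) = struct.unpack(">I", _recv_exact(sock, 4))
        reply = decrypt(_recv_exact(sock, length))
    return json.loads(reply)
```

Call it as get_sysinfo("192.168.1.50"), substituting one of your own switch IPs (that address is only a placeholder). If it returns sysinfo while the pfSense rule is in place, the local control path is working independently of the cloud.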