2.3.5.152 upgrade and virtually everything is broken

Setup: C8, 21x KASA HS200/210/220s, an assortment of ~40 Aqara Zigbee devices, and 6 Z-Wave devices.
I use the HE KASA app and I use oh-lalabs.com Aqara drivers.

Prior to upgrade, I had zero issues with the exception of 1x Aqara dropping off occasionally. Routines fired immediately including Aqara motion turning on KASA lights (and back off after x minutes)

Since 2.3.5.152 I have had nothing but problems. The problem started with latency on the motion/light activity. It was slow enough to be useless/annoying, typically 3-5 seconds. After posting and reading some forums, I performed firmware updates on the KASA devices, which did not help. I thought there might be an issue between the KASA devices and the XE75s that wasn't there before. I turned off cloud sync, and it was suggested I go to static IPs on the KASA... OK, I still need to get that done.

But, now after careful log review, I am seeing virtually everything dropping off the mesh. Examples:
dev:952 2023-08-08 10:56:18.031 warn No event seen from the device for over 3 hours! Something is not right... (consecutive events: 9)
(xxxx) 2023-08-08 10:58:58.034 [warn] Event interval INCORRECT, recovery mode (Normal) ACTIVE! If this is shown every hour for the same device and doesn't go away after three times, the device has probably fallen off and require a quick press of the reset button or possibly even re-pairing. It MAY also return within 24 hours, so patience MIGHT pay off.

I have tried rebuilding the network and even tried moving my HE to a better location. Nothing is helping.

This isn't just a KASA issue. This is a "virtually everything is broken now" issue.

I don't want to spend another 3 days troubleshooting and pressing device reset buttons all over the house. Hoping someone has an idea of root cause and what I should do.

What are your polling rates set to on the Kasa devices? If your Kasa devices are having connection issues and the app keeps polling them, it may slowly degrade the hub performance over time due to continued failed IP connections. So the issue with the Kasa devices may be causing the other issues.

You definitely want to get all the Kasa devices set with a reserved IP in DHCP. I would log out of your Kasa cloud account in the HE app so it cannot use that at all (unless you need it for any devices), which will force all comms to be local. Then you may need to run the discovery so it can find all the devices and update the IP addresses for them all.
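
Once the reservations are in place, a quick check from any PC on the same LAN can show which devices actually answer on their reserved addresses. The sketch below is only an illustration, not part of the HE Kasa app: it assumes the Kasa local protocol listens on TCP port 9999, and the names `is_reachable` and `unreachable_devices` are made up for this example.

```python
import socket

# Assumption: Kasa devices accept local connections on TCP port 9999.
KASA_LOCAL_PORT = 9999


def is_reachable(host, port=KASA_LOCAL_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def unreachable_devices(devices, port=KASA_LOCAL_PORT, timeout=2.0):
    """devices maps a label to an IP address; return labels that fail to connect."""
    return [
        name
        for name, ip in devices.items()
        if not is_reachable(ip, port, timeout)
    ]
```

For example, `unreachable_devices({"hall-switch": "192.168.1.50"})` (a hypothetical label and address) returns the labels that did not answer. Anything listed is a candidate for the failed IP connections described above.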

Have you also tried to do a full shut down of the hub, remove power for 10+ seconds and restart? Sometimes that does more good than just a reboot.

  • Virtually everything is default. Polling is 30 minutes (Default)
  • Zigbee Power/Channel are 16 (Carry over from C7 restore I suspect)
  • All KASA are local now. I disabled all cloud. I still need to do the static IP.
  • I have shutdown and unplugged hub multiple times.

The Zigbee devices continue to drop periodically. As of now, 1/3 have zero msgs.

I haven't even checked the ZWAVE...(nor do I want to right now...)

I continue to have delays in KASA and repeat Zigbee devices going to lost per error messages. I can't help but think something completely broke my hub.

C7 power was 8, the C8 defaulted to 16 originally but they backed it down to 8 I think after people had issues. I would try setting it to 8 and see if that helps. When you increase power you also increase susceptibility to interference.

For the channel, you will want to check your Wifi 2.4Ghz channel and then set the Zigbee channel to something away from it. Just be careful with Zigbee 25/26 as not all devices like those channels I have heard. I personally use 20 myself.

Also, if you have not tried it recently, I would go to Setting > Backup and do a backup / download it locally. Then right away restore that backup from that same screen (which will reboot the hub). This will clean the database of any possible corruption. Seems silly, but I recently had a memory leak problem on my dev C7 and this trick fixed it.

I'll try channel 8 on the HE. My TPLINK XE 75s are operating in AP mode and 2.4 GHz = channel 2, 5 = 40, and 6 = 37 respectively

I was waiting for someone to suggest backup/restore. Given I can't seem to keep Zigbee devices connected, I don't see a lot of options....

For what it is worth, I haven't complained about a failure since 22.7.5 (which I was correct on)

I think you are confusing the channel with the power setting. The power setting I suggested to set to 8. Zigbee does not have a channel 8.

What does 2,5=40 and 6=37 mean? Is 40/37 the 5Ghz channels?

Assuming you are using 2.4Ghz wifi channels 2, 5 and 6, I would set your Zigbee CHANNEL to 20.
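
To make the channel reasoning concrete, here is a rough sketch that compares nominal channel spans. It assumes 20 MHz wide WiFi channels and 2 MHz wide Zigbee channels and ignores real-world spectral skirts, so treat the result as a first approximation only. The function names are made up for this example.

```python
ZIGBEE_WIDTH_MHZ = 2   # nominal width of a 2.4 GHz Zigbee (802.15.4) channel
WIFI_WIDTH_MHZ = 20    # assumption: 20 MHz WiFi channel width


def zigbee_center_mhz(channel):
    """Center frequency of a 2.4 GHz Zigbee channel (11-26)."""
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz Zigbee channels run from 11 to 26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel):
    """Center frequency of a 2.4 GHz WiFi channel (1-13)."""
    if not 1 <= channel <= 13:
        raise ValueError("expected a 2.4 GHz WiFi channel from 1 to 13")
    return 2407 + 5 * channel


def overlaps(zigbee_channel, wifi_channel, wifi_width=WIFI_WIDTH_MHZ):
    """True if the nominal spans of the two channels intersect."""
    distance = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return distance < (wifi_width + ZIGBEE_WIDTH_MHZ) / 2


def clear_zigbee_channels(wifi_channels, wifi_width=WIFI_WIDTH_MHZ):
    """Zigbee channels whose nominal span misses every listed WiFi channel."""
    return [
        z
        for z in range(11, 27)
        if not any(overlaps(z, w, wifi_width) for w in wifi_channels)
    ]


# WiFi channels 2, 5 and 6 as assumed in the post above.
print(clear_zigbee_channels([2, 5, 6]))
```

Under this simple model, Zigbee channel 16 overlaps WiFi channels 5 and 6, while Zigbee channel 20 clears all three.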

Let me clarify. I did adjust power to 8. I was making the point that there should be no or limited channel interference based on what I have set on the HE and my XE75's.

I just performed the backup/restore and I have almost half my Aqara's with zero msg and last msg as N/A. This is really ridiculous..

I am going back to 2.3.5.121 to see if more stable. (~4/6/23)

All of these Zigbee devices worked fine on the C8 prior to the .152 update? Just asking because there are a few cases of issues with Zigbee on the C8 when things work fine on a C7.

Also if they had already dropped off before the power change you may need to pair them again and hopefully they will stick now. I only have a few zigbee devices and they work fine so I don't have much to add for Zigbee issues.

I have been running .152 since it came out, I do have Kasa devices (only a few) and Zigbee, have not had issues with either of them.

I had zero problems prior to 152 as I stated. I had 1 device (Aqara Moisture) that went off periodically. Everything including triggers/alerts/apps worked flawlessly. The KASA latency has been going on a few weeks, I just noticed logs with devices going offline when I saw my temp readings in HA connection were all flatlined. Not sure what caused Aqara to completely break.

Use channels 1-6 on your WiFi, stay at a 20 MHz channel width rather than 40 MHz, and use channel 20 or higher for Zigbee. Set Zigbee power to 8. After you do this, shut down the hub and unplug it. Wait 30 minutes and power back up. This will throw all your Zigbee stuff into panic mode. Bring the hub back up and wait for things to settle.

Nothing has touched the zigbee stack in a while.

I'll give that a go. FYI, I have been on 121 for what, 20 minutes? Only 1 warning error. And it is on a device I relocated and renamed in April.

And remember you can always roll back

@goldbond1

I noticed you had TP-Link Deco WiFi access points. Be warned - they default to 40 MHz wide channels at 2.4GHz, and I don’t think it can be changed.
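
To see why a 40 MHz wide channel matters, the sketch below estimates which Zigbee channels fall inside a bonded 2.4 GHz WiFi channel. It assumes the usual bonding scheme, where the secondary 20 MHz channel sits directly above or below the primary, plus nominal 2 MHz Zigbee channels. Whether a given Deco actually runs 40 MHz is not something this code can tell you, and the function names are made up for this example.

```python
def bonded_span_mhz(primary_channel, secondary_above=True):
    """Nominal (low, high) edges in MHz of a 40 MHz bonded 2.4 GHz WiFi channel."""
    primary_center = 2407 + 5 * primary_channel
    # The bonded channel is centered 10 MHz toward the secondary channel.
    center = primary_center + (10 if secondary_above else -10)
    return center - 20, center + 20


def zigbee_channels_overlapping(span):
    """Zigbee channels (11-26) whose nominal 2 MHz span intersects the given span."""
    low, high = span
    hit = []
    for channel in range(11, 27):
        center = 2405 + 5 * (channel - 11)
        if center + 1 > low and center - 1 < high:
            hit.append(channel)
    return hit


# Primary WiFi channel 6 with the secondary channel above, then below.
print(zigbee_channels_overlapping(bonded_span_mhz(6, secondary_above=True)))
print(zigbee_channels_overlapping(bonded_span_mhz(6, secondary_above=False)))
```

With primary channel 6 and the secondary above, this model puts Zigbee channels 16 through 23 inside the WiFi span, so a Zigbee channel that is clear at 20 MHz may not stay clear at 40 MHz.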

Not only that, Deco WiFi routers periodically “optimize” the WiFi channel used. I’ll bet money on your upgrade to 152 being entirely coincidental to a Deco self-optimization.

The channels cannot be changed on XE75. Huge disadvantage to my OMADA in last house. That said, I am not quite sure where you are leading as again this worked fine previously. I listed the current XE75 channels and they don't come close to HE's? (you can check channel list in the app following optimization)

Update restoring to 121: FYI NO warnings. Nothing has dropped except the 1 moisture I cannot find. I am going to start checking apps firing next.

Update: I decided to restore back to 152: spam of warnings and devices knocked off started again.

Maybe there is something in the drivers you are using that is not playing nice with changes made between .122 and .152? That's what it seems like is going on. The warning from the driver may be unwarranted and the spam of logging may be what is impacting the hub performance and causing the connection issues. Just a thought.

The only driver changes I made were in or around April as I unified to oh-la-la. Things were fine back then even with the other drivers. I added a few more Aqara since. That's it. The warnings are legitimate. My sensors lose connectivity and report nothing.

I'm as baffled as you are. I think we can come to consensus that KASA latency may be related to static IP and I will look at fixing that as soon as I stabilize the Aqara's (hopefully)

Edit: Confirmed 100% these are getting knocked back off after re-adding. State goes back to not present.

Why? These drivers are ridiculously complex, and AFAICT, not supported anymore.

I stopped using the OLL drivers several years ago when I noticed that using them with over ~40 devices made my zigbee mesh very unstable. There were a few others here with similar observations.

Probably because you are using outdated and poorly constructed drivers from here.

I would encourage you to try other drivers to start with.

It would help a lot if you could suggest which of the several options (drivers) actually work. I found others had issues with temp readings. I find it interesting that they worked fine for months, and now they are bad.

Kind of hard to do without any indication of the exact model numbers of the devices you have.

If you have old (legacy) Mijia and Aqara devices, I would recommend @veeceeoh's drivers. There are also drivers from @chirpy and @birdslikewires for these devices.

If you have newer (zigbee 3.0) Aqara devices, I would strongly recommend @kkossev's drivers.
