2.3.5.152 upgrade and virtually everything is broken

OK so I am going to focus on 1 device. P1 using KK driver. After about 10 attempts to re-add, here is what I got in the logs:

dev:964 2023-08-09 07:30:04.550 debug Basement Landing Motion sending ZigbeeCommands : []
dev:964 2023-08-09 07:30:04.549 warn Basement Landing Motion unknown device LUMI lumi.sensor_motion.aq2
dev:964 2023-08-09 07:29:34.519 warn Basement Landing Motion if no more logs, please pair the device again to HE!
dev:964 2023-08-09 07:29:34.417 info Basement Landing Motion device model lumi.sensor_motion.aq2 manufacturer LUMI aqaraModel RTCGQ11LM deviceName was set to Xiaomi Motion Sensor RTCGQ11LM
dev:964 2023-08-09 07:29:34.392 info Basement Landing Motion InitializeVars... fullInit = true (driver version 1.2.4 2023/01/26 7:32 PM)
dev:964 2023-08-09 07:29:34.372 info Basement Landing Motion configure...(driver version 1.2.4 2023/01/26 7:32 PM)
dev:964 2023-08-09 07:27:58.353 debug Basement Landing Motion sending ZigbeeCommands : []
dev:964 2023-08-09 07:27:58.352 warn Basement Landing Motion unknown device LUMI lumi.sensor_motion.aq2
dev:964 2023-08-09 07:27:40.848 info Basement Landing Motion (parse attr 5) device lumi.sensor_motion.aq2 button was pressed
dev:964 2023-08-09 07:27:40.845 debug Basement Landing Motion parse: Desc Map: [raw:855A01000034050042166C756D692E73656E736F725F6D6F74696F6E2E617132, dni:855A, endpoint:01, cluster:0000, size:34, attrId:0005, encoding:42, command:0A, value:lumi.sensor_motion.aq2, clusterInt:0, attrInt:5]
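
For anyone reading the last log line: the `raw` string can be decoded by hand, which helps confirm that the sensor really did send its model name to the hub. Below is a minimal Python sketch of that decoding. The field layout (network address, endpoint, cluster, size, little-endian attribute ID, encoding, length-prefixed string) is an assumption inferred from how the fields in this one log line line up with the `raw` value; it is not the driver's actual parsing code, and the function name is hypothetical.

```python
def decode_raw_attribute(raw: str) -> dict:
    """Split a raw attribute report into fields.

    Layout is inferred from the single log line above (an assumption),
    not taken from the driver source.
    """
    size = int(raw[10:12], 16)          # number of hex characters that follow
    payload = raw[12:12 + size]
    result = {
        "dni": raw[0:4],                # device network address
        "endpoint": raw[4:6],
        "cluster": raw[6:10],
        # Attribute ID is little-endian in the raw string: "0500" -> "0005"
        "attr_id": payload[2:4] + payload[0:2],
        "encoding": payload[4:6],
    }
    if result["encoding"] == "42":      # ZCL character string: length byte, then text
        length = int(payload[6:8], 16)
        value_hex = payload[8:8 + 2 * length]
        result["value"] = bytes.fromhex(value_hex).decode("ascii")
    return result


raw = "855A01000034050042166C756D692E73656E736F725F6D6F74696F6E2E617132"
decoded = decode_raw_attribute(raw)
print(decoded["attr_id"], decoded["value"])   # 0005 lumi.sensor_motion.aq2
```

Cluster 0000 with attribute 0005 is the Basic cluster's model identifier, so this message only shows that the device announced itself. It says nothing about whether the hub's replies reached the device.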

Note: Motion still doesn't work. Triggers don't seem to work. I am not even sure the device is staying connected as it did previously. See pic.


The device doesn't appear to be paired.

So what next? Why isn't HE 152 keeping any of my devices paired, including the wall plugs and the frient devices?
At least I know they can communicate, if only briefly. The antenna isn't dead.

I would suggest resolving the issues one by one, starting with the repeating/routing devices (the mains-powered Zigbee plugs). Then make sure all the non-Aqara battery devices are working as expected, and leave the Aqara devices for last.

Can you confirm that all the plugs and the other devices are online and responding to commands reliably?

That's my point. Nothing is responding to commands. On and off commands for the wall plugs via the HE app do nothing. The frient devices, for example, haven't reported temperature since yesterday morning. They are offline. Everything Zigbee is NOT responding. Only KASA and Z-Wave seem to be reporting in and working.

For troubleshooting, I am using the Device Health Status app (you can install it from HPM). It shows the online/offline status of all your Zigbee and Z-Wave devices, using all the different methods available.

Here is an example from my C-8 hub, filtered for Aqara devices only:

Device Health Check

I installed the app but don't see it anywhere. I checked your supporting page and do not see how to start it. Maybe I am just tired.

After adding the code from HPM, you need to click on the "Add User App" button:

Add User App

This is the main issue at the moment.
Have you tried deleting one of the DOGAIN Zigbee Smart Plugs and then pairing it again to the HE hub as a new device? (Go to the plug's web page and click the red 'Remove Device' button at the bottom of the page.)

It paired and is working so far.

OK, so you may be on to something. The graph is starting to populate with devices upstairs. FYI, they are still on OLL, but they are populating. I had removed the one P1 and tried to re-add it. It won't work so far. It sits on "found" and never gets to the point of letting me name it.

If you haven't deleted it from the database, then since it's Zigbee it will slot back into its old place.

We are still troubleshooting the first group of devices - the mains-powered routers.
Please add all of these to the app, and test whether they are responding to on/off commands.
You must have the routers working rock-solid first, before we move on to the non-Aqara devices and finally to the Aqara devices (including the P1).

I suggest using this driver temporarily to test the connectivity of the Zigbee plugs. Once you are sure that all plugs have stable Zigbee connectivity to the C-8 hub, you can revert to the stock drivers.

You can use the Ping button to measure the round-trip time (RTT): the time in milliseconds between sending a simple ping command and receiving the response.

Ping button

Normally, the RTT should not vary by more than a few hundred milliseconds. If you look at the live logs, you should see results like this:

RTT

You can set up a simple RM5 rule to test the connectivity of all your plugs once every minute or even every 15 seconds, like this:

RM5 ping rule

You can leave the pings running for hours - a healthy Zigbee network should not be affected in any way by this activity.

Update: the driver now counts the number of successful and failed pings (version 2.1.3 2023/08/09 10:47 PM).

Failed Pings count
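
If you copy the RTT values out of the live logs, the same check can be done offline. The Python sketch below is only an illustration of the idea: it counts successful and failed pings and measures how far the RTT varies. The function names, the sample values, and the 300 ms threshold (my reading of "a few hundred milliseconds") are all assumptions and are not part of the driver.

```python
from statistics import mean


def summarize_pings(rtts_ms):
    """Summarize ping results; None marks a ping that got no response."""
    ok = [r for r in rtts_ms if r is not None]
    summary = {"sent": len(rtts_ms), "ok": len(ok), "failed": len(rtts_ms) - len(ok)}
    if ok:
        summary.update(
            min=min(ok),
            max=max(ok),
            avg=mean(ok),
            spread=max(ok) - min(ok),   # how much the RTT varies
        )
    return summary


def looks_stable(summary, max_spread_ms=300):
    """Assumed rule of thumb: no failed pings and a spread within the threshold."""
    return summary["failed"] == 0 and summary.get("spread", float("inf")) <= max_spread_ms


# Hypothetical samples, not taken from this thread
healthy = summarize_pings([40, 55, 62])
erratic = summarize_pings([40, 55, None, 1200])
print(looks_stable(healthy), looks_stable(erratic))   # True False
```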

Keep in mind that both the Zigbee graph and the Device Health Status app only show that something was received from the device. This doesn't necessarily mean that the devices are receiving any commands from the hub. So far, the problem in your case seems to be a C-8 transmission issue.

OK Feedback.
I don't know how to add the ping rule but will take a look shortly.
The network is continuing to build. I didn't see your msg and had re-added all the plugs until they showed up.
The latency is still fairly bad and rules don't seem to work still.
Attached are screenshots of the ping test on the first plug with the driver you suggested.


The RTT deviation is too big.

Have you tried changing the Zigbee channel?

Sometimes a new WiFi interference source appears.

Move one of the plugs close to the hub. Are the RTT timings much better there?

Honestly @goldbond1 - this is my suspicion. I know you've had your Deco routers for a while, but my experience with them is that:

  1. They can "self-optimize" and change the 2.4GHz channel used (although I believe this has been disabled more recently).
  2. Since they use a 40MHz wide transmission at 2.4GHz, the side-lobes can interfere with any Zigbee channel (from 11 to 26).
  3. Deco firmware updates change the routers' transmission pattern.
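
To make point 2 concrete, here is a rough Python sketch that lists which Zigbee channels fall inside the nominal band of a WiFi transmission. It uses the standard 2.4GHz centre frequencies (Zigbee channel k at 2405 + 5(k - 11) MHz, WiFi channel n at 2407 + 5n MHz for channels 1 to 13) and treats each signal as a flat block of its nominal width, which is a simplifying assumption. It ignores the side-lobes mentioned above, so the real interference can reach further than this shows. The function names are mine.

```python
ZIGBEE_WIDTH_MHZ = 2  # nominal width of one Zigbee channel


def zigbee_center_mhz(channel: int) -> int:
    """Centre frequency of a 2.4GHz Zigbee channel (11 to 26)."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee channel must be 11..26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Centre frequency of a 2.4GHz WiFi channel (1 to 13)."""
    if not 1 <= channel <= 13:
        raise ValueError("WiFi channel must be 1..13")
    return 2407 + 5 * channel


def overlapping_zigbee_channels(wifi_center: int, wifi_width_mhz: int) -> list:
    """Zigbee channels whose nominal band overlaps the nominal WiFi band."""
    reach = wifi_width_mhz / 2 + ZIGBEE_WIDTH_MHZ / 2
    return [ch for ch in range(11, 27)
            if abs(zigbee_center_mhz(ch) - wifi_center) < reach]


# 20MHz WiFi on channel 6
print(overlapping_zigbee_channels(wifi_center_mhz(6), 20))   # [16, 17, 18, 19]
# 40MHz WiFi centred on 2422 MHz (channels 1 + 5 bonded)
print(overlapping_zigbee_channels(2422, 40))                 # [11, 12, ..., 18]
```

Going from 20MHz to 40MHz doubles the number of Zigbee channels sitting directly under the WiFi signal, even before side-lobes are counted.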

I have tried multiple channels and power levels based on earlier suggestions.

A real test would be to roll back the firmware to a pre-152 version. This would rule out any coding updates. Can this be done without blowing up my db?

If that doesn't work and I still see the latency and issues, then sure I'll bite on the XE75 suddenly destroying my network.

Yes. Do it from the Diagnostic Tool (port 8081).

Restored back to 2.3.4.148 and used the exact same db. Here are the ping results:
ping3

I hammered the ping test. Not sure if I should have. Here are the results. There is no diagram to look at in this version. I'll start checking the responsiveness of components and see if any rules fire.

I restored my db from the Jun 146 backup, hit the reset button on the P1 once, and it added. It wasn't adding at all with 152. See below.

The P1 is working. It's recognized. It hasn't dropped off. It's listed as present for the first time.

I will check whether rules are firing next.

Routines are working. Back to the original problem, which is a 2-5 second delay between the trigger and the KASA lights turning on.

2 things wrong here:

  1. If I can restore back, drop the pings, add the device that wouldn't add, and get my network at least working again, then something is indeed wrong with 152 (for me at least).
  2. Some of the pings are erratic. I still have delays on KASA. I will be replacing the XE75s with OMADA again, which has tighter controls on WiFi signal and channels.

Does support or dev have any comment here? I just want to know this is being investigated as a problem with 152.

How much time passed between reverting to HE platform version 2.3.4.148 with your last database and then restoring the old database from June?

Often, when any kind of problem affects hub performance, it seems the Zigbee interface is affected first.

I am not sure what you are looking for. Are you saying I am OK now but it will start to crawl within a few hours or days? If that were the case, wouldn't my restores on 152 have corrected something, rather than breaking my network to the point that no Zigbee devices were recognized?

I am about 1.5 hours in and things are still working so far. However, I do see at least 6 devices that apparently haven't checked in for more than 3 hours, which I will look at shortly.

One clarification/confirmation: Do I still need to replace the OLL drivers?