Pros and Cons of Xiaomi Zigbee Devices

veeceeoh · January 30, 2019, 9:54pm

Yes, if the end device aging timeout were increased to a value greater than 60 minutes, Xiaomi / Aqara devices could check in before the parent "forgets" about them.
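To make the aging idea concrete, here is a minimal Python sketch of a parent's child table. Everything in it is a hypothetical model, not any vendor's stack: the `ChildTable` class, the `survives()` helper, the assumed roughly hourly check-in interval, and the rule that a child silent for at least the timeout gets dropped.

```python
from dataclasses import dataclass, field


@dataclass
class ChildTable:
    """Hypothetical model of a parent's child table with an aging timeout (seconds)."""

    timeout_s: int
    last_heard: dict[str, int] = field(default_factory=dict)

    def join(self, child: str, now: int) -> None:
        self.last_heard[child] = now

    def age_out(self, now: int) -> None:
        # Drop every child that has been silent for at least timeout_s.
        expired = [c for c, t in self.last_heard.items() if now - t >= self.timeout_s]
        for c in expired:
            del self.last_heard[c]

    def check_in(self, child: str, now: int) -> bool:
        """Record a check-in; return False if the parent had already forgotten the child."""
        self.age_out(now)
        if child not in self.last_heard:
            return False
        self.last_heard[child] = now
        return True


def survives(timeout_s: int, interval_s: int, checkins: int = 5) -> bool:
    """True if a device checking in every interval_s stays in the parent's table."""
    table = ChildTable(timeout_s)
    table.join("sensor", 0)
    return all(table.check_in("sensor", k * interval_s) for k in range(1, checkins + 1))


print(survives(timeout_s=64 * 60, interval_s=60 * 60))  # True
print(survives(timeout_s=32 * 60, interval_s=60 * 60))  # False
```

With an assumed hourly check-in, a 64-minute timeout keeps the device, while a 32-minute one forgets it at the first check-in.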

However, I'm not sure that's what was changed in SmartThings' stack which led to Xiaomi / Aqara devices working again. Again, from that post by Tom Manley:

The end device aging timeout has not changed between the 0.17 and the 0.18 release. However we did upgrade the zigbee stack between these two releases. The zigbee stack is provided by the radio vendor and they frequently update it to introduce new features and fix bugs. I suspect that the cause for the change in behavior between 0.17 and 0.18 is due to a bug fix to make the stack compliant with the zigbee specification. The 0.17 firmware did not send the leave and rejoin request after the end device timeout which I believe was likely a bug in that version of the stack. Just to be sure I am double checking with the radio vendor on this.

This information is from a number of firmware revisions ago, so I don't know what has happened since, but it's possible that subsequent versions of SmartThings' version 2 hub firmware use a stack that does not send the leave & rejoin request.

As for what Hubitat did in this case (either increasing the end device aging timeout value or disabling the leave & rejoin request sequence), I have no idea. I think only @mike.maxwell would be able to answer that.

Regardless of what the coordinator's end device aging timeout value is set to, though, I don't believe that value propagates to all routing (repeater) devices. That's probably one of the reasons the Xiaomi / Aqara issue still exists for people with "incompatible" repeater devices.

Tony · January 30, 2019, 10:02pm

The ST 'fix' to accommodate Xiaomi did not propagate to routing devices; I had to take pains to keep my Xiaomi buttons from joining through Iris plugs. As child devices of the ST hub, they usually worked fine until there was a power failure lasting longer than an hour.

Somel · January 30, 2019, 10:04pm

Interesting.
Anyone know the Ikea outlet value?

Somel · January 30, 2019, 10:26pm

I meant the check in value

bertabcd1234 · January 30, 2019, 10:48pm

Keep in mind that this "timeout" on the hub or repeater isn't the only issue. Someone correct me if I'm wrong, but if the device battery dies or the hub happens to be offline (e.g., for a firmware update) at the exact wrong moment, they could still "fall off" since they don't properly ask to rejoin--or at least that's my understanding of these facts as a non-ZigBee-expert.

Just wanted to say this since there seems to be a lot of focus on just that one issue, whereas there seem to be at least two at play. (Of course, in the case of the battery, you'll have to touch the device anyway, so verifying whether it responds or if you need to rejoin it is minimal extra effort--just not typical behavior for ZHA).

Somel · January 30, 2019, 10:59pm

It depends.
If the check-in value is, for example, 28810, it allows for two check-in attempts, which would avoid that issue.
This is theory only.
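The "two check-in attempts" idea is simple arithmetic. Here is a minimal sketch, assuming the last good check-in was at t=0, retries happen every interval, and the parent drops the child as soon as the silence reaches the timeout (so an attempt landing exactly on the timeout is too late). The function name and the example values (an hourly interval in seconds) are hypothetical; the unit of the 28810 figure above isn't stated, so it isn't used here.

```python
def checkin_attempts_before_aging(timeout: int, interval: int) -> int:
    """Count check-in attempts that land strictly before the parent ages the child out.

    Attempts happen at interval, 2*interval, ...; both arguments use the same unit.
    """
    if timeout <= 0 or interval <= 0:
        raise ValueError("timeout and interval must be positive")
    # Number of k >= 1 with k * interval < timeout.
    return (timeout - 1) // interval


print(checkin_attempts_before_aging(7200, 3600))  # 1
print(checkin_attempts_before_aging(7210, 3600))  # 2
```

Under this assumption, a timeout set to an exact multiple of the interval gets one fewer usable attempt than it seems to, which is why a small margin above the multiple matters.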

martyn · January 30, 2019, 11:10pm

I believe it depends on the parent, not the end device; i.e., if you pulled the battery on an end device and left it out for a week, it would still connect back to the parent as long as the parent hadn't removed it from its child list.

From what I've seen so far, you can tell if a device has been forced to leave and rejoin because its 16-bit address changes. I haven't read enough of the ZigBee spec to be sure that's always the case (maybe @tony or @veeceeoh know), but it seems to be consistent behavior on my system.
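The address-change heuristic can be sketched as a scan over repeated observations: the 64-bit IEEE address identifies the device, and a change in its 16-bit network address suggests a leave and rejoin. This is a minimal sketch of that heuristic only; `find_rejoins()` and the made-up addresses below are hypothetical, and (as noted above) a changed short address may not be guaranteed in every case.

```python
def find_rejoins(observations: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Flag devices whose 16-bit network address changed between scans.

    observations: (ieee_address, nwk_address) pairs in the order they were seen.
    Returns {ieee: [nwk1, nwk2, ...]} for devices seen with more than one address.
    """
    history: dict[str, list[int]] = {}
    for ieee, nwk in observations:
        seen = history.setdefault(ieee, [])
        if not seen or seen[-1] != nwk:  # record only actual changes
            seen.append(nwk)
    return {ieee: addrs for ieee, addrs in history.items() if len(addrs) > 1}


# Made-up scan results; the first device appears to have rejoined once.
scans = [
    ("00:11:22:33:44:55:66:77", 0x1A2B),
    ("88:99:aa:bb:cc:dd:ee:ff", 0x4C01),
    ("00:11:22:33:44:55:66:77", 0x1A2B),
    ("00:11:22:33:44:55:66:77", 0x7F3E),
]
for ieee, addrs in find_rejoins(scans).items():
    print(ieee, " -> ".join(f"0x{a:04X}" for a in addrs))
```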

Somel · January 30, 2019, 11:13pm

Probably a silly question, but why would the coordinator ask a child to leave and rejoin? What are the advantages?

martyn · January 30, 2019, 11:22pm

It's not a silly question; we're all learning here.

I believe it's the parent (a router or the coordinator) that tells the device to leave and rejoin if the device has been expired from the parent's list of children due to timeout.

I'm not sure if it does that routinely or only if all the other child slots on that parent have been taken in the meantime.

I suspect the former, because from my own scanning / mapping with XBee I do see occasional leaves and rejoins, even though none of the router devices seem to have an overload of children.

csteele · January 30, 2019, 11:23pm

The command is just a bunch of bits, but it's been named "leave and rejoin" because that's what the end device is going to do. Could it have been called the "rejoin" command? Yes, but that would probably overlap with the "rejoin" the end device is going to be doing... naming it this way reduces confusion, I think.

end-device: hi - checking in
coordinator: I don't know you. Use the rejoin process if you want to continue.
end-device: rejoin
coordinator: welcome, long time no see.

That's an artificial conversation, by the way
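The artificial conversation above maps onto a tiny decision model. This minimal sketch is hypothetical throughout (`Reply`, `parent_handle_checkin()`, `end_device_react()`, `conversation()`, and the `honors_rejoin` flag are illustrative names, not a real stack API). It shows a parent answering a check-in, and two kinds of end device: one that follows the rejoin instruction, and one that behaves the way this thread describes Xiaomi / Aqara devices.

```python
from enum import Enum, auto


class Reply(Enum):
    ACK = auto()                # parent still has the child in its table
    LEAVE_WITH_REJOIN = auto()  # parent forgot the child; asks it to leave and rejoin


def parent_handle_checkin(child_table: set[str], child: str) -> Reply:
    """Hypothetical parent logic: 'hi - checking in' -> known or 'I don't know you'."""
    return Reply.ACK if child in child_table else Reply.LEAVE_WITH_REJOIN


def end_device_react(reply: Reply, honors_rejoin: bool) -> str:
    """Hypothetical end-device logic for the reply it receives."""
    if reply is Reply.ACK:
        return "stays joined"
    # A device that follows the instruction rejoins; the behavior described
    # in this thread for Xiaomi / Aqara devices is to walk away instead.
    return "rejoins" if honors_rejoin else "drops off the network"


def conversation(child_table: set[str], child: str, honors_rejoin: bool) -> str:
    outcome = end_device_react(parent_handle_checkin(child_table, child), honors_rejoin)
    if outcome == "rejoins":
        child_table.add(child)  # "welcome, long time no see"
    return outcome


print(conversation(set(), "button", honors_rejoin=True))   # rejoins
print(conversation(set(), "button", honors_rejoin=False))  # drops off the network
```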

vjv · January 30, 2019, 11:26pm

For some reason I thought it was my Prime subscription...

Somel · January 30, 2019, 11:40pm

... Why don't you know me? You as*****

csteele · January 30, 2019, 11:51pm

Apparently that's exactly what Xiaomis do... they curse at the coordinator and walk away (don't rejoin).

cwwilson08 · January 31, 2019, 12:14am

Should have been explained like this a long time ago...

veeceeoh · January 31, 2019, 12:41am

I'm not aware of any method to query a router to find out its default end device aging timeout value. Maybe ask IKEA?

I did a quick Google search on "zigbee end device aging timeout", and the first hit (from SiLabs' Zigbee & Thread Knowledgebase) seems to confirm that end devices can inform their parent of their check-in timeout requirements:

The End Device Timeout Request command is sent by an end device to informing its parent of its timeout requirements when joining/rejoining to the network. This allows the parent the ability to delete the child entry from the child table if the child has not communicated with the parent in end device poll timeout.
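As far as I understand it, the End Device Timeout Request doesn't carry a raw number of seconds but an enumeration. The mapping below is my reading of the Zigbee R21 / 3.0 spec (0 means 10 seconds; N from 1 to 14 means 2^N minutes) and should be checked against the specification itself. The helper names are hypothetical. The sketch decodes a value and picks the smallest one that would cover an assumed hourly check-in.

```python
def requested_timeout_seconds(enum_value: int) -> int:
    """Decode the Requested Timeout enumeration (assumed mapping, see lead-in)."""
    if enum_value == 0:
        return 10
    if 1 <= enum_value <= 14:
        return (2 ** enum_value) * 60
    raise ValueError(f"reserved timeout enumeration: {enum_value}")


def smallest_enum_covering(checkin_interval_s: int) -> int:
    """Smallest enumeration whose timeout is longer than the check-in interval."""
    for value in range(15):
        if requested_timeout_seconds(value) > checkin_interval_s:
            return value
    raise ValueError("check-in interval exceeds the largest requestable timeout")


best = smallest_enum_covering(60 * 60)
print(best, requested_timeout_seconds(best) // 60)  # 6 64
```

Under that mapping, a device checking in about once an hour would need to ask for at least the 64-minute value.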

Keep in mind this may only apply to SiLabs' EmberZNet stack, based on these paragraphs from an Ember Developer Guide PDF document:

The ZigBee protocol does not offer a standard way to timeout entries in a child table. In place of this, several heuristic mechanisms exist for aging entries in a child table. For instance, if a parent hears a device that it thinks is its child interacting with another parent or being represented by another parent, it may remove the entry from its child table. Silicon Labs has developed a more deterministic mechanism for child aging called the “End Device Poll Timeout.” An Ember parent expects that children will “check in” with their parents within the end device poll timeout. If they do not, it assumes that they have gone away and removes them from its child tables. The End Device Poll Timeout is defined in stack/include/ember-configuration-defaults.h

The end device does not get to configure the end device poll timeout on its parent and there is no agreed upon protocol for communicating the End Device Poll Timeout value between parent and child. In place of this, Silicon Labs has a configured an assumed end device poll timeout on both parent and child. This value is defined in stack/include/ember-configuration-defaults.h.

Depending on its sleep characteristics, battery life considerations, the child may wish to sleep past the assumed end device poll timeout. It is free to do this. However, if it does, it must repair the network connection with its parent before interacting with the network again. Generally a device that is likely to do this should check the state of the network when it wakes up to see if any repair is necessary before sending data. A sleepy device should never wake and assume that its parent is still there, unless it knows for certain that its parent is configured with a mutually agreed upon End Device Poll Timeout that it is obeying. For more information on the end device poll timeout on Ember devices see the configuration header file located at stack/include/ember-configuration-defaults.
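The Ember guidance quoted above boils down to one decision on wake-up. This is a minimal sketch of that rule only, not Silicon Labs code: `needs_repair_check()`, `wake_and_send()`, and the `agreed_timeout_s` parameter (the parent's timeout, if the device knows it for certain) are hypothetical names.

```python
from typing import Optional


def needs_repair_check(slept_s: int, agreed_timeout_s: Optional[int] = None) -> bool:
    """Should the device verify its parent link before sending data?"""
    if agreed_timeout_s is None:
        # No mutually agreed timeout: never assume the parent is still there.
        return True
    # With an agreed timeout, only a sleep that reached it requires a repair.
    return slept_s >= agreed_timeout_s


def wake_and_send(slept_s: int, agreed_timeout_s: Optional[int] = None) -> list[str]:
    steps = []
    if needs_repair_check(slept_s, agreed_timeout_s):
        steps.append("check network state, repair/rejoin if needed")
    steps.append("send data")
    return steps


print(wake_and_send(3600))                         # check first, then send
print(wake_and_send(3600, agreed_timeout_s=3840))  # ['send data']
```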

The dead battery scenario would be a good one to test. But if a device drops off the mesh, you can get it back by performing a manual rejoin (going through the same steps as pairing), and as long as the hub still has it on its device list with the same Zigbee Id, the rejoining device will be matched to that device entry.
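Matching a rejoining device to its existing entry works because the hub keys its device list on the device's permanent identifier rather than its short address. This minimal sketch assumes the "Zigbee Id" is the 64-bit IEEE address. `DeviceRegistry`, `DeviceEntry`, and `handle_join()` are hypothetical names, not Hubitat's implementation. On a rejoin, the existing entry (with its user-assigned name) is kept and only the 16-bit address is updated.

```python
from dataclasses import dataclass


@dataclass
class DeviceEntry:
    name: str
    nwk_address: int


class DeviceRegistry:
    """Hypothetical hub device list keyed by the 64-bit IEEE address."""

    def __init__(self) -> None:
        self._by_ieee: dict[str, DeviceEntry] = {}

    def handle_join(self, ieee: str, nwk_address: int) -> tuple[DeviceEntry, bool]:
        """Return (entry, is_new); a known device keeps its entry and gets the new address."""
        entry = self._by_ieee.get(ieee)
        if entry is not None:
            entry.nwk_address = nwk_address
            return entry, False
        entry = DeviceEntry(name=f"New device {ieee}", nwk_address=nwk_address)
        self._by_ieee[ieee] = entry
        return entry, True


reg = DeviceRegistry()
first, _ = reg.handle_join("00:11:22:33:44:55:66:77", 0x1A2B)
first.name = "Kitchen door"
# Later the same device re-pairs and comes back with a new short address.
entry, is_new = reg.handle_join("00:11:22:33:44:55:66:77", 0x7F3E)
print(entry.name, hex(entry.nwk_address), is_new)  # Kitchen door 0x7f3e False
```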

I couldn't find any direct answer to that, but it may have to do with the child device needing to repair the network connection with its parent, as explained in the Ember Developer Guide document I quoted above.

A more likely explanation is the one I found in Texas Instruments' white paper "What's New in ZigBee 3.0":

But once the timeout value is exceeded and the child is aged out, the parent will send the device a Leave Request with the rejoin attribute set so that the device may be allowed to rejoin the network through a new parent device.

So perhaps the assumption is that a child has aged out because it either moved to another parent or should be given the opportunity to join a different parent with better signal strength?
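The "Leave Request with the rejoin attribute set" in the TI quote is, as I understand it, a network-layer Leave command whose options byte carries the flags. The bit layout below (bit 5 = Rejoin, bit 6 = Request, bit 7 = Remove Children, bits 0-4 reserved) is my reading of the Zigbee PRO spec and should be verified there. The helper names are hypothetical, and this only builds and decodes the options byte, not a full frame.

```python
# Assumed NWK Leave command options bits (verify against the Zigbee spec).
REJOIN = 0x20
REQUEST = 0x40
REMOVE_CHILDREN = 0x80


def leave_options(rejoin: bool, request: bool, remove_children: bool = False) -> int:
    """Build the Leave command options byte from individual flags."""
    value = 0
    if rejoin:
        value |= REJOIN
    if request:
        value |= REQUEST
    if remove_children:
        value |= REMOVE_CHILDREN
    return value


def describe_leave(options: int) -> dict[str, bool]:
    """Decode the options byte back into named flags."""
    return {
        "rejoin": bool(options & REJOIN),
        "request": bool(options & REQUEST),
        "remove_children": bool(options & REMOVE_CHILDREN),
    }


# A parent asking an aged-out child to leave and then rejoin.
opts = leave_options(rejoin=True, request=True)
print(hex(opts), describe_leave(opts))  # 0x60 {'rejoin': True, 'request': True, 'remove_children': False}
```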

Other thoughts:

@iharyadi has for a while been developing a multi-sensor ZigBee router device which is also "friendly" with Xiaomi / Aqara devices. He may know some useful information regarding end device aging and the "leave and rejoin" request that parents send to aged-out end devices (child devices).

A good article from Qorvo: Demystifying Polling Control in Zigbee Networks (Note: Probably Zigbee 3.0 specific)

Tony · January 31, 2019, 1:54am

Device check in intervals were part of the HA 1.2 profile (from my post a couple of days ago in this thread):

For the HA 1.2 profile, this is addressed in section 6.4 here: NXP Zigbee Home Automation User Guide

Specifically, see section 6.4.3. That's a key part of making the Home Automation Profile stuff interoperable (and where Xiaomi drops the ball, for reasons best known to them).

Tony · January 31, 2019, 2:01am

The Base Device Behavior spec does specify persistent data (stuff that does not always get re-discovered when the device powers up, such as whether the device was previously joined to a network and what channel it was using). See 'Zigbee Persistent Data' sec. 6.9 in Zigbee Base Device Behavior Specification

veeceeoh · January 31, 2019, 2:25am

Yes, and those poll control interval and timeout parameters would normally be directly accessible via cluster 0x0020 (Poll Control), but Xiaomi / Aqara devices ignore write attribute commands to that cluster (and pretty much any other).
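For reference, a write to the Poll Control cluster is an ordinary ZCL Write Attributes command. This minimal sketch builds only the ZCL payload for the Check-in Interval attribute. The attribute id (0x0000), its uint32 type (0x23), and the quarter-second unit are my reading of the ZCL spec and worth double-checking. The helper names are hypothetical, and the addressing, endpoint, and the hub-specific way of actually sending it are left out. Per the post above, Xiaomi / Aqara devices reportedly ignore such writes anyway.

```python
import struct

ATTR_CHECKIN_INTERVAL = 0x0000  # Poll Control Check-in Interval (assumed, quarter-seconds)
ZCL_UINT32 = 0x23               # ZCL data type id for uint32
ZCL_WRITE_ATTRIBUTES = 0x02     # profile-wide command id


def minutes_to_quarter_seconds(minutes: float) -> int:
    return round(minutes * 60 * 4)


def write_checkin_interval_frame(seq: int, minutes: float) -> bytes:
    """ZCL Write Attributes payload for the Check-in Interval (hub to device).

    Frame control 0x00: profile-wide command, client-to-server,
    not manufacturer specific, default response enabled.
    """
    header = struct.pack("<BBB", 0x00, seq & 0xFF, ZCL_WRITE_ATTRIBUTES)
    record = struct.pack("<HBI", ATTR_CHECKIN_INTERVAL, ZCL_UINT32,
                         minutes_to_quarter_seconds(minutes))
    return header + record


print(write_checkin_interval_frame(1, 60).hex())  # 00010200002340380000
```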

Somel · January 31, 2019, 2:32am

Do you mean you should be able to write to them, or read the information they have?

veeceeoh · January 31, 2019, 5:50am

Both. But I was focusing on the normal ability to change the poll control interval and timeout values. If that was accessible on Xiaomi / Aqara devices, it could potentially clear up most if not all of their issues.