Yes, if the end device aging timeout were increased to a value of more than 60 minutes, Xiaomi / Aqara devices would be able to check in before the parent "forgets" about them.
However, I'm not sure that's what was changed in SmartThings' stack which led to Xiaomi / Aqara devices working again. Again, from that post by Tom Manley:
The end device aging timeout has not changed between the 0.17 and the 0.18 release. However we did upgrade the zigbee stack between these two releases. The zigbee stack is provided by the radio vendor and they frequently update it to introduce new features and fix bugs. I suspect that the cause for the change in behavior between 0.17 and 0.18 is due to a bug fix to make the stack compliant with the zigbee specification. The 0.17 firmware did not send the leave and rejoin request after the end device timeout which I believe was likely a bug in that version of the stack. Just to be sure I am double checking with the radio vendor on this.
This information is from a number of firmware revisions ago, so I don't know what happened since then, but it's possible that subsequent versions of SmartThings' version 2 hub firmware use a stack that does not send the leave & rejoin request.
As for what Hubitat did in this case, either increase the end device aging timeout value -or- disable the leave & rejoin request sequence, I have no idea. I think only @mike.maxwell would be able to answer that.
Regardless of what the coordinator's end device aging timeout value is set to, I don't believe that value propagates to all routing (repeater) devices, though. So that's probably one of the reasons why the Xiaomi / Aqara issue still exists for people with "incompatible" repeater devices.
The ST 'fix' to accommodate Xiaomi did not propagate to routing devices; I had to take pains to keep my Xiaomi buttons from joining through Iris plugs. As child devices of the ST hub, they usually worked fine until there was a power failure lasting longer than an hour.
Keep in mind that this "timeout" on the hub or repeater isn't the only issue. Someone correct me if I'm wrong, but if the device battery dies or the hub happens to be offline (e.g., for a firmware update) at the exact wrong moment, they could still "fall off" since they don't properly ask to rejoin--or at least that's my understanding of these facts as a non-ZigBee-expert.
Just wanted to say this since there seems to be a lot of focus on just that one issue, whereas there seem to be at least two at play. (Of course, in the case of the battery, you'll have to touch the device anyway, so verifying whether it responds or if you need to rejoin it is minimal extra effort--just not typical behavior for ZHA).
I believe it depends on the parent, not the end device; i.e., if you pulled the battery on an end device and left it out for a week, it would still connect back to the parent as long as the parent hadn't removed it from its child list.
From what I've seen so far, you can tell if a device has been forced to leave and rejoin because its 16-bit address changes. I haven't read enough of the ZigBee spec to be sure that's always the case (maybe @tony or @veeceeoh know), but it seems to be consistent behavior on my system.
The command is a bunch of bits, but it's been named "leave and rejoin" because that's what the end device is going to do. Could it have been called the "rejoin" command? Yes, but that would probably overlap with the 'rejoin' that the end device is going to be doing, so the longer name reduces the confusion, I think.
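For anyone who wants to watch for this in their own logs, here is a minimal sketch of the idea, not anything a hub actually runs. It assumes you can collect (IEEE address, 16-bit network address) pairs from somewhere; the `AddressTracker` class and the sample addresses are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AddressTracker:
    """Remembers the last 16-bit network address seen for each 64-bit IEEE address."""

    last_seen: dict = field(default_factory=dict)

    def observe(self, ieee: str, nwk: int) -> bool:
        """Record an observation and return True if the short address changed."""
        previous = self.last_seen.get(ieee)
        self.last_seen[ieee] = nwk
        # The first sighting of a device is not counted as a change.
        return previous is not None and previous != nwk


if __name__ == "__main__":
    tracker = AddressTracker()
    # Hypothetical observations: same device, then a new short address.
    samples = [
        ("00158D0001A2B3C4", 0x4F2A),
        ("00158D0001A2B3C4", 0x4F2A),
        ("00158D0001A2B3C4", 0x91C7),
    ]
    for ieee, nwk in samples:
        if tracker.observe(ieee, nwk):
            print(f"{ieee} now has address 0x{nwk:04X}: possible leave and rejoin")
```

A changed address is only a hint here; as noted above, it is not confirmed that a rejoin always produces a new 16-bit address.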
end-device: hi - checking in
coordinator: I don't know you. use the rejoin process if you want to continue
coordinator: welcome, long time no see.
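The exchange above can be sketched as a toy model. This is a simplification under stated assumptions: the `Parent` class, the message names, and the random address assignment are all hypothetical stand-ins, not the real NWK-layer commands or any vendor's stack API.

```python
import random


class Parent:
    """Toy parent (coordinator or router) holding a child table."""

    def __init__(self):
        self.children = {}  # IEEE address -> 16-bit network address

    def check_in(self, ieee):
        # A known child gets a normal acknowledgement; an unknown one is
        # told to leave and rejoin.
        if ieee in self.children:
            return "ack"
        return "leave_and_rejoin"

    def rejoin(self, ieee):
        # Assumption: the child gets a fresh random short address on rejoin.
        self.children[ieee] = random.randrange(0x0001, 0xFFF8)
        return "welcome"


def end_device_wakes(parent, ieee):
    """Return the sequence of messages for one wake-up of a well-behaved child."""
    log = ["check_in"]
    reply = parent.check_in(ieee)
    log.append(reply)
    if reply == "leave_and_rejoin":
        # A compliant end device follows the instruction and rejoins.
        log.append("rejoin_request")
        log.append(parent.rejoin(ieee))
    return log


if __name__ == "__main__":
    hub = Parent()
    print(end_device_wakes(hub, "00158D0001A2B3C4"))  # aged-out child
    print(end_device_wakes(hub, "00158D0001A2B3C4"))  # now a known child
```

The step a problem device would skip in this model is the `rejoin_request`: if the child ignores the instruction, it never gets back into the child table.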
The End Device Timeout Request command is sent by an end device to inform its parent of its timeout requirements when joining/rejoining the network. This allows the parent to delete the child entry from the child table if the child has not communicated with the parent within the end device poll timeout.
The ZigBee protocol does not offer a standard way to timeout entries in a child table. In place of this, several heuristic mechanisms exist for aging entries in a child table. For instance, if a parent hears a device that it thinks is its child interacting with another parent or being represented by another parent, it may remove the entry from its child table. Silicon Labs has developed a more deterministic mechanism for child aging called the “End Device Poll Timeout.” An Ember parent expects that children will “check in” with their parents within the end device poll timeout. If they do not, it assumes that they have gone away and removes them from its child tables. The End Device Poll Timeout is defined in stack/include/ember-configuration-defaults.h.
The end device does not get to configure the end device poll timeout on its parent and there is no agreed upon protocol for communicating the End Device Poll Timeout value between parent and child. In place of this, Silicon Labs has configured an assumed end device poll timeout on both parent and child. This value is defined in stack/include/ember-configuration-defaults.h.
Depending on its sleep characteristics and battery life considerations, the child may wish to sleep past the assumed end device poll timeout. It is free to do this. However, if it does, it must repair the network connection with its parent before interacting with the network again. Generally a device that is likely to do this should check the state of the network when it wakes up to see if any repair is necessary before sending data. A sleepy device should never wake and assume that its parent is still there, unless it knows for certain that its parent is configured with a mutually agreed upon End Device Poll Timeout that it is obeying. For more information on the end device poll timeout on Ember devices see the configuration header file located at stack/include/ember-configuration-defaults.
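To make the aging mechanism described in these excerpts concrete, here is a rough sketch of a child table with a poll timeout. It is not the Ember implementation: the `ChildTable` class, the `survives` helper, and the timeout numbers are illustrative assumptions, and the roughly hourly check-in is only what this thread suggests for Xiaomi / Aqara devices.

```python
from dataclasses import dataclass, field


@dataclass
class ChildTable:
    """Toy parent-side child table that ages out silent children."""

    poll_timeout_s: float
    last_heard: dict = field(default_factory=dict)  # IEEE address -> timestamp (s)

    def heard_from(self, ieee, now_s):
        """Record that the child checked in at time now_s."""
        self.last_heard[ieee] = now_s

    def age_out(self, now_s):
        """Remove and return children silent for longer than the timeout."""
        expired = [
            ieee
            for ieee, heard_at in self.last_heard.items()
            if now_s - heard_at > self.poll_timeout_s
        ]
        for ieee in expired:
            del self.last_heard[ieee]
        return expired


def survives(check_in_interval_s, poll_timeout_s):
    """True if a child checking in at this interval is never aged out."""
    return check_in_interval_s <= poll_timeout_s


if __name__ == "__main__":
    hourly = 60 * 60  # assumed check-in interval for the sleepy device
    for timeout in (5 * 60, 65 * 60):  # illustrative parent timeouts
        print(f"timeout {timeout // 60} min: survives = {survives(hourly, timeout)}")
```

With these assumed numbers, a parent timeout of a few minutes drops an hourly-reporting child long before its next check-in, while a timeout just over an hour keeps it in the table.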
The dead battery scenario would be a good one to test out. But if a device is dropped off the mesh, you can get it back by performing a manual re-join by going through the same steps of pairing, and as long as the hub has it on its device list with the Zigbee Id, the rejoining device will be matched to that device entry.
I couldn't find any direct answer to that, but it may have to do with the child device needing to repair the network connection with its parent, as is explained in the Ember Developer Guide document I quoted above.
But once the timeout value is exceeded and the child is aged out, the parent will send the device a Leave Request with the rejoin attribute set so that the device may be allowed to rejoin the network through a new parent device.
So perhaps the assumption is that a child has aged out because it either moved to another parent or should be given the opportunity to find a different parent with better signal strength?
@iharyadi has been developing a multi-sensor ZigBee router device for a while, one that is also "friendly" with Xiaomi / Aqara devices. He may know some useful information regarding end device aging and the "leave and rejoin" request that parents send to aged out end devices (child devices).
The Base Device Behavior spec does specify persistent data (stuff that does not always get re-discovered when the device powers up, such as whether the device was previously joined to a network and what channel it was using). See 'Zigbee Persistent Data', sec. 6.9, in the Zigbee Base Device Behavior Specification.
Yes, and those poll control interval and timeout parameters would normally be directly accessible via cluster 0x0020, but Xiaomi / Aqara ignore write attribute commands to that cluster (and pretty much any other.)
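For reference, this is roughly what a write to that cluster would look like on the wire for a device that honors it. Treat it as a sketch based on my reading of the ZCL (Write Attributes is global command 0x02, Check-in Interval is attribute 0x0000, a uint32 in quarter-seconds); the function name is made up, and this only builds the ZCL payload, not the APS/NWK framing a hub would add.

```python
import struct

POLL_CONTROL_CLUSTER = 0x0020    # cluster the frame would be addressed to
CHECK_IN_INTERVAL_ATTR = 0x0000  # Check-in Interval attribute
ZCL_WRITE_ATTRIBUTES = 0x02      # global "Write Attributes" command ID
ZCL_TYPE_UINT32 = 0x23           # ZCL data type code for unsigned 32-bit


def build_write_check_in_interval(seq, minutes):
    """Build a ZCL Write Attributes payload setting the check-in interval."""
    quarter_seconds = int(minutes * 60 * 4)
    frame_control = 0x00  # global command, client to server, not manufacturer-specific
    # Little-endian: frame control, sequence number, command ID,
    # attribute ID, data type, attribute value.
    return struct.pack(
        "<BBBHBI",
        frame_control,
        seq,
        ZCL_WRITE_ATTRIBUTES,
        CHECK_IN_INTERVAL_ATTR,
        ZCL_TYPE_UINT32,
        quarter_seconds,
    )


if __name__ == "__main__":
    payload = build_write_check_in_interval(seq=1, minutes=30)
    print(f"cluster 0x{POLL_CONTROL_CLUSTER:04X}: {payload.hex()}")
```

A compliant device would answer with a Write Attributes Response; the problem described above is that Xiaomi / Aqara devices ignore the write.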
Both. But I was focusing on the normal ability to change the poll control interval and timeout values. If that was accessible on Xiaomi / Aqara devices, it could potentially clear up most if not all of their issues.