[Solved] Zigbee Instability is back!

In my case none. I had not yet migrated the smartplugs.
In srwhite's case I don't know.

There you would be VERY wrong. I've written many - from scratch and directly from hardware vendor specs. In many different programming languages. But none of that really matters.

I hope support is working on things (I don't really see much info in the forums on this, but I'm not sure I would expect to).

They are looking into my case and srwhite's at least.

2 Likes

You're being literal.. You know what I meant.. I was referring to writing drivers for Hubitat or SmartThings. If you had, you would truly understand the point about custom code.

I understand your point. There is a difference between understanding your point, and completely agreeing with your argument. Good luck.

Again, I hope it is a HE bug. Fixing that kind of bug helps everyone.

My drivers call the Hubitat wrappers for ZCL.. If those drivers (written by me, successfully used in SmartThings) are causing issues or Zigbee instability, then it's not the custom code, but rather their implementation of ZCL that is the issue. Therefore the bug is not with the driver but with the underlying architecture of the system.

Whether you agree or not doesn't matter to me. Those are the facts.

1 Like

Technically, parts of your statement are facts, and parts are still conjecture. But this is getting very pedantic, and not especially constructive. So I'll quit polluting your thread.

2 Likes

ZCL is a common standard. If code that uses standard calls works on one platform (SmartThings), but not another (Hubitat), that's a platform issue, not a driver issue.

@srwhite I'm not so adamant. It could also be that ST's adoption of ZCL has been customized to some extent, and that because of that adaptation your code does not fail there but can fail here. Unlikely, but a possibility.

The challenge we have is that we do not know or understand how zigbee or zwave was implemented by HE. Only they do.

What HE also needs to understand is that if they created a customized implementation of Zigbee or Zwave in some way, then their code is about as likely to be the root cause of issues as broken code from community apps or drivers.

If they are not using a customized implementation of zigbee or zwave, then proper documentation needs to be created to define the code standards to be used on the platform, and only custom code that validates against those standards should be allowed.

3 Likes

Completely agree. At this point, most of the few devices I was able to successfully reconnect last night are offline again. Most SmartPlugs are working, but some have had to be refreshed a few times to get them to be responsive.

I've pretty much exhausted all methods of troubleshooting using the tools at my disposal. I've got some newer SmartThings v4 plugs coming which I'll do some testing with over the weekend. However, if that does not offer any clues, I am pretty much out of diagnostic ideas without the aid of support.

There appears to be some confusion on this thread about ZCL. It stands for Zigbee Cluster Library, and it is a definition of how the messages are structured when sending zigbee messages to a device. This is a standard adopted by all zigbee devices, and there are subsets devoted to certain device types; in our case it is zigbee home automation (ZHA) devices. We provide methods that allow drivers to build these messages to send to devices; in addition, you can build your own messages via the raw message type. Sending bad messages to a device does not cause the mesh to go down. Everything I'm seeing in this thread sounds like an issue with the mesh. However, we are looking into the issue and will do our best to fix any problem that may be caused by our zigbee implementation. The vast majority of our customers are having no issues with their zigbee devices, so we don't have a large set of issues to look at to determine what the problem is.
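For anyone curious what "how the messages are structured" looks like in practice, here is a rough Python sketch of the general ZCL frame layout: a frame control byte, a transaction sequence number, a command ID, then the payload. This is not Hubitat's API or its driver methods; the names `zcl_frame`, `on_command` and `read_attributes` are made up for illustration, and the optional manufacturer-specific code field is left out.

```python
import struct

# Frame control bits (low two bits select the frame type).
FRAME_TYPE_GLOBAL = 0x00         # profile-wide command, e.g. Read Attributes
FRAME_TYPE_CLUSTER = 0x01        # command specific to one cluster, e.g. On
DISABLE_DEFAULT_RESPONSE = 0x10  # bit 4 of the frame control byte


def zcl_frame(frame_type, seq, command_id, payload=b"",
              disable_default_response=False):
    """Build a ZCL frame without a manufacturer code (hypothetical helper).

    The cluster ID is not part of this frame; it travels in the APS
    header of the enclosing Zigbee message.
    """
    frame_control = frame_type & 0x03
    if disable_default_response:
        frame_control |= DISABLE_DEFAULT_RESPONSE
    header = struct.pack("<BBB", frame_control, seq & 0xFF, command_id)
    return header + payload


def on_command(seq):
    """On/Off cluster (0x0006): cluster-specific command 0x01 is On."""
    return zcl_frame(FRAME_TYPE_CLUSTER, seq, 0x01)


def read_attributes(seq, attribute_ids):
    """Global command 0x00 (Read Attributes); IDs are little-endian uint16."""
    payload = b"".join(struct.pack("<H", attr) for attr in attribute_ids)
    return zcl_frame(FRAME_TYPE_GLOBAL, seq, 0x00, payload)


print(on_command(1).hex())                 # 010101
print(read_attributes(2, [0x0000]).hex())  # 0002000000
```

The second example asks for attribute 0x0000, which on the On/Off cluster is the current on/off state.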

Also I feel the need to reiterate our standard debug process for those of you not familiar with the typical way a system is debugged. You typically start by eliminating possible causes one at a time until the problem goes away. Most of the time it is the last thing you removed before the system stabilized that was the cause of the issue. The reason we ask you to remove custom code is that we have many more people using built-in code who do not have issues with it, and we run it ourselves. So the first step is to disable your custom code one piece at a time and see if that fixes the problem; if it does not, then we start looking into the system code. It is that simple.
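As a rough illustration of the elimination process described above, here is a small Python sketch. It is only a model: `find_culprit` and the `is_stable` callback are hypothetical names, and on a real hub "checking stability" means watching the mesh for a while after each change, not calling a function.

```python
def find_culprit(custom_code, is_stable):
    """Disable custom apps/drivers one at a time until the system stabilizes.

    custom_code: names of the custom apps/drivers, in the order to disable.
    is_stable:   callback that takes the set of still-enabled names and
                 returns True if the system behaves with only those enabled.

    Returns the last item removed before the system stabilized, or None if
    the system was already stable or never stabilized (in the latter case
    the next step is to look at the system code).
    """
    enabled = list(custom_code)
    if is_stable(set(enabled)):
        return None  # nothing to track down
    for name in list(enabled):
        enabled.remove(name)
        if is_stable(set(enabled)):
            return name  # removing this one made the problem go away
    return None


# Simulated system: unstable whenever "driver_b" is enabled.
suspect = find_culprit(
    ["app_a", "driver_b", "driver_c"],
    lambda enabled: "driver_b" not in enabled,
)
print(suspect)  # driver_b
```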

12 Likes

Been down with a stomach bug all day but this just arrived... In case my commitment to figuring this issue out is in doubt, these arrived a few minutes ago.

Since all but 4 of my routing capable devices (excluding the detached garage Thingshield) are Iris, or a couple older SmartThings plugs, I'll be strategically swapping out some of them to see if I can affect mesh stability.

As an experiment, I de-mothballed the Iris hub and moved 29 of the 35 plugs used on the 2nd floor and paired them with Iris. I also shut down Hubitat for a while. My Iris mesh is stable and responsive, just as it was under SmartThings.

1 Like

Well I just got told in a polite way...
We can't help solve your zigbee issues as you only have Xiaomi devices connected. So good luck!

Really considering ditching HE and telling others not to embrace it.

Please anyone help me see things differently...

Hi Somel,

Yes, you only have Xiaomi devices, which are not on our supported list, in addition they do not follow the zigbee specification properly. Most customers who run these devices are aware of these issues and take special pains to make them work. There are multiple threads on the forum here about what devices work with them and what devices do not.

2 Likes

I suppose we will see a sayonara pretty soon....

Ouch.. In defense of the HE team, those are known for being troublesome. I've read that folks have had good luck using XBees to keep them connected as they don't communicate through routers well.

It won't go back to Iris if I do leave, that platform is on life support. I just wanted to test the devices to see how they would work in Iris. That mesh is running just as fast as Hubitat in all honesty.

The thing I find most puzzling is you are seeing devices apparently becoming excluded from the mesh; you mentioned the Iris sensors showing a blue LED after doing a battery pull-- it seems as if they are powering up in a state in which they are not joined to a network. I am trying to imagine a scenario (aside from a defective end device, yet multiple devices are behaving this way) that would account for a device disassociating itself from the network and going into join mode (without a manual device reset) unless there is some network steering issue. Mesh issues might cause an end device to rescan and attempt rejoin; but it would be trying to rejoin the network it was already part of. Admittedly I have no expertise in this but from what I've read of the Base Device Behavior specification, under the Zigbee Home Automation profile this shouldn't happen. Yet there are architected commands (in Zigbee Pro, at least) that could cause a device to leave the network.

One way to determine if the device had somehow been erroneously issued a 'network leave' command (which clears all its persistent setup data) might be to take one of the problematic Iris sensors that is showing a blue LED after just a battery pull and (without having reset it) put your ST hub into 'add Thing' mode. If the sensor was still trying to rescan/rejoin the Hubitat network, it shouldn't succeed in joining a new ST network. But if its PAN ID had been wiped out, and it does successfully join a new network, then it has indeed purged its persistent data as it would after a manual reset sequence or 'network leave' command.

Has anyone ever heard of a case where Xiaomi devices would put the Zigbee radio into a "not initialized" state?

The fact that something is hard should never be a deterrent to success.
Besides, the Xiaomi ecosystem is probably one of the largest in the world, with over 85 million IoT devices. Why support just that amount.

Edit: better to stop lurking for today. Really not in the right mood now.

Oh, sorry to hear that man.
I can certainly see HE’s point of view though, based on the things Chuck raises.
It’s a bit of a “square peg in a round hole” kind of thing. If you make the square peg small enough it’ll go into that round hole, but that is not really the point, is it?

All the best, whatever direction you go.