[Solved] Zigbee Instability is back!


#1

Zigbee has gradually gone downhill again all week since completing the Zigbee mesh rebuild last weekend. My mesh is right back to square 1, with some SmartPlugs going “sleepy” and not participating in the mesh. This is causing battery powered devices to lose messages and the plugs themselves to miss messages. I can “wake up” the SmartPlugs by turning them off and on a few times, or refreshing, or performing a configure() or two.

I did notice in the Zigbee logs when a SmartPlug goes offline, and the hub sends commands (on/off/refresh/configure), it looks like the plug is responding with a Bind Request.

Candles, Attic 2019-01-13 16:50:27.529 profileId:0x104, clusterId:0xb04, sourceEndpoint:1, destinationEndpoint:1, groupId:0, lastHopLqi:255, lastHopRssi:-23

Candles, Attic 2019-01-13 16:50:26.518 profileId:0x104, clusterId:0xb04, sourceEndpoint:1, destinationEndpoint:1, groupId:0, lastHopLqi:255, lastHopRssi:-24

Candles, Attic 2019-01-13 16:50:26.314 profileId:0x0, clusterId:0x8021, sourceEndpoint:0, destinationEndpoint:0, groupId:0, lastHopLqi:255, lastHopRssi:-24

Candles, Attic 2019-01-13 16:50:25.607 profileId:0x104, clusterId:0x6, sourceEndpoint:1, destinationEndpoint:1, groupId:0, lastHopLqi:255, lastHopRssi:-23

Candles, Attic 2019-01-13 16:50:24.376 profileId:0x104, clusterId:0x6, sourceEndpoint:1, destinationEndpoint:1, groupId:0, lastHopLqi:255, lastHopRssi:-24

Candles, Attic 2019-01-13 16:50:22.319 profileId:0x0, clusterId:0x8021, sourceEndpoint:0, destinationEndpoint:0, groupId:0, lastHopLqi:255, lastHopRssi:-24

Candles, Attic 2019-01-13 16:50:09.125 profileId:0x104, clusterId:0x6, sourceEndpoint:1, destinationEndpoint:1, groupId:0, lastHopLqi:255, lastHopRssi:-23
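
For anyone reading along, here is a minimal Python sketch of how log lines like the ones above could be decoded into readable names. The function names (`parse_line`, `describe`) are made up for illustration, the line format is assumed from the excerpt in this post, and the lookup tables only cover the IDs that appear here. One detail worth double-checking: in the ZDO profile (profileId 0x0), cluster 0x0021 is the Bind request and 0x8021 is the Bind response, so the 0x8021 entries may be the plug answering a bind sent by the hub rather than the plug asking for one.

```python
import re

# Small lookup tables limited to the IDs seen in this thread.
PROFILES = {0x0000: "ZDO (Zigbee Device Object)", 0x0104: "Home Automation"}
ZDO_CLUSTERS = {0x0021: "Bind request", 0x8021: "Bind response"}
ZCL_CLUSTERS = {0x0006: "On/Off", 0x0B04: "Electrical Measurement"}

# Keys must start with a letter so the timestamp (16:50:27) is not picked up.
FIELD = re.compile(r"([A-Za-z]\w*):\s*(-?(?:0x[0-9a-fA-F]+|\d+))")


def parse_line(line):
    """Return the key:value fields of one log line as a dict of ints."""
    return {key: int(value, 0) for key, value in FIELD.findall(line)}


def describe(fields):
    """Map profileId/clusterId to names, falling back to hex for unknown IDs."""
    profile = fields["profileId"]
    cluster = fields["clusterId"]
    table = ZDO_CLUSTERS if profile == 0x0000 else ZCL_CLUSTERS
    return PROFILES.get(profile, hex(profile)), table.get(cluster, hex(cluster))


if __name__ == "__main__":
    sample = (
        "Candles, Attic 2019-01-13 16:50:26.314 profileId:0x0, clusterId:0x8021, "
        "sourceEndpoint:0, destinationEndpoint:0, groupId:0, "
        "lastHopLqi:255, lastHopRssi:-24"
    )
    fields = parse_line(sample)
    print(describe(fields), "RSSI", fields["lastHopRssi"])
```

Run against the full log, this would show the 0x6 entries as On/Off traffic, the 0xb04 entries as Electrical Measurement reports, and the 0x8021 entries as ZDO bind responses.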

Since these were stable under SmartThings, both v2 and v3 (which I migrated to in October), I have to conclude that there’s some polling or other Zigbee configuration that’s not being set properly.

@mike.maxwell @chuck.schwer, any thoughts?


#2

Other than the original recommendation to not re-add your Iris plugs I assume?

I do agree with you that it is strange, though, if all the same devices worked on ST.


#3

Wow, I never got an RSSI of -24, I get -60 on average.

Another observation: HE created a polling app for zwave devices because that was ST's workaround to keep the devices updated. I'm thinking maybe they were doing the same with zigbee? Maybe if you create a rule that refreshes the plugs, it would keep them awake? I don't know, just thinking.


#4

Great suggestion. Just don’t use it. After all, why fix the problem that doesn’t exist on other platforms when one can simply not use the devices. Lol this is clearly a Hubitat problem and not the plugs. There’s a major issue with the Zigbee stack here that the devs haven’t found yet.

The devices are dropping off the network and responding with a bind request. That means they’re dropping off the network because the coordinator isn’t satisfying some condition they require.


#5

I have a very dense mesh. All rooms have at least 4-5 plugs.


#6

I'm kinda curious: why so many plugs? What's plugged into each one? Just trying to learn what others do with the plugs.


#7

Well, I'm primarily a zwave guy, so don't have any ideas to add. Good luck, and I hope they figure it out.


#8

Thanks! Certainly appreciate the support. I’ll craft up an email to support in the morning. Too tired for that right now.

I use them for a mix of things. Holiday decorations like electric candles, many of which are used into the winter. Besides decorations, I’ve got plugs on the washer and dryer, sump pump, tankless water heater, dishwasher, all refrigerators, home server, IP cameras, shop lights, and Eero access points.

Some are just used to power cycle technology, others monitor energy, and others are just for remote control.


#9

I have 10 of the iris plugs (assuming we are referring to the zigbee/zwave+ repeater model). HE support has only recommended I disable the zwave repeater side of these plugs...they have never recommended I remove the zigbee portion from my mesh. In fact I believe @bobbyD uses these plugs in his zigbee mesh as well.

@srwhite, you seem to know your stuff, so I'm not suggesting there aren't problems. I'm just saying that I have personally never seen the HE staff say not to use the Iris plugs in a zigbee mesh.


#10

Today was NOT a good morning. It started with about 20 button pushes that did nothing (supposed to turn on lights). Logs indicated the commands were received from the button, but neither of the SmartPlugs responded. After fiddling for about 5 minutes, I was able to get the lights turned on.

Then it was on to the mode change to morning, which is currently done manually due to platform instability. One mode change automation switched about half of the 23-ish devices it was supposed to (all in a device group), while the second didn't do anything. Another 5 minutes later, after toggling the device group off and on a couple of times, everything was turned on.

The "boss" is absolutely furious as she was a few minutes late for work today because of this. I was reminded of my promise: I have a little over a week left to get the system stable, or we move back to SmartThings.

After she left, while I was pouring a cup of coffee, I saw the candles in the window I was standing in front of rapidly turn off and on. This is a phenomenon that has started and occurred ONLY under Hubitat. I had never seen it happen until now, but almost daily I hear the sound of a SmartPlug somewhere in the house clicking. This happened before I rebuilt the Zigbee mesh, and continues still after doing so.

@bobbyD could really use some support here. It's been almost a week since the last Zigbee ticket (11613) had any action.


#11

Just my opinion on the stability problem - at this point I would go back to ST if everything worked well there. If support hasn't figured out the issue by now, it would be questionable to think they will have an answer quickly.

....

And as a cautionary note to others - this is why I have what some consider a 'draconian' stance on lights. The wall switch needs to ALWAYS be able to turn on the lights - regardless of anything automation or hub.

I will never do soft buttons or hidden wall relays for that reason.

If I can't turn on my lights from the wall switch 100% of the time (when there is power to the house :smile:), that is just a bad design in my opinion. Even if a system is 99.9% stable, 0.1%=8 hours of downtime per year - and it will always be when you need it most.
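
As a quick check of that figure, here is a small Python sketch of the downtime arithmetic. The function name is made up for illustration, and it assumes a 365-day year. For 99.9% it works out to about 8.76 hours per year, so the "8 hours" above is in the right range.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Hours per year a system is down at the given availability."""
    return HOURS_PER_YEAR * (100 - availability_percent) / 100


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% available: {hours:.2f} hours down per year")
```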

Of course I come from the 24/7 industrial automation environment, and have learned time and time again that the simplest solution is almost always the best one.


#12

Are you using zigbee group message in your automations?


#13

At this point, I would probably consider moving a bunch of the Iris 3210-L outlets back over to SmartThings on a different Zigbee channel. This would allow you to continue to automate those outlets and collect power usage data to some degree. I am still curious whether running about 10 of the Iris 3210-L outlets on Hubitat would change the behavior or not. While not a solution, it would at least provide valuable feedback to aid in troubleshooting. Perhaps there is a currently unknown limit that the Hubitat Zigbee USB stick cannot handle? I know this is frustrating, but right now an 'all or nothing' attitude does not seem to be helping to get any closer to understanding and correcting the issue.

Of course, if the Hubitat Support team can use your system in its current configuration to identify and resolve the issue, then so be it.

I hope you get this sorted out soon!


#14

I would also look at the zigbee channel HE is running on. In SmartThings I think you can see it but not change it. I don't have my ST hub anymore to verify.

But maybe see what it was in ST and change HE to match. I don't know if you need to turn off ST to avoid clashing.

Also, I think when you change channels you need to wait for the devices to switch over or force a reset. Can't remember.


#15

Update the hub. The last update broke all my buttons and lights; they just quit working for the most part. The last few hotfixes seem to have resolved it though. I applied the latest hotfix last night, and this morning all my switches worked as expected. It was just a simple lighting issue: I could ask Google to turn stuff on and off fine, but the switches were broken. I'd say update and see if that fixes anything.


#16

Did that last night. I’ve now got SimpleLighting and Button Controller automations failing after this last upgrade.

It’s a technically sound suggestion. However, I ran the numbers and don’t know if it’s practical. Currently there are 122 battery powered Zigbee devices and 56 devices acting as repeaters. That works out to just over 2 end devices per repeater. If I pull out the SmartPlugs that are used daily just for automations (not energy measurement, rebooting things, etc.), that would be 32 plugs (used just for decorative and task lighting). That would leave 24 devices to act as repeaters, spread over 5,000 square feet, or about 5 end devices per repeater. I’m concerned with that many end devices using so few repeaters over such a large area.
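
The numbers above can be reproduced with a short Python sketch. The device counts come straight from this post; the variable and function names are made up for illustration, and the calculation assumes end devices spread evenly across repeaters, which a real mesh will not do.

```python
END_DEVICES = 122   # battery powered Zigbee devices
REPEATERS = 56      # mains powered devices currently routing
PLUGS_MOVED = 32    # SmartPlugs that would move to the other hub


def end_devices_per_repeater(end_devices, repeaters):
    """Average number of end devices each repeater would have to serve."""
    return end_devices / repeaters


if __name__ == "__main__":
    before = end_devices_per_repeater(END_DEVICES, REPEATERS)
    after = end_devices_per_repeater(END_DEVICES, REPEATERS - PLUGS_MOVED)
    print(f"now: {before:.2f} per repeater")
    print(f"after moving {PLUGS_MOVED} plugs: {after:.2f} per repeater")
```

That is roughly 2.2 end devices per repeater today versus roughly 5.1 with only 24 repeaters left.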

For a couple of automations, yes. Surprisingly, those work more reliably than controlling a bunch of individual devices.

That is a great suggestion. In my case, HE is on channel 19. The hub recommended 20 when I reinitialized Zigbee (that’s what I used the first time around), but I mapped each room of the house using the spectrum analyzer in XCTU with an XBee, and in most rooms it favored channel 19. I chose 19.
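
If it helps when comparing spectrum analyzer readings against channel numbers, here is a small Python sketch of the standard 2.4 GHz channel numbering used by Zigbee (IEEE 802.15.4 channels 11 through 26, spaced 5 MHz apart starting at 2405 MHz). The function name is made up for illustration.

```python
def zigbee_channel_center_mhz(channel):
    """Center frequency in MHz of a 2.4 GHz IEEE 802.15.4 channel (11-26)."""
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz Zigbee channels run from 11 to 26")
    return 2405 + 5 * (channel - 11)


if __name__ == "__main__":
    for ch in (19, 20):
        print(f"channel {ch}: {zigbee_channel_center_mhz(ch)} MHz")
```

So channel 19 is centered at 2445 MHz and channel 20 at 2450 MHz.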


#17

Just a thought - how many devices do you think are able to connect directly to the hub?


#18

ZIGBEE FAILING!!!

I came home tonight to literally dozens, and I mean dozens of battery powered devices no longer reporting. Motion sensors, buttons, etc. At least 1/2 of all battery powered Zigbee devices are now dead.

I tried pulling the battery on 2 of them, and they immediately went into pairing mode. It's as if they were never paired. In fact, I tried that with a perfectly working Iris motion sensor tonight. In theory I should be able to just change the battery. As soon as I re-inserted the battery, the device went into inclusion and kept blinking blue.

I don't know what else to say.. The past month has literally been a full time job trying to keep the system running and devices connected. I've rebuilt the Zigbee mesh completely once. All of the past issues popped up in less than a week. I can't even change batteries in devices without them forgetting what network they're connected to.

The simple fact is that the platform is still buggy. It may work well for some of you, but for me it's been a complete and utter dumpster fire. I've not seen so many bugs and instability in a platform since I was participating in the Iris v2 beta program.

I've spent quite a bit of money on XBees for troubleshooting, and I've purchased extra Iris and SmartThings devices for testing, only to have those devices exhibit the same issues. And now a SmartThings button on my desk, literally 14" from the hub, is offline.

What is very surprising and very disappointing is that Support is totally unresponsive on this. I guess it's okay to expect customers to expend hundreds of man hours trying to make their product work on their own. It's been about a week and a half since the last ticket update and zero traction. I've tagged staff here with no luck. The family is beyond pissed and frankly so am I.

I'll be throwing in the towel soon. It's not worth the hassle.


#19

Update: Absolutely none of the disconnected devices will reconnect to Hubitat. I've powered down the hub for 45 minutes without any luck. I've rebooted a couple of times, also with no luck.

Every single device I try to reconnect just blinks blue, eventually gives three blinks of green followed by a long flash of red, then goes back to blue. The loop will repeat indefinitely. Hubitat sat on "Initializing" for over 15 minutes when I was trying to re-connect an Iris contact sensor. It never initialized. I gave up and took the device to the basement where the ST hub is, and it connected in less than 10 seconds. I pulled batteries on a total of 4 disconnected devices tonight to get them to reconnect, and all 4 are doing this. None will connect.

The system is basically an expensive paperweight now.

I think HE's Zigbee implementation has some major issues to resolve. It also likely explains why support has gone silent on my ticket.


#20

It appears that the hub has now crashed as of 20 minutes ago. The web UI stopped responding as did any of the remaining automations. I had no choice but to pull power and now the hub has been stuck here for over 20 minutes.

The DW is having a fit because we literally have 30+ SmartPlugs behind furniture that all have to be manually switched off tonight. That will involve moving furniture. She's absolutely furious now.

Edit: A second power pull got the hub back online. It, however, went unresponsive again about 25 minutes later.