Coming over from SmartThings

I've been working with support; however, they've gone quiet for a couple of days.

Another update....

I had another Iris contact sensor decide to go on strike, taking many seconds to respond. A battery pull didn't solve it. I wasn't able to reset and reconnect it either, nor was I able to connect the sensor as new after deleting it from HE. The hub would not initialize the device as new; it would just keep causing the sensor to go into a reset loop. It connected right up to SmartThings no problem, but refused to pair with HE. I replaced it with a spare for now.

I've moved the Zigbee channel again up to 19 so I could power up SmartThings (I still have devices on it) which uses channel 15. All devices made the move again, except for arrival sensors. Very impressive agility.

I've performed a couple of hub shutdowns to get the Zigbee network to heal. That has helped a bit, but I'm still seeing random lag times of 3-5 seconds on devices in the same room as the hub. I've also noticed device reports are coming in in multiples.

Temperature Report from a SmartThings Multipurpose Sensor

dev:294 2019-01-04 12:29:18.204 pm info Door, 2nd. Floor Back Porch temperature is 45.41°F
dev:294 2019-01-04 12:29:11.899 pm info Door, 2nd. Floor Back Porch temperature is 45.41°F
dev:294 2019-01-04 12:29:04.845 pm info Door, 2nd. Floor Back Porch temperature is 45.41°F
dev:294 2019-01-04 12:29:01.920 pm info Door, 2nd. Floor Back Porch temperature is 45.41°F

Most reports come in as 2-4 events, several seconds apart. Battery pulls and reconfiguration do not help. Oddly, it's only battery-powered devices that are sending multiple reports.

I've not read any of this thread but the last post, but I want to chime in with an observation and a bit of advice. Apologies if this is redundant.

You need to let the ZigBee be, for a few days anyways, before you can say for sure if a change was effective or not. I had a ceiling fan that kept dropping off, it would check-in maybe twice a day, and would never respond to commands. I'd reset it, it would work for a few hours, then the ZigBee would go back to some old-bad route and it would stop working again. ZigBee is slow and stubborn and will keep reconfiguring itself until it "thinks" it has the best route for every device. If you keep turning off and resetting devices it will never settle. I read somewhere that it takes three days for the mesh to settle. You might save yourself some stress by taking a break and see what shape your home is in after a little time off. This advice helped me, maybe you can benefit from it too.

Mike

2 Likes

Thanks for the advice! I've been at this for about 3 weeks now, without a solid fix. The funny thing is that in the past with other systems I've always had problems with Z-Wave and avoided it as much as possible. It's just the opposite now.

The good news is that the Xbees I ordered will arrive today. They won't fix anything, but at least I should be able to get a decent map of the network generated over the weekend.

Nice devices. I know you will send me to a very nice place, but here it goes: after you finish with your Xbee, if your network is still slow, I suggest starting over... reset the Zigbee portion of the stick, configure the Xbees first, then some of the plugs/repeaters, then end devices, then more plugs/repeaters. I believe you have something corrupted in that mesh...

I wish you good luck and I hope you don't leave HE.

Thank you for volunteering to help rebuild my Zigbee network. I'll PM my address shortly. :slight_smile:

I'm really going deeper into the stack on my theories beyond just some bad routes. The battery devices are sending multiple reports, and the plugs aren't always responding. I'm really wondering if there's some long and/or short polling, wake-up intervals, or something entirely non-routing related going on here.

I've been around Zigbee for years, and this is a first. I've never had devices respond quickly one moment, then a few minutes later time out, then start responding once woken up. It really almost feels as if devices are just too sleepy, because they do stay responsive for a short period of time once woken up or power cycled.

1 Like

Surprise, surprise... SmartThings is having another platform issue right now...

There is no going back, only forward! :wink:

I sincerely hope you get the Zigbee issue figured out soon!

2 Likes

The cloud outages and hub crashes were killing me! My hub would consistently crash and/or disconnect anytime Goodbye or Goodnight routines were run. I had a nice peak of stability from around October 2017-May 2018, but it was a rapid descent from there.

Every problem has a solution.. It's just finding it that sucks. (and keeping the family happy in the process)

A little bit of good news. I got one of the Xbees configured and connected to Hubitat in under 15 minutes. The network is scanning and has been for about 4 hours. 169 nodes have been identified but routes are still being identified. Hopefully by morning I'll have something to look at.

8 Likes

Any news?

I woke up with a cold Saturday morning, which has slowed me down on everything I'm doing. I let XCTU scan the network for a solid 12 hours. It found a total of 172 devices, but had marked a couple of routers and end devices as unreachable.

About a half hour later those devices were online and reachable, but a new set of devices was being marked as unreachable. I observed this pattern of rolling outages for a couple of hours: different devices each time, including some with direct routes to the hub. Another concern was that some links were marked as unknown and never resolved.
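A pattern like this can be checked by comparing the "unreachable" list from one scan to the next. The snippet below is only a rough sketch of that comparison: it assumes each scan has been written down by hand as a set of device names (the names here are made up), and it is not something XCTU produces or exports in this form.

```python
def flapping_devices(scans):
    """Return devices that were unreachable in some scans but not in all of them.

    `scans` is a list of sets; each set holds the devices marked unreachable
    in one scan. Assumes the same devices are present in every scan.
    """
    if not scans:
        return set()
    down_at_least_once = set().union(*scans)
    down_every_time = set.intersection(*scans)
    # Devices that come and go point at route churn rather than a dead device.
    return down_at_least_once - down_every_time


# Hypothetical scan results, roughly half an hour apart.
scans = [
    {"plug_kitchen", "sensor_attic"},
    {"sensor_attic", "plug_hall"},
    {"sensor_attic", "contact_porch"},
]
print(sorted(flapping_devices(scans)))
```

A device that is unreachable in every scan (here `sensor_attic`) is more likely a real device problem, while a set that keeps changing looks more like the rolling outages described above.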

Those observations led me to conclude there were some serious routing table issues causing never-ending route rebuilding. I decided to wipe the Zigbee table on the stick and start over, with just repeaters first.

Every device was factory reset and paired in place. I didn't actually delete anything in the database, which let all devices reconnect with their original names and rule assignments. XCTU was used to provide feedback as to when the mesh appeared stable before adding the next device. Once routes were recognized by XCTU I moved on to the next device.

Devices were added starting closest to the hub, moving outward in a ring. I finished the top 2 floors, then started reconnecting battery-powered devices. I'm about halfway through reconnecting Zigbee devices and expect to finish today.
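For anyone planning the same kind of rebuild, the ordering described above (repeaters first, nearest to the hub first, battery devices last) can be worked out ahead of time. This is just a sketch with a hypothetical `Device` record and made-up distances; it does not talk to the hub or to XCTU.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    mains_powered: bool         # mains-powered Zigbee devices usually act as repeaters
    distance_from_hub_m: float  # rough estimate, assumed to be measured by hand


def pairing_order(devices):
    """Repeaters before battery devices, and nearest to the hub first within each group."""
    return sorted(devices, key=lambda d: (not d.mains_powered, d.distance_from_hub_m))


# Hypothetical inventory.
inventory = [
    Device("contact_porch", False, 4.0),
    Device("plug_garage", True, 18.0),
    Device("plug_hall", True, 3.0),
    Device("motion_attic", False, 9.0),
]
for device in pairing_order(inventory):
    print(device.name)
```

The sort only gives the sequence; waiting for the mesh to look stable between additions is still a manual step.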

So far the mesh is stable and responsive. I have been performing basic tests every 10-15 devices. More to come....

5 Likes

Sorry to hear you got a cold. I've been like that for the last 2 months. FL climate: one day is hot, the next is cold.

I'm glad to hear you are getting better results, keep going, I hope you finish soon and enjoy HE.

1 Like

You are getting there, man. Good luck and keep us posted.

1 Like

Glad to hear you're making progress. Let's hope the "bug" you've caught was the one making all the trouble for you. :slight_smile:

1 Like

Is there any reason this routing information couldn't come from the Hubitat stick?

At this point I have reconnected all but 4 devices; one is in my rental, the other 3 in a detached garage. Overall the Zigbee mesh is stable and responsive. I'm not sure what's different this time, other than the speed at which I re-added devices to the mesh.

One odd observation: I have a button automation that switches 2 SmartPlugs. Before the rebuild they switched most of the time, but not every time. Now, only one of the 2 devices switches 100% of the time, but the other is hit or miss. Controlled individually, they're both instantly responsive. Right now that's the least of my worries.

When Zigbee is implemented on a central hub, the hub doesn't need to store a full routing table to every device. Similar to IP, it just needs to know its gateway "hop", or best/closest neighbor capable of forwarding the message. This is an entirely different approach than Z-Wave, where the hub stores the routing table as a SrcXDest bitmap of which devices can communicate with each other.
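To make the difference concrete, here is a toy comparison of the two approaches as described above. It is a simplified illustration only, not how either protocol stack is actually implemented; the device names, tables, and function names are all made up.

```python
from collections import deque

NAMES = ["hub", "plug_a", "plug_b", "sensor"]

# Next-hop style: each node only knows which neighbor to hand the message to.
NEXT_HOP = {
    "hub": {"sensor": "plug_a"},
    "plug_a": {"sensor": "plug_b"},
    "plug_b": {"sensor": "sensor"},
}

# Bitmap style: one central source-by-destination matrix of who can hear whom.
LINKS = [
    [0, 1, 0, 0],  # hub
    [1, 0, 1, 0],  # plug_a
    [0, 1, 0, 1],  # plug_b
    [0, 0, 1, 0],  # sensor
]


def forward(tables, source, destination, max_hops=30):
    """Follow per-node next-hop entries until the destination is reached."""
    path = [source]
    while path[-1] != destination:
        if len(path) > max_hops:
            raise RuntimeError("routing loop or path too long")
        path.append(tables[path[-1]][destination])
    return path


def route_from_bitmap(links, names, source, destination):
    """Compute the whole route centrally with a breadth-first search."""
    start, goal = names.index(source), names.index(destination)
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor, linked in enumerate(links[node]):
            if linked and neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    if goal not in parents:
        return None  # no chain of links reaches the destination
    path, node = [], goal
    while node is not None:
        path.append(names[node])
        node = parents[node]
    return path[::-1]


print(forward(NEXT_HOP, "hub", "sensor"))
print(route_from_bitmap(LINKS, NAMES, "hub", "sensor"))
```

Both calls arrive at the same path in this example. The difference is where the knowledge lives: spread across the nodes in the first case, held entirely by the hub in the second.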

1 Like

Another observation...

I am still seeing a huge amount of duplicate messages. I don't believe it's a mesh issue now, since the initial message is received as are the duplicates.

[dev:226](http://192.168.7.249/logs#dev226)2019-01-07 01:11:03.908 pm [info](http://192.168.7.249/device/edit/226)Window, East 1 in Front Porch battery is 66%

[dev:226](http://192.168.7.249/logs#dev226)2019-01-07 01:11:03.772 pm [info](http://192.168.7.249/device/edit/226)Window, East 1 in Front Porch temperature is 31.16°F

[dev:226](http://192.168.7.249/logs#dev226)2019-01-07 01:11:02.856 pm [info](http://192.168.7.249/device/edit/226)Window, East 1 in Front Porch was closed

[dev:226](http://192.168.7.249/logs#dev226)2019-01-07 01:11:02.814 pm [info](http://192.168.7.249/device/edit/226)Window, East 1 in Front Porch was closed

[dev:226](http://192.168.7.249/logs#dev226)2019-01-07 01:11:01.816 pm [info](http://192.168.7.249/device/edit/226)Window, East 1 in Front Porch was closed

[dev:226](http://192.168.7.249/logs#dev226)2019-01-07 01:11:01.815 pm [info](http://192.168.7.249/device/edit/226)Window, East 1 in Front Porch was closed

I still need to get to the bottom of that. Sometimes the messages are delayed 4-5 seconds, which wreaks havoc on automations.

These are generally late as they have been stored on a parent then sent all at once to the hub.

1 Like

That's certainly possible. I didn't notice the issue with SmartThings, but they may also have some event hysteresis filtering preventing these events from getting through.

Any thoughts on how to address this? I'm not so concerned with secondary events like battery reports, but certainly motion and contact events are critical.
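One generic option is to drop repeats of the same value that arrive within a short window. The sketch below shows the idea in plain Python rather than as a Hubitat app or driver; the class name, the 10-second window, and the event fields are all assumptions for illustration and are not part of any Hubitat API.

```python
class DuplicateFilter:
    """Suppress an event when it repeats the previous value within a time window."""

    def __init__(self, window_seconds=10.0):
        self.window = window_seconds
        self._last = {}  # (device, attribute) -> (value, timestamp in seconds)

    def accept(self, device, attribute, value, timestamp):
        key = (device, attribute)
        previous = self._last.get(key)
        # Always remember the newest report, so a run of repeats keeps being suppressed.
        self._last[key] = (value, timestamp)
        if previous is None:
            return True
        last_value, last_time = previous
        is_repeat = value == last_value and (timestamp - last_time) <= self.window
        return not is_repeat


events = DuplicateFilter(window_seconds=10.0)
print(events.accept("dev:79", "contact", "closed", 0.0))    # first report
print(events.accept("dev:79", "contact", "closed", 4.777))  # repeat inside the window
print(events.accept("dev:79", "contact", "open", 6.0))      # real state change
```

A filter like this only hides the repeats. It does nothing for the case where the first copy of an event arrives late, so it would not fix delayed automations on its own.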

I hate the thought of having to create a proxy app and virtual devices like I did for the Samsung Buttons. That's working great, but I have 2 devices for every physical device. Not really keen on adding another 160 devices to solve a network issue.

I had a particularly nasty failure in a closet door lighting automation today. I've had a rule triggered by an Iris contact sensor turn on a Z-Wave light switch since I moved to the house and used Iris in 2015. When I left SmartThings this rule ran with around 99% reliability and completely locally. I'm seeing about 90-95% reliability now. I did replace the contact sensor, as the old one refused to pair with Hubitat.

Here's what was logged today... Despite getting duplicate events the automation never ran.

dev:79 2019-01-08 08:15:03.568 am info Door, Hall Closet was closed

dev:79 2019-01-08 08:14:58.794 am info Door, Hall Closet was closed

dev:79 2019-01-08 08:14:54.017 am info Door, Hall Closet was closed

dev:79 2019-01-08 08:09:57.855 am info Door, Hall Closet was opened

dev:79 2019-01-08 08:09:57.853 am info Door, Hall Closet was opened

It really feels as if something is being too sleepy causing the sensors to transmit multiple events because they're not being ack'd fast enough.
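One way to probe that theory is to measure the spacing between repeated log entries and see whether the gaps look like a fixed retry interval. The helper below is a hypothetical sketch that assumes the timestamp format shown in the logs above (date, 12-hour time with milliseconds, then am/pm); the function names are made up and nothing here comes from Hubitat itself.

```python
import re
from collections import defaultdict
from datetime import datetime

# Date, 12-hour time with milliseconds, am/pm marker, then the rest of the line.
STAMP = re.compile(r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}\.\d{3}) ([ap]m)\s*(.*)")


def parse_line(line):
    """Return (timestamp, message) for a log line, or None if it has no timestamp."""
    match = STAMP.search(line)
    if match is None:
        return None
    date_part, time_part, half, message = match.groups()
    when = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")
    # Convert the 12-hour clock by hand so the result does not depend on locale.
    hour = when.hour % 12 + (12 if half == "pm" else 0)
    return when.replace(hour=hour), message.strip()


def repeat_gaps(log_lines):
    """Map each message text to the gaps, in seconds, between its repeats."""
    seen = defaultdict(list)
    for line in log_lines:
        parsed = parse_line(line)
        if parsed is not None:
            when, message = parsed
            seen[message].append(when)
    gaps = {}
    for message, stamps in seen.items():
        stamps.sort()
        gaps[message] = [
            round((later - earlier).total_seconds(), 3)
            for earlier, later in zip(stamps, stamps[1:])
        ]
    return gaps
```

Fed the five Hall Closet lines, this gives gaps of roughly 4.8 seconds between the three "closed" events and 0.002 seconds between the two "opened" events. Evenly spaced repeats would be consistent with a sensor retrying on a timer, though that alone doesn't prove the acks are the cause.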

What repeaters do you have? Iris?