Simple Automation Rules don't work properly when controlling a large number of devices

When using Simple Automation Rules, if the number of switches is 4 or fewer, the rule works properly.

When the number of switches is more than 4, some switches are always left off when the rule is executed. The more switches I add to the rule, the more of them get missed. Interestingly enough, the switches that fail to turn on are always Z-Wave switches. Zigbee switches always work.

How can I further debug this issue? When using the dashboard, I can turn all my light switches on/off without a problem. Here is my rule that controls 7 light switches. When the button is pushed, only 5 lights turn on in this case.

I am running 2.2.5.131 on a C-7 hub.

Well, I must admit to being somewhat stumped.
I don't really know what is wrong with that rule.
However, there may be a way of getting around it -
Why don't you put those switches into a group, and just turn that Group (name) on?
That would get around the issue, and may make the whole thing run faster.

P.S. I have many Simple Automations that turn MANY lights on/off all at once without any issues.
e.g.:

Thanks for the suggestion. Let me give it a try.

So what happens if you make a (nearly) identical rule, but put half of the devices in each rule? Use the same trigger and so on.

At least it will tell you if it is a Z-Wave issue or a rule problem. I would lean toward possible Z-Wave mesh issues because there isn't much to go wrong with simple automations.

This is a general problem that we are currently investigating. It appears to possibly be the case that with many devices at once the radios can be over-run. One thing we've been looking at is introducing metering in the apps to prevent this. In one test case, we found that introducing 60ms delay fixed the issue, without introducing noticeable 'popcorn effect'. Stay tuned...
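
For anyone curious what "metering" means in practice, here is a minimal Python sketch of the idea, not the hub's actual implementation. The function name `send_metered` and the `send` callback are hypothetical; the only number taken from the post above is the 60 ms delay.

```python
import time
from typing import Callable, Iterable


def send_metered(
    devices: Iterable[str],
    send: Callable[[str], None],
    delay_s: float = 0.060,  # 60 ms, the value mentioned in the post above
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send one command per device, pausing delay_s between consecutive sends.

    `send` is a stand-in for whatever actually transmits the command;
    it is an assumption of this sketch, not a real hub API.
    """
    sent = 0
    for i, device in enumerate(devices):
        if i > 0:
            sleep(delay_s)  # pause before every command except the first
        send(device)
        sent += 1
    return sent


if __name__ == "__main__":
    lights = ["Kitchen", "Hall", "Porch", "Garage", "Office", "Den", "Patio"]
    send_metered(lights, lambda name: print(f"on -> {name}"))
```

With 7 switches this adds 6 pauses of 60 ms, about 0.36 s in total, which is consistent with the remark that no noticeable "popcorn effect" was seen in that test.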

Thanks for confirming this issue. Is the radio over-run issue only for z-wave? If so, it explains why zigbee devices never miss.

No, it happens with both Z-Wave and Zigbee, although the two have very different issues in this regard, and may perform differently. We are still investigating.

Interesting. I am using RM to turn on/off several (qty. 3) lights at once. In the last couple of weeks, I am having problems that I did not have before. At least one of the lights will not turn on/off as directed by RM. Also, I am suddenly having problems with the devices not properly updating their status. The light will turn on/off as directed by RM, but the status is not updated.

I am running rules for about 20 Z-Wave devices at one time (a good night rule closing shutters and turning off lights). Some of the devices would miss.
I found that splitting them into 2 or 3 groups and adding a delay action of a few seconds between consecutive groups helps.
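
The same workaround, written out as a rough Python sketch for illustration only. The helper names `chunk` and `run_in_batches`, the `send` callback, and the 2-second default pause are all assumptions; in the hub this would be built from separate rule actions with a delay between them.

```python
import time
from typing import Callable, List, Sequence


def chunk(devices: Sequence[str], n_groups: int) -> List[List[str]]:
    """Split devices into up to n_groups contiguous batches of near-equal size."""
    size, extra = divmod(len(devices), n_groups)
    batches, start = [], 0
    for i in range(n_groups):
        end = start + size + (1 if i < extra else 0)
        batches.append(list(devices[start:end]))
        start = end
    return [b for b in batches if b]  # drop empty batches for short lists


def run_in_batches(
    devices: Sequence[str],
    send: Callable[[str], None],  # hypothetical stand-in for the real command
    n_groups: int = 3,
    pause_s: float = 2.0,  # "some seconds"; the exact value is an assumption
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[str]]:
    """Send to each batch in turn, pausing pause_s between batches."""
    batches = chunk(devices, n_groups)
    for i, batch in enumerate(batches):
        if i > 0:
            sleep(pause_s)
        for device in batch:
            send(device)
    return batches
```

With 20 devices and 3 groups, `chunk` yields batches of 7, 7 and 6 devices, so the whole run adds two pauses.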

W00T!!! I'm stoked--I'm def all ears on this one. :slight_smile:

Ah - this makes sense now. I just added a fifth Zigbee device to a switch-off rule and wondered why it wasn't working. As they are all in the same room, I think I will take the group route. Interestingly, I have not had an issue with the reverse rule that switches everything on.

My HE group of 8 Sengled recessed lights + 2 Sengled lamp bulbs appears to be working okay with Zigbee messaging turned on. This is on a C-5.

Zigbee group messaging in Group is the best solution for this, at least with respect to Zigbee devices. A single command turns on all of those lights. Use it wherever you can. I wish Z-Wave had a comparable capability. Lutron has a method for this, and it works very well also (phantom button actuated scenes).

@bravenel Do you think this might be addressed in a 2.2.5.xxx patch or is it likely to be 2.2.6 or later? I'm def interested in testing things to see if it helps the issues I've been seeing.

Thanks!

2.2.5 is done. We are still investigating, but there will be optional metering for both Group and Scene in 2.2.6. And, maybe for SAR -- depending on what we discover in our testing. The metering is easy to do, but may not be the ultimate solution.

Seems like if there needs to be command queueing it should be on the device subsystem side, not individual apps?

Wouldn't it be better if app developers didn't have to worry about it? Consider a situation where groups and SAR fire at the same time?

All things I'm sure y'all have already thought of and are looking at. Just thinking out loud I guess.
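
To make the suggestion concrete, here is a small Python sketch of a single shared queue on the device side. It is purely illustrative: `CommandDispatcher`, `submit`, and the `send` callback are hypothetical names, and nothing here reflects how the hub is actually built. The point is that every app enqueues commands and one worker sends them with a minimum gap, so two apps firing at the same moment still cannot flood the radio.

```python
import queue
import threading
import time
from typing import Callable


class CommandDispatcher:
    """Hypothetical device-side queue: apps enqueue, one worker sends with a gap."""

    def __init__(
        self,
        send: Callable[[str, str], None],  # stand-in for the radio transmit call
        min_gap_s: float = 0.060,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._send = send
        self._min_gap_s = min_gap_s
        self._sleep = sleep
        self._q: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, device: str, command: str) -> None:
        """Called by any app; returns immediately without touching the radio."""
        self._q.put((device, command))

    def _run(self) -> None:
        while True:
            item = self._q.get()
            if item is None:  # sentinel placed by close()
                self._q.task_done()
                break
            self._send(*item)
            self._sleep(self._min_gap_s)  # enforce spacing between transmissions
            self._q.task_done()

    def close(self) -> None:
        """Drain everything already queued, then stop the worker."""
        self._q.put(None)
        self._q.join()
        self._worker.join()
```

Because the spacing lives in one place, a Group and an SAR that trigger together are simply interleaved in the same queue, and app developers would not need to add their own delays.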

I have an RM rule where, if my alarm is activated, I turn on almost every light inside and outside my house (so bad guys can't hide in the dark). I was using a simple ON command with a list of 25+ lights. I just switched it to a single Group. Regardless of the radio issues, it is probably not good for my house's electrical system, and therefore the devices, to turn on every switch at the same time.

  1. I have most of my stuff on dimmers now.
  2. They don't seem to react "instantly" in any case.

So, there doesn't seem to be a real-world issue with inrush currents all hitting at the same time.

I'm also very surprised that this could be happening.
Routinely, I turn on 5-7 devices at a time, and I haven't noticed any issues.
(Both from Simple Automation and Rule Machine).

If this is really an issue, it must be happening under certain unique circumstances.

I have several automations that run and use groups to turn pretty much everything on or off.

Things that run when I wake up, when I go to sleep, when I leave home, when I get back, etc. They adjust pretty much every light in my house.

Those have never been perfect. They got better in the 2.2.3-2.2.4 timeframe--then took a marked turn for the worse in 2.2.5.

However, even some much simpler things, like adjusting 5-6 of my outdoor lights at sunset/sunrise, stopped being reliable in 2.2.5 on my C7.

Or, adjusting the LEDs on my 8 Inovelli switches--those rarely seem to work the first time through on 2.2.5 now (they were working fine most all the time on 2.2.4).

So, something seems to be bottling up the traffic. Now that I have a Z-Wave Toolbox (kinda limited in function, but I can see some stuff), I am seeing what appear to be routing failures and drops, which suggests to me that something is overloaded and unable to respond as expected when there's a lot of simultaneous activity.