Motion sensors, retriggers and race conditions

seb.t.richards · August 17, 2020, 10:16pm

Hi — I'm brand new to the platform, and trying to get to grips with how RM works. I feel like I'm nearly there on this, but I'm coming up against either setting restrictions or race conditions.

My goal is to hook up a couple of Hue motion sensors to my sprinkler (I'm fighting a battle with the local cats/foxes, who use my garden as a toilet). I'm mostly there, but I'm struggling to make my rule work reliably.

If I had complete control, I'd simply have the sprinkler run continuously whilst motion is detected. As it stands, the Hue sensors have a minimum retrigger period of 10s which is much longer than I'd like (2s would be perfect).

The best compromise I've reached so far is to trigger the rule from each motion sensor individually and run the sprinkler in two short bursts. To try to avoid concurrent execution of the rule (given multiple triggers), I added a private boolean (and subsequently a local variable, to see if this helped), but found that this results in a race condition:

It doesn't take long to get a series of error logs: "java.util.NoSuchElementException: Cannot pop() an empty List on line 6627 (delayedActs)" and "java.util.NoSuchElementException: Cannot pop() an empty List on line 6627 (doRepeatR)". In fact, if I hit the "Run Actions" button a few times very quickly, the local variable gets stuck on TRUE.

I've also tried using a Zone Motion Controller to debounce the trigger, but this adds a minimum of 5s for the activity timeout, which isn't great for me. Furthermore, the race condition remains:

I'm hoping there's a way to achieve all this using out-of-the-box drivers (and apps). Given they're not open source, I'm not sure how much work it would be to write custom code from scratch!

I did e-mail support and ask about reducing the retrigger interval in the Hue motion sensor driver (I think the current values are quite arbitrary), though I'm not sure that will go anywhere.

I might also be massively overcomplicating this rule!

Can anyone help?

Angus_M · August 18, 2020, 7:53am

In case you can't get this fixed and want to try a different motion sensor, there is a soldering hack in the forum on Xiaomi motion sensors which causes them to retrigger at 5 seconds. That's the fastest retriggering motion device I'm aware of.

seb.t.richards · August 18, 2020, 10:28am

Thanks @Angus_M — that's really helpful. It doesn't look like those sensors are outdoor rated, but I'm sure I could be creative with a Tupperware!

Angus_M · August 18, 2020, 11:07am

I have one outside (but under cover from the rain) in a hot and humid country (Thailand) and it works very well. Not so sure about its cold weather performance, and I definitely wouldn't let it get wet without some work with a silicone gun first.

geroose · August 20, 2020, 5:50am

There is a way to fix this. First, the problem is being caused by running the same rule again when it is still running. After one sensor triggers the rule, the other sensor may trigger it again. Rule Machine will start a second instance of a rule if it gets another trigger while the rule is running, but that only works on very simple rules. The second instance is not a completely separate thing, and it can interfere with the first one. Bruce Ravenel, Rule Machine's author, has made several statements about this, such as:
"Each rule only has a single state (memory) in which to keep important information. Nested IF-THEN's, for one, rely on state. Mixing nested IF-THENs with delays, repeats, etc, is where things can get messed up when there are multiple instances running."

So your rule definitely is going to have a problem with a second trigger starting it again. I've also read posts mentioning that a second trigger can stop a loop and cancel certain types of delays.

Using that variable to control it isn't going to work, since you are still re-entering the rule, which will cause the error.

You just don't want a second trigger while that rule is still running, period. There is an easy way to avoid it - using a second simple rule and a virtual switch, or global variable. For this explanation, I will use a virtual switch.

Make a second rule with both of your sensors as triggers, so it will run that rule if either one goes active. You don't need "changed", since your main rule checks for them to go inactive in your loop. The only action of the second rule is to turn on the virtual switch. That's all it does.

Then change the trigger for your main rule to be that virtual switch turning on. Once the second rule turns on the virtual switch, it can't be turned on again by the other sensor: a switch that is already on can’t turn on again until it has been turned off. So there should only be one trigger going to your main rule until the virtual switch gets turned off.

You can remove the outer if-else-endif in your main rule, since you won't need to check any variable (remove the first 2 lines and the last 3). And instead of "Set running to false", you would turn off the virtual switch. That will then allow it to be turned back on again the next time a sensor is activated. So there is never a second trigger on the main rule: it can't retrigger until it turns off that virtual switch, which is the last thing it does when it is finished looping.
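
To make the idea concrete, here is a minimal plain-Groovy model of the virtual switch acting as a latch. It is only a sketch of the reasoning above, not Rule Machine or Hubitat app code: the VirtualSwitch class, its switchedOn field and the onTurnedOn callback are made-up names, and it assumes (as described above) that a switch which is already on produces no new "on" event.

```groovy
// Plain Groovy model of the idea; none of these names are Hubitat APIs.
class VirtualSwitch {
    boolean switchedOn = false
    Closure onTurnedOn = {}      // stands in for the main rule's trigger

    void on() {
        if (switchedOn) return   // already on: no new event, so no second trigger
        switchedOn = true
        onTurnedOn()
    }

    void off() { switchedOn = false }
}

def mainRuleRuns = 0
def vs = new VirtualSwitch(onTurnedOn: { mainRuleRuns++ })

vs.on()    // sensor A goes active: the main rule starts
vs.on()    // sensor B goes active while the rule is running: ignored
assert mainRuleRuns == 1

vs.off()   // last action of the main rule
vs.on()    // the next motion event can start it again
assert mainRuleRuns == 2
```

The point of the design is that the "already running" check lives in the switch state rather than inside the rule, so the main rule is never re-entered in order to perform the check.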

Give this a try, I think it should work.

Ken_Fraleigh · August 20, 2020, 6:22am

This might do the job more simply. Of course you will have to substitute sprinkler for back yard lights.

seb.t.richards · September 14, 2020, 11:14pm

Thanks all for your comments, very helpful and informative.

I've finally had a chance to properly investigate this again. I've made some progress, but it seems the crux is coming up with an approach where a rule isn't executed twice.

I've set up two test rules:

Rule #1 — debounce the multiple motion sensors, trigger a virtual switch
This combines all motion sensors into a single virtual switch. (I've used virtual motion sensors, for testing purposes.)
Screenshot 2020-09-14 at 23.53.19

Rule #2 — run sprinkler bursts, based on virtual switch
Binds to the "off --> on" transition of the virtual switch. Note that the logs would be where the power switch is triggered.
Screenshot 2020-09-14 at 23.51.56

Whilst this is much better than before, there's still the issue of the virtual switch transitioning from "off --> on" again whilst the above delay is in effect. This results in the rule being run multiple times simultaneously — although there are no exceptions, this does result in race conditions between all running instances of the rules, whereby the overall behaviour becomes unpredictable.

I've tried my previous trick of using a private boolean or variable to simply skip the actions if the rule is already running, but this gives me the same exceptions as before.

I've boiled down my use case to the following:

I have a switch
I have a power outlet
When the switch is turned on, I want the power outlet to strobe (e.g. 2 seconds on, 5 seconds off)
When the switch is turned off, the power outlet should stay off.
If the switch is toggled when the power outlet is turned on, it can either turn off immediately, or finish its current "strobe"
The switch may be toggled at rapid and random intervals
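
One way to read these requirements is as a small state machine. The sketch below is a plain-Groovy model only (no Hubitat APIs), and the Phase, Strobe, switchChanged and timerFired names are hypothetical. It takes the "finish its current strobe" option, and it assumes something else delivers a timer event 2 seconds after the outlet turns on and 5 seconds after it turns off.

```groovy
// Plain Groovy model; all names here are made up for illustration.
enum Phase { IDLE, OUTLET_ON, OUTLET_OFF }

class Strobe {
    Phase phase = Phase.IDLE
    boolean wanted = false       // mirrors the controlling switch

    // Controlling switch changed. Only an idle strobe starts a new cycle,
    // so rapid toggling cannot start a second loop.
    void switchChanged(boolean on) {
        wanted = on
        if (on && phase == Phase.IDLE) phase = Phase.OUTLET_ON
    }

    // A phase timer expired (2 s after OUTLET_ON, 5 s after OUTLET_OFF).
    void timerFired() {
        if (phase == Phase.OUTLET_ON) {
            phase = Phase.OUTLET_OFF                       // always finish the burst
        } else if (phase == Phase.OUTLET_OFF) {
            phase = wanted ? Phase.OUTLET_ON : Phase.IDLE  // repeat or stop
        }
    }
}

def s = new Strobe()
s.switchChanged(true)     // motion: the burst starts
s.switchChanged(false)    // rapid toggling during the burst...
s.switchChanged(true)     // ...does not start a second cycle
assert s.phase == Phase.OUTLET_ON

s.switchChanged(false)
s.timerFired()            // the burst finishes
assert s.phase == Phase.OUTLET_OFF
s.timerFired()            // the switch is off, so the strobe stops
assert s.phase == Phase.IDLE
```

Because every input is handled as a transition from exactly one current state, there is no point at which two "instances" of the strobe can exist.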

One solution would be to move the "strobing" outside of Hubitat — not sure if such a device exists — but it seems crazy that this wouldn't be possible!

geroose · September 15, 2020, 5:53am

You can make lights flash, but I'm not sure there is a built-in way to continuously toggle a power outlet (switch) on and off without your rule doing it. Maybe put that in yet another rule. Any rule can pause any other rule, so the one toggling the switch can be started and stopped by another rule, or maybe pause and unpause would work better. As I mentioned before, rules with nested IFs and delays can get very touchy about being triggered while they are still running. I will need to mess around with some tests and get back to you later about this.

seb.t.richards · September 16, 2020, 10:35pm

Thanks for the suggestion @geroose — I've just had a go with inter-rule triggering... a bit closer, but still facing race conditions!

Rule #1 — debounce motion sensors

As before, debounces motion sensors to a single virtual switch
In addition, explicitly triggers rule #2 (after first requesting that any existing execution be cancelled)

Screenshot 2020-09-16 at 22.54.05

Rule #2 — run sprinkler in bursts

Rather than being triggered by the virtual switch, it's triggered exclusively by rule #1

Screenshot 2020-09-16 at 23.07.34

An improvement, but if I aggressively toggle the virtual motion sensor, the behaviour becomes erratic — in particular, multiple instances of rule #2 still get triggered. I've tried adding a 1 second delay after "Cancel Timed Actions", but it doesn't appear to have an impact.

My hunch is that the "Cancel Timed Actions" and "Run Actions" aren't deterministic when used this way.

I'm not quite clear on the best place to get documentation for custom app code — but perhaps writing a custom app would be the next place I look. I did spot in the ST documentation that using synchronized is discouraged (and possibly not honoured?). Any tips on that?

Thanks again!

bravenel · September 16, 2020, 11:49pm

Anything with simultaneity risks being indeterminate.

One thought, if you want to debounce something, look at the app in our public repo here: HubitatPublic/example-apps/debounceContact.groovy at master · hubitat/HubitatPublic · GitHub

That is a determinate way to do a debounce of a sensor.

There are a number of problems with this. One is to avoid having multiple instances of the strobe activation rule running. Another is how quickly you can retrigger the strobe activation rule. Consider this pair of rules:

Note that the first and last actions are the mechanism to prevent multiple simultaneous instances.

The problem with re-triggering this is that there must be 6 seconds after one loop starts, before triggering it again will work. There is this short dead zone. Turning the switch on during that period would not start the strobe because it will not have reset itself yet.

This doesn't seem like a complete solution to your statement of the problem to be solved. It's going to be tricky to figure out.

bravenel · September 17, 2020, 12:04am

Ah, progress. This pair can be retriggered without the 6 second dead zone:

By putting the Wait in the Repeat loop, it's going to catch the change in the controlling variable, Strobe On, as soon as it happens, and reset the rule for the next go. The Wait itself is cancelled and renewed each time around the loop. The Wait doesn't stop the loop from running every 6 seconds.

This one seems pretty robust, banging away on the switch....

bravenel · September 17, 2020, 1:51am

You mentioned your interest in doing this in Groovy. It's actually much simpler in Groovy than it is in RM. This is a very simple app, and doesn't have any of the worries about simultaneous instances and what not. It has a controlling switch, on to start the strobe, and off to stop it.

definition(
	name: "Strobe",
	namespace: "hubitat",
	author: "Bruce Ravenel",
	description: "Basic App",
	category: "Convenience",
	iconUrl: "",
	iconX2Url: ""
)

preferences {
	page(name: "mainPage")
}

def mainPage() {
	dynamicPage(name: "mainPage", title: "Strobe", uninstall: true, install: true) {
		section {
			input "trigger", "capability.switch", title: "Select controlling switch", submitOnChange: true
			input "lights", "capability.switch", title: "Select controlled switches", multiple: true, submitOnChange: true
			input "onSecs", "number", title: "Off after this many seconds", submitOnChange: true, width: 6
			if(onSecs)input "interval", "number", title: "Select cycle time", submitOnChange: true, width: 6, range: "${onSecs + 1}..*"
		}
	}
}

def updated() {
	unsubscribe()
	unschedule()
	initialize()
}

def installed() {
	initialize()
}

def initialize() {
	subscribe(trigger, "switch.on", onHandler)
	subscribe(trigger, "switch.off", offHandler)
}

def evtHandler(evt) {
	if(evt.device.currentMotion != "inactive") log.debug "Error with ${evt.device}"
}

def onHandler(evt) {
	loop()
}

def offHandler(evt) {
	unschedule()
	lights.off()
}

def loop() {
	lights.on()
	runIn(onSecs, lightsOff)
	runIn(interval, loop)
}

def lightsOff() {
	lights.off()
}
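
As a worked example of how the two inputs relate to the "2 seconds on, 5 seconds off" request earlier in the thread: the cycle time covers both phases, so it would be set to 2 + 5 = 7 with onSecs = 2. The plain-Groovy helper below is only an illustration (strobeTimeline is a made-up name, not part of the app above); it lists the ideal switching times and ignores any scheduling latency on the hub.

```groovy
// Plain Groovy, no Hubitat APIs: lists when loop() and lightsOff() would ideally run.
List<String> strobeTimeline(int onSecs, int interval, int cycles) {
    def events = []
    cycles.times { n ->
        int start = n * interval               // loop() runs at the start of each interval
        events << "t=${start}s on"
        events << "t=${start + onSecs}s off"   // lightsOff() runs onSecs later
    }
    events.collect { it.toString() }           // plain Strings, so they compare cleanly
}

// 2 seconds on, 5 seconds off => onSecs = 2, interval = 7
def timeline = strobeTimeline(2, 7, 2)
assert timeline == ["t=0s on", "t=2s off", "t=7s on", "t=9s off"]
```

This is also why the app restricts the cycle time to at least onSecs + 1: with a shorter interval the next "on" would arrive before the previous "off".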

geroose · September 17, 2020, 6:21am

So now I’m confused about this. I’ve been messing around with repeats to find out why they work and sometimes don’t. That rule doesn’t prevent more than one repeat going at once, at least on my system.

At least from the rules I have tried, if I have a 10 minute repeat and I stop it with “stop repeating actions” then there is still a scheduled job showing on the settings page. When that scheduled time comes up, then if I refresh the page, there are no scheduled jobs.

But if I try to run the rule again, and restart the repeat before that scheduled job goes away, then it starts repeating more frequently and there are 2 scheduled jobs showing. It’s like the stop for the first one got cancelled and they both run. It looks like you have to wait 10 minutes after stopping a 10 minute repeat before you can start another one. I just entered the rules you show and set it for a 30 second repeat, having it turn a light on and back off in 3 seconds. I’m using a virtual switch as the trigger for the rules. If I turn it off and then back on before 30 seconds, then I also get two of them running, even with your rule, and I see two scheduled jobs running. It sounds like it works on your system but not on mine?
That other rule you showed, with the while loop, does work correctly for me, at least it turns the "running" variable to false at the same time the scheduled job goes away. Your rules always look so nice and simple, you do practice what you preach. I am wondering if instead of exiting if running is false, maybe you could put in a 4 second delay to wait for it to be safe to run another loop. I haven't tried that yet, but maybe it would work?

bravenel · September 17, 2020, 3:00pm

I don't think you need it. Thinking about this, I'm now not understanding where you are getting multiple instances (I admit I haven't studied your rule).

I think it can be simplified even further, with only a single rule:

This benefits from the fact that the trigger switch cannot turn on again until it turns off. When it turns off, the repetition is going to stop. There is perhaps some crazy super slim chance of something happening in the 50 msecs after the switch turns off, while it is stopping the repeat and canceling the delayed off. But I discount that because you'd be hard pressed for that switch to generate a new on less than 50 milliseconds after turning off. So this gets rid of the protection not to run twice. I can bang away at that switch, and it stays on the rails for me.

geroose · September 18, 2020, 5:24am

Well, that rule can get more than one repeat going too. First, here is what I have entered, just so you know it’s the same as yours…

Notice I am using a 30 second repeat time. I suspect your 4 second time is making it hard to see anything unusual.

I suspect almost any rule with a ’stop repeating actions’ can have the problem.

Change your repeat time to 30 seconds. Start the rule, let it run to verify it’s flashing your lights, then stop it shortly after the lights flash.
View the scheduled actions and you will see there is still one there, as if it was going to keep running. But when that scheduled time arrives, the lights don’t flash, and refreshing the page shows there are no scheduled actions.

Start the rule again. Shortly after the lights flash, stop it. View that scheduled job’s time.
Before that time arrives, restart the rule. You don’t have to switch it off/on real fast, just be sure to turn it on before that scheduled time.
Watch your lights, they should be flashing more than once in 30 seconds. And there will be 2 scheduled jobs. It's running two repeat loops at once.
Like this…

I have seen this with the other rule you made also. It isn’t just happening on my system: I was helping someone a week or so ago who was using a 40 minute repeat loop and wanted to be able to stop it before 40 minutes and reset it for 40 minutes from now. He showed a screenshot showing two scheduled actions.

Obviously if you can reproduce this you will be able to figure out exactly what is going on, but here is my guess…
Stop repeat action doesn’t remove the scheduled job, instead it creates a setting that will stop it when it tries to run. I found a setting on that page called stopRepeatMain, and it was set to true after stopping the rule. So I think that might be what is stopping the scheduled job. As soon as I started the rule again, that setting switched to false. As you have explained, there is only one state per rule, so this must be part of that.
With that off, the existing scheduled job doesn’t get stopped and when the restarted rule starts its repeat, that creates a second repeat that is also running.

I might be way off base on all of that, but I know for sure that it’s not that hard to get 2 repeats going at once if you start the job up again before the existing scheduled job goes away. You need to try it on a slower repeat, I’m not sure you can see it on a 4 second repeat.

bravenel · September 18, 2020, 4:05pm

That's interesting, I will investigate that. Try this instead:

bravenel · September 18, 2020, 11:49pm

Digging into this further, it seems as though stopping repeats is broken completely. I traced back through changes to the code going back about 18 months, and found a change where a key unschedule() was removed. I don't know why it was removed, but I do know that this is consistent with what you are reporting. So, I have some work to do to test with this one line of code restored. I think even what I showed above will not work.

I will report back after doing some testing.

geroose · September 19, 2020, 4:52am

I’m on the latest hub software, and have “cancel rule timers” available, I assume it is the same as “cancel timed actions”. But yes, you’re right, it isn’t any better than stop repeat actions.

I mentioned I was helping someone using a 40 minute repeat. I did find a workaround for that one. Have to start the rule with stop repeat actions to turn the stop setting back on. Then I had it wait 40 minutes before starting the repeat loop, because he was using a 40 minute repeat. The OP on that one wanted a delay before anything happened, so that worked out ok for him and allowed him to stop and restart the rule any time for a new 40 minute delay. It worked but it seemed like some sort of hack.

But now that you are working on a real solution, I’ll wait to see what you come up with.

I’m not the OP on this thread, but was trying to find a way to help him with his rule. And I have plans for a repeating rule for myself that I would like to get working. Appreciate your help on this.

bravenel · September 19, 2020, 9:28pm

I figured out what's going on here. There is a problem, but not the one I thought or that you speculated about wrt state. It's a different problem, brought on by an attempt in RM to do too much. The problem arises because of an attempt that was made to support Repeat for 'button device', in which in theory multiple simultaneous repeats would be possible if started by different buttons. The attempt to solve that use case broke the basic repeat functionality of other rules (and probably doesn't work right for that use case either), resulting in mis-timed repeats when the trigger happens during the repeat cycle, as you discovered.

Some work needs to be done to resolve this, and hopefully a fix will be available in the next release.

seb.t.richards · September 22, 2020, 10:44pm

Thanks both for your input on this.

@bravenel using a control boolean was the approach I started off with, but I found it still caused a corruption with the already running rule instance (see OP). It seems this is the bug you've identified.

Using that code example you provided, I've come up with a strobe app that works perfectly for my use-case: hubitat/strobe-fsm.groovy at master · sebrichards/hubitat · GitHub. In particular, it allows the existing "strobe cycle" to complete, rather than terminating it abruptly — I find this works well when a subject moves between motion sensors in the same area.

On a side-note, I used control variables in the first version I was testing out (hubitat/strobe.groovy at master · sebrichards/hubitat · GitHub), but found that booleans weren't persisting correctly in state. I switched to using a {0, 1} integer and it worked — not sure if that's me doing it wrong or a bug?? In the end I went with an FSM-style approach, which I think is a bit easier to digest and reason about in this context.