Another occurrence of the RM pop() issue

peterwoggle · June 16, 2020, 3:23am

I got one of these pop() errors, but haven't been able to reproduce it. Based on previous posts, it sounds related to concurrent instantiations of the same rule.

My rule usually works fine, but this occurrence prevented my lights from turning off and has some suspicious timings in the logs. I'm posting here to better understand edge cases and gaps in my mental model, particularly around assumptions I've been making about concurrency and safety.

The mode was not Theater, so only the ELSE block is relevant. Here are the logs for this rule:

Not shown is that the "Hall / Motion sensor" went inactive at 07:03:47.614 pm, which would have satisfied the wait condition in the ELSE block, beginning the 2-min delay almost exactly 2 mins before the error.

Also suspect is that the previous triggering of the rule was almost exactly 2 mins before that last motion-activated one, at 7:01:34.015 pm, which would have been completing its 2-min delay at almost the same time as the motion active trigger.

Timeline

t0: Door triggered rule; rule flew through wait condition (without waiting) to the 2-min delay.
t0+2min: That delay completes at nearly the same time as the rule is triggered again by motion.
t0+2:10: The second execution's "wait" is satisfied when the motion sensor goes inactive.
t0+4:10: The second execution's delay ends, followed immediately by the error.

This sequence casts an assumption I've been making onto shaky ground, which is my main reason for this post: Since the first action is a "cancel," all delays are cancelable, and waits are aborted when there's a new instance, I would expect that to absolutely prevent concurrent executions of the rule.

Since RM is event-based and doesn't provide thread management (and doesn't warrant the complexity of such concurrency, as far as I can tell), I would assume that both rule instances were executing in the same thread or otherwise couldn't interfere with each other within "scheduled units" (consecutive actions kicked off by a delay or a wait). Therefore, either the second trigger happened first (which would have canceled the first execution's delay), or the first execution's resume happened first (which would have completed the rule's execution before the second instance ran). Is that the intended behavior, or am I missing something?

I would very much appreciate any help to clarify my understanding! Thanks.

dadarkgtprince · June 16, 2020, 4:10am

I've also received a pop() error recently. I didn't have multiple instances of a rule running (I've actually set a private boolean to not allow it to run again if it's already running and to exit the new instance).

2020-06-15 10:49:11.560 pm error java.util.NoSuchElementException: Cannot pop() an empty List on line 6523 (delayedActs)

My rule is only 62 lines, so not sure how it can get to line 6523 unless that's a reference to the hub and not the rule

bravenel · June 16, 2020, 4:17am

Turn on Action Logging. That will show the sequence of the flow in the rule. Also, post the release version you are running on.

peterwoggle · June 16, 2020, 4:26am

Sorry, meant to include that. Version 2.2.1.113.

I can try to catch it with more logging next time, but it happens pretty rarely, and not usually in the same rule.

dadarkgtprince · June 16, 2020, 4:26am

Not sure if I can piggyback off @peterwoggle 's post, but I'm running 2.2.1.116

bravenel · June 16, 2020, 4:49am

There is nothing that can be taken for granted about the relative orders of execution of two simultaneous instances of the same rule. You're assuming that upon a new trigger event that Cancel Delays happens before the Delay itself has completed, But the Delay could complete, and the new instance run to completion before the END-IF following the Delay. Boom, there goes the POP of an empty stack.

This almost always is associated with Delays that are embedded in IF-THEN structures. These rules are trying to be too cute, too sophisticated. Get the Delays out of the IF-THENs, and they won't blow up on you. I'd scratch these rules and rethink the automation. Break it up into multiple rules -- get rid of the complexity -- it doesn't work reliably, you've proven that. Separate the on logic from the off logic. KISS.

peterwoggle · June 17, 2020, 6:02pm

Good to know. I changed my rule to avoid if-blocks altogether (delay per mode instead). I'm not sure I can cleanly split it into two rules because any motion should turn on the lights (if off) and also "reset" the timer that turns them off. I'll start another, more specific discussion thread for that.

But to drill down in this discussion, in the rule below, is it possible for the local test var to become >1? Specifically, if the rule is triggered in quick succession, could their actions be interleaved such that both instances set test to 0, then both increment it?

bravenel · June 17, 2020, 7:10pm

In theory, yes, but with very low probability. Variables are kept in atomic state, which greatly reduces the likelihood of this problem (still theoretically possible though).

peterwoggle · June 21, 2020, 3:27am

Got it, thanks. I'll keep the thread-unsafe execution in mind.