Rules stay in running state forever

TArman · November 8, 2024, 3:27pm

I have several 5.1 rules that are meant to control status displays (switch LEDs, etc) that I find are obviously out of sync with the status of the device(s).

When I look at the rule, all of the state values are current but it looks like it never ran even though the trigger should be true.

If I change the state of a trigger (e.g., a door lock), the rule does not run.

If I “update rule” then change the trigger state, all is well until the next time I notice that it is wrong.

I have tried to correlate with things like hub updates, restarts, internet connection issues, but nothing definitive.

I would like to proactively run “update rule” on all of the rules daily until I am able to determine what’s wrong, but that’s not realistic.

Anyone else seen this “broken trigger” issue?

bertabcd1234 · November 8, 2024, 3:39pm

What evidence supports the rule having not run? (I know the outcome for sure, but it's not clear if this is why.) Enable all logging for the rule, and provide output of the logs filtered to just this rule when you think it should have run--or note if you still don't see anything with all logging enabled. Also provide a screenshot of the rule.

TArman · November 8, 2024, 3:52pm

Thanks, I will collect info and post next time I’m in that state. FYI, when it happens, it is multiple rules. The indicator ones are the obvious ones.

TArman · November 8, 2024, 4:07pm

Here’s one of the affected rules.

TArman · November 8, 2024, 4:19pm

Dang it, just found another in the “should have run” state.
Changed the trigger device several times to be sure.
BUT, didn’t have any logging turned on and when I turned it on and saved, the rule began working properly again.

bertabcd1234 · November 8, 2024, 5:49pm

What do you see in Past Logs for this rule?

Also, what does the "Events" page (accessible from the device detail page) for each of the trigger devices look like?

If you notice one that isn't working despite all looking OK: capturing the "Event Subscriptions" section on the app status (gear icon) page for the rule may also be illuminating, although you won't be able to do or change anything there.

TArman · November 9, 2024, 12:28pm

Here is a clean occurrence of the problem.
The rule appears to never “finish running”.

app:1099 2024-11-08 05:33:37.999 Event: ArmanC8 variable:LED_Refresh true
app:1099 2024-11-08 05:33:35.378 NOT Triggered - Already Running: Variable reported LED_Refresh(false)changed
app:1099 2024-11-08 05:33:35.259 Event: ArmanC8 variable:LED_Refresh false
app:1099 2024-11-08 05:33:33.127 NOT Triggered - Already Running: Variable reported LED_Refresh(true)changed
app:1099 2024-11-08 05:33:32.484 Event: ArmanC8 variable:LED_Refresh true
app:1099 2024-11-08 05:33:31.836 NOT Triggered - Already Running: Variable reported LED_Refresh(false)changed
app:1099 2024-11-08 05:33:31.573 Event: ArmanC8 variable:LED_Refresh false
app:1099 2024-11-08 12:45:02.912 NOT Triggered - Already Running: Variable reported LED_Refresh(true)changed
app:1099 2024-11-08 12:45:02.738 NOT Triggered - Already Running: Variable reported LED_Refresh(true)changed
app:1099 2024-11-08 12:45:02.139 Event: ArmanC8 variable:LED_Refresh true
app:1099 2024-11-08 12:45:01.999 Event: ArmanC8 variable:LED_Refresh false
app:1099 2024-11-08 12:40:47.665 NOT Triggered - Already Running: Variable reported LED_Refresh(true)changed
app:1099 2024-11-08 12:40:47.565 Event: ArmanC8 variable:LED_Refresh true
app:1099 2024-11-08 12:40:44.565 NOT Triggered - Already Running: Variable reported LED_Refresh(false)changed
app:1099 2024-11-08 12:40:44.560 Action: END-IF
app:1099 2024-11-08 12:40:44.557 Action: setStatusLED(6, 'Red', 'Yes') on Sunroom Light, Bedroom Light(skipped)
app:1099 2024-11-08 12:40:44.391 Action: IF (Bedroom Window Center contact open(F) [FALSE]) THEN (skipping)
app:1099 2024-11-08 12:40:44.178 Action: END-IF
app:1099 2024-11-08 12:40:44.151 Action: Exit Rule (skipped)
app:1099 2024-11-08 12:40:44.148 Action: setStatusLED(6, 'Red', 'No') on Sunroom Light, Bedroom Light(skipped)
app:1099 2024-11-08 12:40:43.910 Action: IF (Bedroom Window Open contact closed(F) [FALSE]) THEN (skipping)
app:1099 2024-11-08 12:40:43.893 Event: ArmanC8 variable:LED_Refresh false
app:1099 2024-11-08 12:40:43.349 Action: setStatusLED(6, 'Off', 'No') on Sunroom Light, Bedroom Light
app:1099 2024-11-08 12:40:43.270 Triggered: Variable reported LED_Refresh(true) changed
app:1099 2024-11-08 12:40:42.787 Event: ArmanC8 variable:LED_Refresh true

TArman · November 9, 2024, 12:45pm

FYI, I opened rule to look at it and it is still in “bad” (running) state. Just looking hasn’t changed anything yet.
So if any information in the details is valuable, I can capture it.

TArman · November 9, 2024, 12:53pm

@bertabcd1234 Is this right, expected?

TArman · November 9, 2024, 3:07pm

@bravenel @gopher.ny Are you available to look at this issue?

hydro311 · November 9, 2024, 3:45pm

It seems like your first action (setStatusLED(6, 'Off', 'No') on Sunroom Light, Bedroom Light) changes the "variable:LED_Refresh" status -- is that a correct correlation?

If so, that seems to somehow keep the rule stuck in a running loop.

I realize "changed" triggers should work fine (and probably do in many cases), but I've stopped using them altogether... I've found they are often too mushy, especially when the corresponding trigger events typically change back-&-forth very rapidly (door contact status, lock status, etc) or can otherwise change while the rule is in motion.

I can well relate to the attraction of getting a desired result all under one single rule, but for the two rules posted in this thread so far, I'd instead split them up into separate rules with distinct/specific triggers.

TArman · November 9, 2024, 3:49pm

Nope, that just sends LED state info to a HomeSeer Dimmer with 7 LEDs.

TArman · November 9, 2024, 3:54pm

I haven’t had issues with changed, but if they can be related to the “still running” issue, I can live with double the triggers (on/off, true/false) for these rules.

TArman · November 9, 2024, 3:55pm

No loops here. Notice over 5 hours between some triggers.

bravenel · November 9, 2024, 4:41pm

Yes, if you will post screenshots of the complete logs for a rule that doesn't behave as expected (copy/paste of logs is unreadable). The 'running' state is new, and it is possible that it still has some bug.

Exit Rule is one special action that sets 'running' to false. I just checked this for Exit Rule in IF-THEN or Simple Conditional, and 'running' is indeed set to false by Exit Rule. Also, a rule completing all of its actions sets it to false as well.

TArman · November 9, 2024, 6:35pm

bravenel · November 9, 2024, 6:57pm

It looks to me as though you have a race condition with it being triggered rapidly. There's only 5 msec between end of running and new event, which is not enough time for the app's state to have been updated:

Do what @hydro311 suggests: Get away from *changed* triggers followed by a test on the same thing, split the rule, etc. Get away from using the toggle for not triggering while running, as that is killing your rule.

What happened is that the second instance, the one that reported NOT Triggered, grabbed the rule state while 'running' was still true, then that instance wrote state back when it exited, overwriting the first instance trying to set it to false.
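The sequence described above is a classic lost update. A minimal sketch of it in Python follows, stepping through the two instances by hand rather than with real threads. The `store`, `load_state`, and `save_state` names are hypothetical stand-ins for "each instance gets a copy of app state and writes it back on exit"; they are not Hubitat's actual API.

```python
import copy

# Hypothetical stand-in for persisted app state (not Hubitat's real storage).
# Instance A has already been triggered and has set running=True.
store = {"running": True}

def load_state():
    """Each instance starts with its own snapshot of the persisted state."""
    return copy.deepcopy(store)

def save_state(snapshot):
    """On exit, an instance writes its whole snapshot back."""
    store.clear()
    store.update(snapshot)

# Instance A is mid-run and holds its own copy of the state.
state_a = load_state()

# Instance B starts on the next event while A is still finishing.
state_b = load_state()
b_triggered = not state_b["running"]  # False: "NOT Triggered - Already Running"

# Instance A completes its actions, clears the flag, and exits.
state_a["running"] = False
save_state(state_a)
after_a = store["running"]

# Instance B exits a few milliseconds later and writes back its stale copy.
save_state(state_b)
after_b = store["running"]

print(b_triggered, after_a, after_b)  # False False True
```

After the last step the stored flag is `True` again even though nothing is running, which matches a rule that then reports "Already Running" for every later event.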

TArman · November 9, 2024, 7:22pm

Isn’t this a bug? Shouldn’t the second instance just “go away, doing nothing”?

I am not checking the value of the troublesome trigger (LED_Refresh) in the rule and I have no control over the arrival time of trigger events.

I don’t see how making two triggers, one for true and one for false, would result in anything different than a “changes” trigger.

bravenel · November 9, 2024, 7:32pm

No, it's not a bug. It's a race condition. Events in the real world happen faster than the software can process. App state is always written when an app exits, that's how the platform functions.

Dealing with race conditions can be difficult. Generally, something has to be slowed down, or filtered somehow. Splitting the triggers reduces the rate of trigger events, getting rid of the IF-THEN speeds what's left. What is setting the variable? That's happening awfully fast. If it is being set as a consequence of this rule running, then you are creating your own race condition with that variable changing as the trigger...
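As a rough illustration of "filtered somehow", here is one generic way to thin out a burst of events: keep only the last event of each burst, so the final value is still acted on. This is a sketch in Python, not a Rule Machine feature; the `coalesce` function and the 2-second quiet period are assumptions chosen for the example. The timestamps are the three `Event:` lines from the 12:40 burst in the log, expressed as seconds past 12:40.

```python
def coalesce(events, quiet_s):
    """Keep only the last event of each burst.

    events: list of (timestamp_seconds, value), sorted by time.
    An event is kept when the next one arrives at least quiet_s later,
    or when it is the final event in the list.
    """
    kept = []
    for i, (t, value) in enumerate(events):
        is_last = i == len(events) - 1
        if is_last or events[i + 1][0] - t >= quiet_s:
            kept.append((t, value))
    return kept

# LED_Refresh events from the 12:40 burst (seconds past 12:40).
burst = [(42.787, True), (43.893, False), (47.565, True)]

print(coalesce(burst, quiet_s=2.0))  # [(43.893, False), (47.565, True)]
```

The first event is dropped because the next one follows about 1.1 seconds later; the other two are kept. The trade-off is added latency, since an event can only be acted on once the quiet period has passed.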

TArman · November 9, 2024, 8:46pm

These rules monitor the states of doors, windows, lights, HVAC, etc., and light various LEDs in meaningful colors to show me the house state in a nutshell.
It is these things changing state that tells the rules to check their devices and to update their LEDs.

Randomness is not controllable.

So, during this brief period of time there are two threads holding copies of the app state, and obviously the active thread turns off “running” (among other things) while the “blocked, not gonna run” thread holds a stale copy, with which it dutifully overwrites the “real” state.

Honestly I am surprised that this isn’t a much more pervasive issue.

If you can’t fix it with a mutex or such, then would using the private Boolean method be any less likely to lock up?
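For anyone unfamiliar with the mutex idea, a minimal sketch in Python is below. It assumes a single shared flag guarded by a lock, with the check-and-set done atomically, and a rejected instance that writes nothing back. `RuleState`, `try_start`, `finish`, and `handle_event` are hypothetical names for illustration; whether the platform could work this way is not something this thread establishes.

```python
import threading

class RuleState:
    """Hypothetical shared rule state guarded by a mutex (not Hubitat's API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.running = False

    def try_start(self):
        # The check and the set happen atomically under the lock.
        with self._lock:
            if self.running:
                return False
            self.running = True
            return True

    def finish(self):
        with self._lock:
            self.running = False

state = RuleState()
counts = {"ran": 0, "skipped": 0}
counts_lock = threading.Lock()

def handle_event():
    if state.try_start():
        try:
            with counts_lock:
                counts["ran"] += 1  # the rule's actions would run here
        finally:
            state.finish()
    else:
        # A rejected instance writes no state back, so it cannot
        # overwrite the flag with a stale value.
        with counts_lock:
            counts["skipped"] += 1

threads = [threading.Thread(target=handle_event) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state.running, counts["ran"] + counts["skipped"])  # False 20
```

How many of the 20 events run versus get skipped varies from run to run, but the flag always ends up `False` because only the instance that set it ever clears it.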