TLDR version: I have a rule that turns on a virtual switch, waits for an event with a 10 minute timeout, and then should turn the switch off. Sometimes the logs show that it sees the 'wait for' event happen, but it just doesn't fire the final action. Sometimes it does work as expected.
Here is the rule... explanation of the logic below.
I have audible alerts in the house when my doorbell camera or a second camera detect a person near the front door. To try to prevent those alerts from going off when we exit the house, the alerts are suppressed for a period of time after the front door opens. That is done by turning on a virtual switch named "VS Suppress Doorbell". The rule waits 10 minutes for the front door to close. Two minutes after the door closes (or the timeout expires) it should turn the virtual switch back off.
I don't think the required expression is any part of the issue here, but just to explain what it does:
The required expression has two parts:
My Blue Iris system (via MQTT) is NOT in profile 7. (That is a 'pause alerts' profile. If that was set by something other than this rule, I don't want to fire the rule)
VS Suppress doorbell is on (indicating this rule already fired once)
hub variable "hsm_last_alert" does not contain "intrusion" (do not suppress the alerts if HSM has detected an intrusion)
The somewhat convoluted logic is because when VS Suppress Doorbell turns on, a rule sets Blue Iris to profile 7. So if Blue Iris is in profile 7 AND the virtual switch is on, it is because this rule fired. If Blue Iris is in profile 7 but the switch is off, then the profile change came from elsewhere and this rule shouldn't run.
Here are the logs of it failing. The "Event: VS Suppress Doorbell switch off" at 19:00 was me manually turning the switch off. It should have turned off at 18:31:55, two minutes after the front door closed.
Yeah, frustrating results for sure. I don't have any idea why it would behave the way you show above, or why it would work sometimes and not others. Software bugs are reproducible, not generally of a flavor where they do different things for no observable reason (speaking for apps anyway).
Actually, it should have logged the "Action: Off" immediately, but didn't.
I'm not really sure, but it works much more often than it fails. I recently added a rule that notifies me when the virtual switch stays on for over 20 minutes, so I'm always noticing it now when it does happen. I agree, this would be a lot easier if it just never worked rather than failing occasionally.
Do you think this could just be excessive hub loading causing it to fail? I took a look at the logs and noticed when it failed, that log entry was right in the middle of a flurry of other log entries from another rule that had just triggered. I only have 1 second resolution in my external logs database so I can't see the millisecond timing of the events, but the database shows 18 log entries at 18:29:55.x
You can see that after the wait-for-expression ends the repeat, it should send me a message saying that the door was automatically locked, however about 25% of the time I never get the notification. In logs it will show that the wait event happened and the repeat ends with a line of "Action: END-REP (waiting for next)". At that point, sometimes it continues and I get the message, or sometimes it stalls and and simply nothing else happens after that.
I don't currently have a log entry showing it failing as I cleared the logs recently testing that new reboot option. If I can get a screenshot of the log entries after a failure I'll post.
Always good to have company. The only thing worse than an intermittent failure is an intermittent failure that doesn't exist anywhere else.
Yes, it logs that the event it is waiting for happened, but then it doesn't seem to go on to the next action. Unfortunately this is one of those cases where logs may be of minimal value as the problem is something that didn't happen, and things that don't happen aren't in the logs.
I can't understand your current problem at all, can't see any circumstances in the code that would cause it, and I can't reproduce it. Wait for Expression does use slightly different code than Wait for Event.
**edit - I just noticed another glitch when looking at the above screenshot of the rule. Well, maybe not a glitch, but an inconsistency... In the Trigger Events section, the variable "away_status" which is a number, is correctly represented as a number in the lower range value, but is represented as a decimal in the upper range value.
Could there be anything in engineering logs that may help? If so I can PM you my hubs UID.
I'd say it fails 20-25% of the time. For awhile (during the 2.3.7 beta program) I thought it only failed on the first time the rule ran after a reboot, but I have since noticed it failing at other random times as well.
**edit - I just attempted to test it several times, and it looks like it seems to always fail the first time after it was "initialized". I did get it to complete on the most recent attempt to run it. Here are some logs:
@bravenel I have been able to repeat the rule stall condition on my C5. Any time I initialize the rule and then test run it, it stops after the repeat is ended. On subsequent runs of the rule, it works fine. But if I open the rule again and hit done to initialize it, it will always fail on the very next run.
There is actually something you could do for me to help find the issue. I need to see the entire Application State of the rule, both right after you update/initialize the rule, and then later after it has run successfully. Application State is on the App Status page (gear icon). This may entail multiple screenshots as there is a lot there. Whatever difference is causing the failure should show up there.