Rule Machine - rule stops after wait for event with timeout

C7 - version 2.3.7.145

TLDR version: I have a rule that turns on a virtual switch, waits for an event with a 10 minute timeout, and then should turn the switch off. Sometimes the logs show that it sees the 'wait for' event happen, but it just doesn't fire the final action. Sometimes it does work as expected.

Here is the rule... explanation of the logic below.

I have audible alerts in the house when my doorbell camera or a second camera detect a person near the front door. To try to prevent those alerts from going off when we exit the house, the alerts are suppressed for a period of time after the front door opens. That is done by turning on a virtual switch named "VS Suppress Doorbell". The rule waits 10 minutes for the front door to close. Two minutes after the door closes (or the timeout expires) it should turn the virtual switch back off.

I don't think the required expression is any part of the issue here, but just to explain what it does:
The required expression has two parts:

  1. My Blue Iris system (via MQTT) is NOT in profile 7. (That is a 'pause alerts' profile. If that was set by something other than this rule, I don't want to fire the rule)
    OR
    VS Suppress doorbell is on (indicating this rule already fired once)
    AND
  2. hub variable "hsm_last_alert" does not contain "intrusion" (do not suppress the alerts if HSM has detected an intrusion)

The somewhat convoluted logic is because when VS Suppress Doorbell turns on, a rule sets Blue Iris to profile 7. So if Blue Iris is in profile 7 AND the virtual switch is on, it is because this rule fired. If Blue Iris is in profile 7 but the switch is off, then the profile change came from elsewhere and this rule shouldn't run.

Here are the logs of it failing. The "Event: VS Suppress Doorbell switch off" at 19:00 was me manually turning the switch off. It should have turned off at 18:31:55, two minutes after the front door closed.

And here it is working a little later:

I've deleted and recreated this rule with exactly the same results.

The working example is with the wait event (door closed) occurring. Do you have a log where the wait timeout occurred (ie it went to the next action)?

There was an issue with timeouts found in this thread

1 Like

At 18:29:55 the app logs that it saw the door close, 11 seconds after the 'wait for event' was logged.

Thanks. Hopefully it's the same bug and will be fixed in the next update.

1 Like

I doubt that these are related to the bug found with Wait for Expression with Between Two Times. I'll take a look at timeout for Wait for Event.

Yeah, frustrating results for sure. I don't have any idea why it would behave the way you show above, or why it would work sometimes and not others. Software bugs are reproducible, not generally of a flavor where they do different things for no observable reason (speaking for apps anyway).

Actually, it should have logged the "Action: Off" immediately, but didn't.

How often does this fail?

I'm not really sure, but it works much more often than it fails. I recently added a rule that notifies me when the virtual switch stays on for over 20 minutes, so I'm always noticing it now when it does happen. I agree, this would be a lot easier if it just never worked rather than failing occasionally.

Do you think this could just be excessive hub loading causing it to fail? I took a look at the logs and noticed when it failed, that log entry was right in the middle of a flurry of other log entries from another rule that had just triggered. I only have 1 second resolution in my external logs database so I can't see the millisecond timing of the events, but the database shows 18 log entries at 18:29:55.x

I misread your question the first time. I did find an example where the timeout expired and in that case the rule worked and scheduled the switch to turn off 2 minutes later.

1 Like

So itā€™s as if the delay sometimes doesnā€™t get scheduledā€¦?

@bravenel

For the record, I have also experienced this bug (rule simply stopping midway) in the following rule. In my case, the wait usually ends correctly and then it stalls after the repeat ends:

You can see that after the wait-for-expression ends the repeat, it should send me a message saying that the door was automatically locked, however about 25% of the time I never get the notification. In logs it will show that the wait event happened and the repeat ends with a line of "Action: END-REP (waiting for next)". At that point, sometimes it continues and I get the message, or sometimes it stalls and and simply nothing else happens after that.

I don't currently have a log entry showing it failing as I cleared the logs recently testing that new reboot option. If I can get a screenshot of the log entries after a failure I'll post.

Always good to have company. The only thing worse than an intermittent failure is an intermittent failure that doesn't exist anywhere else. :slight_smile:

Yes, it logs that the event it is waiting for happened, but then it doesn't seem to go on to the next action. Unfortunately this is one of those cases where logs may be of minimal value as the problem is something that didn't happen, and things that don't happen aren't in the logs.

The failure described by @a.mcdear does not appear to be at all the same as yours. @a.mcdear must show logs to have any clue at all.

Bruce, first, I appreciate that you're here on a Sunday keeping up with this thread right after the C8 Pro (Max Ultra Plus :smirk: ) launched.

Do you think it would make any difference if I changed the 'wait for event' to 'wait for expression'?

I can't understand your current problem at all, can't see any circumstances in the code that would cause it, and I can't reproduce it. Wait for Expression does use slightly different code than Wait for Event.

@bravenel Luckily, since I cleared my logs, this rule attempted to run just once and failed. Here's that log:


The wait cleared successfully, the repeat ended, and then.......

As soon as I can capture a log of a successful completion, I will post that as well.

For reference, here's the rule again:

**edit - I just noticed another glitch when looking at the above screenshot of the rule. Well, maybe not a glitch, but an inconsistency... In the Trigger Events section, the variable "away_status" which is a number, is correctly represented as a number in the lower range value, but is represented as a decimal in the upper range value.

Does this always fail, or just sometimes? I'm not able to reproduce a failure, which makes it very difficult to figure out what is going on.

@bravenel Yeah it is intermittent :frowning:

Could there be anything in engineering logs that may help? If so I can PM you my hubs UID.

I'd say it fails 20-25% of the time. For awhile (during the 2.3.7 beta program) I thought it only failed on the first time the rule ran after a reboot, but I have since noticed it failing at other random times as well.

**edit - I just attempted to test it several times, and it looks like it seems to always fail the first time after it was "initialized". I did get it to complete on the most recent attempt to run it. Here are some logs:

@bravenel I have been able to repeat the rule stall condition on my C5. Any time I initialize the rule and then test run it, it stops after the repeat is ended. On subsequent runs of the rule, it works fine. But if I open the rule again and hit done to initialize it, it will always fail on the very next run.

Thanks, I will look a bit further at what might cause that.

There is actually something you could do for me to help find the issue. I need to see the entire Application State of the rule, both right after you update/initialize the rule, and then later after it has run successfully. Application State is on the App Status page (gear icon). This may entail multiple screenshots as there is a lot there. Whatever difference is causing the failure should show up there.

2 Likes