RM stuck on wait action again

I have a short RM 4.0 rule with a "wait for event" action. It got stuck on that wait or just dropped the wait altogether even though the event occurred. The rule:

I can rewrite this rule to avoid waits if they're unreliable, but that's not the point. First, here are full logs from just that rule, ending with the stuck wait:

However, the contact was opened long before I took that screenshot. Here's the end of that time in context, with the contact-open event at the bottom:

That event triggered other things to happen (like the fan turning off and the hall lights turning on), but nothing at all from the rule in question. No other rules refer to this rule to cancel its actions, etc.

This problem is almost identical to the issue I described with all logging enabled here:

In both cases, different rules' action logging shows execution of a wait, then events that should have satisfied the waits' conditions, then nothing. It's as if waits are sometimes dropped or not being registered.

I'm on v2.2.2.129.

I don't understand why you are using a wait for event. Can't you just trigger the rule based on the door opening? Every opening must have been proceeded by a closing. That's how binary sensors work. They're either open or closed. So, I'm not 100% sure what having the wait for event actually gains you in this instance. If you just had the conditional action of turning the light off if motion was active triggered by the door opening, wouldn't you get the same outcome?

Showing a log ending with a "stuck" wait isn't helpful, as we need to see the subsequent events. Can you possibly do these logs on a computer instead of a mobile device? Notice in the first logs you posted, that the wait completed as expected earlier. Also, when you think you have a stuck wait, look at the App Status page at its Event Subscriptions, and see if the event for the wait is still subscribed to. Please bear in mind that a subsequent triggering of the rule will cancel a pending wait.

Why do you have a delay before the wait, that is cancelled when the rule is triggered? What happens (in your mind) if the door opens during that 5 seconds? That would guarantee the wait to fail.

Consider using Wait for Condition instead, in that it would 'fire' if the door had already opened. It's basically the same as Wait for Event, except that it tests the condition at the outset of the Wait, and doesn't wait if it is true.

I have all the logs between door closed/rule triggering to door opening again. It's long, but easily partitioned: door closed and every app activity is right at the beginning, and the door opening is right at the end, with nothing related to either one in between, and no intervening errors or abnormal things. Here are those logs, with relevant lines highlighted in red.

Yes, it worked earlier, and has always worked (for months). I just captured logs this time it didn't and couldn't be explained by anything I understand, despite having all logging enabled. It seems relevant because of my previous troubles with waits when logs indicated that the conditions were met (the post linked above).

Intended behavior: turn the lights off when the door opens if it had been closed for at least 5 seconds. (Except an edge case: don't turn the light off if there wasn't just motion in the bathroom, as that means the door is likely opening because someone is entering rather than exiting.) Purpose of wait: debouncing when the door closes and "debouncing" human behavior (enter bathroom, close door, say, "shoot, I forgot my towel" and immediately reopen the door--we've found that usually means we don't want the lights off, and should therefore just be treated as a blip). I don't want to use a condition because I have different logic that turns the light off in the "door is already open" case because we want the lights to stay on in the bathroom if we're brushing our teeth with the door open.

But all that aside, I don't need to get into intent or rule style, but rather a specific issue I've observed with wait actions that others may be experiencing as well.

I only want the lights to go off if the door had been closed for at least 5 seconds before opening.

You could do the same thing with a local variable and have the rule triggered by a change in the door status. Like this:

Trigger:  Contact Changed

Action:
If door closed Then
     Set variable to false
     delay actions for 5 seconds cancelable
     Set variable to true
Else If door is Open then
     Cancel delayed actions
     If variable is true then
        Turn off lights
    End-If
End-If

And that would work without using a wait action. Which is more reliable or "better", I can't speak to but it is definitely possible to not have a wait.

In the logs you posted, at 05:24:16.228 the Wait is over, and then shortly thereafter, in red, the door opens. But, the rule is not waiting at that point and the door hasn't closed, and the rule triggered.

The action at that timestamp is a wait for another unrelated rule ending. Nothing to do with the bathroom wait.

Please post logs as screenshots here, and identify the app and device in question.

The "Bathroom 1 / Lights off when leave" rule (very first screenshot) is app:202. The "Bathroom 1 / Door contact" it triggers on and waits on is dev:100. All log entries from both of those are highlighted in red in the linked logs, which are a direct copy/paste of all the logs during that ~30-min time window.

You want me to go to that link and take screenshots page by page? It would be 10-15 pages and would not be searchable.

I would use Simple Automation Rules for this instead of RM4 since it's a straight forward relationship between the sensor and the light. Use less resources

Just picked random lights in mine, but this is from Simple Automation Rules

No. I'm trying to help, but there are things I need as I have limited time. One is screenshots of the germane portion of logs, not copy/paste. Two is the problem isolated and shown. The problem doesn't span 10 or 15 pages. You can isolate the logs to the specific app in question using the filters at the top of the Logs page. You can pull device events from the Events button at the top of the device page. If you can reproduce the result, then this is the way to demonstrate the failure. It would also be useful, if you can demonstrate a failure, to show the Event Subscriptions portion of the App Status page for the app after the failure.

I posted a screenshot of app:202's full log history in the first post. You said it was unhelpful to show just a "stuck" wait, but that was the last line reported from app:202--there's nothing else in the logs that can provide insight into why. For completeness, here's that screenshot again, for all app:202's lines (sorry, the logs have rolled off the history, so this mobile screenshot is all I've got):

And here are the only relevant dev:100 actions from the device history:

That's everything. The copy/pasted logs only serve to show that nothing else interfered (system issues, exceptions). At the very bottom are a few highlighted lines, and then the one at the top. If I delete all the rest of the lines, here's what it looks like:

I'm not asking for help with this rule, but rather I'm reporting a bug or error in the HE stack with wait actions. I've collected clear evidence that the wait command sometimes does not work both here and in the other post (linked from the top of this thread).

I can't tell if I'm trying to convince you that there's an HE bug, or if I'm trying to provide you with enough information to debug it. There's not enough logging available for the latter, even with all logging turned on. I've done my due diligence to show that something is broken, and that with all logging enabled, there's not a enough to determine why, so it's out of my hands to provide sufficient information to debug it.

If it happens again, I'll try to get a real-time screenshot of the rule's subscriptions.

Thanks. Can you reproduce this?

Does this rule ever work as expected?

Unfortunately no, I've been unable to reproduce this one. I did reproduce the one in the other thread the following day, which is why I was able to collect the full logs in the linked post. It seemed like the same kind of "dropped wait or not registered subscription" issue. It's unpredictable and fairly rare.

I've since rewritten that other rule to avoid the wait, but I'll post back here with more info if this one occurs again.

This is the reason I'd like to see the Event Subscriptions after the failure.

On its surface, it is unlikely to be an app bug, simply because an app is going to fo the same thing with the same stimulus every time. There have been not been a bunch of reported unexplained Wait failures, and the app tests as expected. An app bug would be reproducible.

That, for the moment, leaves this as an unexplained anomaly. No comfort in that!

I'm going to add more optional logging to Rule 4.0, to show events that hit the rule for Waits.