Another Wait for Expression RM failure

Here is rule in question:


Wait for expression statement became TRUE but rule stopped and did not finish:

This rule was working very well for a long time but recently I modified it a bit by adding
Private Boolean False/True around few actions. This modification was done to prevent
multiple re-triggering while somebody gets into the bathroom (4 Motion Sensors and one
mmWave sensor is involved). After this modification this rule started to fail (want not finish)
randomly. I actually observed and reported this behavior few times in a past. Many if not
all rules which are failing this way (not finishing) are using Private Boolean for preventing
multiple re-triggering. Problem is very random and very hard to catch.

UPDATE
Disabling use of Private Boolean made this old rule again stable:

@bravenel It makes no sense but something is going on.
This is not a single rule which OCASIONALY fails to complete.
The commonality is: Private Boolean is used to prevent multi-triggering.

What release are you on?

Right now C8 Pro 2.3.146
But I observed (and actually reported) this phenomena long time ago on C8 (and maybe even C7)
with different releases. Another words, this behavior is not new but since it is not very consistent it is very hard to catch and debug. I have multiple rules exhibiting the same behavior (i.e. rule fails to complete). Previous rules are far more complicated and stopped at a random points. This one is very simple. It was very stable foe a very long time. It became unstable as soon as I introduced Private Boolean to prevent multiple re-trigering. I thought, this should be an improvement but instead it introduced instability. Once I disable setting Private Boolean to false at the beginning (the original rule did not have it at all) rule became rock solid as it was before the modification.

Please let me know if you need some other info, etc.

That's not the full release number. Do you mean 2.3.7.146? If so, there is a change in 2.3.8 (now in beta) that may fix this.

2 Likes

Ops, I am sorry. Yes, of course my current setup is C8 Pro with 2.3.7.146 (+ C7 + HA).
So, is this something somewhat known problem?
OK, I will wait for the 2.3.8.x release (I am sorry but I don't want to sign for beta and test it).

Here is one more relatively simple rule which occasionally fails to finish:

Rule also has "Private Boolean" involved but instead of "Wait for Expression" it uses "Wait for Event" statements. I hope, next release will fix this rather very annoying hard to catch and debug problem. Also I wonder (if this is not a secret) what is root cause for this phenomena?
It looks like "Private Boolean" when set to false and used in the Required Expression somehow occasionally interference with the "Wait for Expression/Event" statements.

If this is not reliably reproducible, then I have no clue what the cause is. So don't get your hopes up for a fix. I can't even be sure that this is an RM bug if it can't be reproduced.

When rule fails to finish the last log entry is "Wait for Expression/Event" and that is it.
The failure is random but all faing rules have Private Bolean = True as a Required Expression which set to False at the beginning and reset to True at the end.
I am near positive, every Expression is evaluated based on Events instead of actual States.
This is OK if the Events are not missing. Now my nothing more than educated guess/theory:
Since Private Boolen used in the Required Expression suppose to kill a Trigger (ie Event) occasionally may kill an Event which triggers evaluation of the Wait for Expression statement.
And the result is - Event happened but was missed, Expression was not evaluated and rule simply stopped at this point.
Again, this is my theory based on many experiments and observations but it explains the observed behavior. Some sort of race conditions may explain why rule is failing randomly.

And yes, the problem could be outside the RM (for instance, race conditions inside Event Generator logic).

If this makes any sense please think about how to fix the problem, otherwise simply forget about it.

Yeah, this is something that makes no sense.

Not for Expressions. Events cause evaluation, evaluation of conditions is based upon states.

The logs don't lie. Is the event in the logs?

Well, being long time EE and dealing with many hard to catch problems I have very different opinion. If I saw a single failure it may not be true. However if failure happens randomly but more than once it is/was always thomething alarming and required attention.

The log definitely shows a Trigger Events.
I forgot was it an Event in a log for the Devices involved in the Expression .
Let me check this. But the rule never progresses beyond Wait for Expressions point when it fails to finish.

I'm speaking from a very specific point of view, the Rule 5.1 app. Software is a bit different than hardware in certain regards. There could well be as you suggested some race condition in the interactions of the hub, its events queue, caching, etc, that could cause such a failure. My ability to find the problem is more limited to what the app itself does. If there were a bug in Rule 5.1, it would be reliably reproducible. That is, I could take the example rule, create one like it, and get the same error, and you would get it every time, not randomly. My personal ability to find a deeper hub problem in this case is limited. But you have to present real evidence of a failure, not anecdote.

If there is not an event, then it's going to wait until there is one that causes the Wait Expression to be evaluated. You would need to show logs that show one of the events actually happened. Without an event, the Wait for Expression could well wait forever. The burden is on you to show that this is happening, and so far you have shown no evidence that there is an event that should have ended the Wait for Expression. Now, if you can show that there was a device event that didn't show up as an event logged by the app, that would be one thing, and possibly implicate certain aspects of the hub. In theory, any event that the rule is subscribed to will cause at least a log entry from the rule wrt to the event.

Also, there is no point in pursuing this any further until you are on 2.3.8.

OK, I will wait for the 2.3.8.x release.
But in my very first post on a top there is a log when only single trigger event was recorded
and rule was waiting forever for the Expression. Just in case here is a copy:

Basically seeing this failure of a very simple rule prompted me to start this discussion.
And btw, just disabling setting Private Boolean to False at the very first line made this rule rock solid (in the update to my initial post).

That's not showing any event for the Wait, just that it began the Wait. Where is some event to end the wait? Where is a motion inactive event?

Also, once the Wait starts, I need to see the Event Subscriptions then in effect. They won't show up until the Wait happens.

That was it. This is exactly a point. There was not any Inactive events logged in a rule itself.
Rule was simply waiting forever for the Events from all involved MSs.
And yes, all they are became Inactive at a time when I snaped a log (I was lazy to post an Events logs for each individual MS but please trust me, I did check the status for all of them).

I did not know about this requirement.
Next time if 2.3.8.x will not fix this problem I will provide the requested info.

Ok, let's stop for now until the release of 2.3.8.x

But were there inactive events logged by the devices after the wait? You can filter the logs to show the app and the relevant motion sensors.

I guess, right now it is too late to go through past app logs.
But after rule failed to finish I did check the logs for each MS from the Device pages to make sure all of them actually became Inactive. And yes, all MSs reported Inactive while rule still was waiting for the relevant Expression to become True. Another words, all MSs are Inactive and by this time the Expression must be True but rule things it is still False and as a result is waiting forever.

In the related thread that I recently started, I posted logs where my rule containing a wait for event with timeout logged the event that was being waited for but didn't then take the next action in the rule. The event was logged by the app itself, not just by the device.

After both 'wait for event' and 'wait for expression' had failures, I tried this, using 'time since event exceeds' instead of timeout:

I haven't had a failure with this syntax yet, but since it was intermittent I can't be sure that this solved it. (My rule doesn't use a private boolean, but does use a required expression)

These may all be unrelated issues, but it seems odd that several people are having intermittent failures when using 'Wait for' actions.

Yes, they are probably unrelated. Wait for Event and Wait for Expression are very different things, using very different code. If there is something common going on, I have yet to see what it is, and have yet to be shown logs that include the necessary information. Without those logs, there is not much that can be done. I'd be happy to investigate, but need more information.

I may not have seen all of your screenshots, but double-check that you didn't have this option set, because it will cause the behavior you're describing: