This rule was working very well for a long time but recently I modified it a bit by adding
Private Boolean False/True around few actions. This modification was done to prevent
multiple re-triggering while somebody gets into the bathroom (4 Motion Sensors and one
mmWave sensor is involved). After this modification this rule started to fail (want not finish)
randomly. I actually observed and reported this behavior few times in a past. Many if not
all rules which are failing this way (not finishing) are using Private Boolean for preventing
multiple re-triggering. Problem is very random and very hard to catch.
UPDATE
Disabling use of Private Boolean made this old rule again stable:
@bravenel It makes no sense but something is going on.
This is not a single rule which OCASIONALY fails to complete.
The commonality is: Private Boolean is used to prevent multi-triggering.
Right now C8 Pro 2.3.146
But I observed (and actually reported) this phenomena long time ago on C8 (and maybe even C7)
with different releases. Another words, this behavior is not new but since it is not very consistent it is very hard to catch and debug. I have multiple rules exhibiting the same behavior (i.e. rule fails to complete). Previous rules are far more complicated and stopped at a random points. This one is very simple. It was very stable foe a very long time. It became unstable as soon as I introduced Private Boolean to prevent multiple re-trigering. I thought, this should be an improvement but instead it introduced instability. Once I disable setting Private Boolean to false at the beginning (the original rule did not have it at all) rule became rock solid as it was before the modification.
Please let me know if you need some other info, etc.
Ops, I am sorry. Yes, of course my current setup is C8 Pro with 2.3.7.146 (+ C7 + HA).
So, is this something somewhat known problem?
OK, I will wait for the 2.3.8.x release (I am sorry but I don't want to sign for beta and test it).
Rule also has "Private Boolean" involved but instead of "Wait for Expression" it uses "Wait for Event" statements. I hope, next release will fix this rather very annoying hard to catch and debug problem. Also I wonder (if this is not a secret) what is root cause for this phenomena?
It looks like "Private Boolean" when set to false and used in the Required Expression somehow occasionally interference with the "Wait for Expression/Event" statements.
If this is not reliably reproducible, then I have no clue what the cause is. So don't get your hopes up for a fix. I can't even be sure that this is an RM bug if it can't be reproduced.
When rule fails to finish the last log entry is "Wait for Expression/Event" and that is it.
The failure is random but all faing rules have Private Bolean = True as a Required Expression which set to False at the beginning and reset to True at the end.
I am near positive, every Expression is evaluated based on Events instead of actual States.
This is OK if the Events are not missing. Now my nothing more than educated guess/theory:
Since Private Boolen used in the Required Expression suppose to kill a Trigger (ie Event) occasionally may kill an Event which triggers evaluation of the Wait for Expression statement.
And the result is - Event happened but was missed, Expression was not evaluated and rule simply stopped at this point.
Again, this is my theory based on many experiments and observations but it explains the observed behavior. Some sort of race conditions may explain why rule is failing randomly.
And yes, the problem could be outside the RM (for instance, race conditions inside Event Generator logic).
If this makes any sense please think about how to fix the problem, otherwise simply forget about it.
Well, being long time EE and dealing with many hard to catch problems I have very different opinion. If I saw a single failure it may not be true. However if failure happens randomly but more than once it is/was always thomething alarming and required attention.
The log definitely shows a Trigger Events.
I forgot was it an Event in a log for the Devices involved in the Expression .
Let me check this. But the rule never progresses beyond Wait for Expressions point when it fails to finish.
I'm speaking from a very specific point of view, the Rule 5.1 app. Software is a bit different than hardware in certain regards. There could well be as you suggested some race condition in the interactions of the hub, its events queue, caching, etc, that could cause such a failure. My ability to find the problem is more limited to what the app itself does. If there were a bug in Rule 5.1, it would be reliably reproducible. That is, I could take the example rule, create one like it, and get the same error, and you would get it every time, not randomly. My personal ability to find a deeper hub problem in this case is limited. But you have to present real evidence of a failure, not anecdote.
If there is not an event, then it's going to wait until there is one that causes the Wait Expression to be evaluated. You would need to show logs that show one of the events actually happened. Without an event, the Wait for Expression could well wait forever. The burden is on you to show that this is happening, and so far you have shown no evidence that there is an event that should have ended the Wait for Expression. Now, if you can show that there was a device event that didn't show up as an event logged by the app, that would be one thing, and possibly implicate certain aspects of the hub. In theory, any event that the rule is subscribed to will cause at least a log entry from the rule wrt to the event.
OK, I will wait for the 2.3.8.x release.
But in my very first post on a top there is a log when only single trigger event was recorded
and rule was waiting forever for the Expression. Just in case here is a copy:
Basically seeing this failure of a very simple rule prompted me to start this discussion.
And btw, just disabling setting Private Boolean to False at the very first line made this rule rock solid (in the update to my initial post).
That was it. This is exactly a point. There was not any Inactive events logged in a rule itself.
Rule was simply waiting forever for the Events from all involved MSs.
And yes, all they are became Inactive at a time when I snaped a log (I was lazy to post an Events logs for each individual MS but please trust me, I did check the status for all of them).
I did not know about this requirement.
Next time if 2.3.8.x will not fix this problem I will provide the requested info.
Ok, let's stop for now until the release of 2.3.8.x
I guess, right now it is too late to go through past app logs.
But after rule failed to finish I did check the logs for each MS from the Device pages to make sure all of them actually became Inactive. And yes, all MSs reported Inactive while rule still was waiting for the relevant Expression to become True. Another words, all MSs are Inactive and by this time the Expression must be True but rule things it is still False and as a result is waiting forever.
In the related thread that I recently started, I posted logs where my rule containing a wait for event with timeout logged the event that was being waited for but didn't then take the next action in the rule. The event was logged by the app itself, not just by the device.
After both 'wait for event' and 'wait for expression' had failures, I tried this, using 'time since event exceeds' instead of timeout:
I haven't had a failure with this syntax yet, but since it was intermittent I can't be sure that this solved it. (My rule doesn't use a private boolean, but does use a required expression)
These may all be unrelated issues, but it seems odd that several people are having intermittent failures when using 'Wait for' actions.
Yes, they are probably unrelated. Wait for Event and Wait for Expression are very different things, using very different code. If there is something common going on, I have yet to see what it is, and have yet to be shown logs that include the necessary information. Without those logs, there is not much that can be done. I'd be happy to investigate, but need more information.