Test rule which proves "Wait for Expression" failure

For information only.
(I am not sure a discussion would be productive.)

I created a very simple test rule on my C7 hub running the latest platform, 2.3.8.125, to prove the existence of a problem with the "Wait for Expression" statement. Most likely the problem is a race condition (or something similar). Of course, this test rule is very stressful, but creating stress conditions is the whole reason for intensive testing, isn't it?
Here is a test rule which fails in nearly 100% of runs:

and here is a related log:

Just two sequential runs revealed the problem. As usual when the rule fails, the last log entry is "Wait for Expression", even though the Expression became TRUE.
Now the question is:
Will this problem be addressed or simply ignored?
As of now, my recommendation is to avoid using the "Wait for Expression" statement, because in real life it will fail sooner or later.

UPDATE

Here is a modified rule with the "Wait for Event" statement replaced with a Repeat loop.

So far this modified rule/test runs without any problems.

Not discussing, just curious - If you add a short delay between the on and wait, does the rule work?

I wonder if this could be a classic test-and-set time-of-check to time-of-use problem, where the wait sees the switches off when first evaluating the expression but by the time the events are subscribed (for capturing the next state change) the on event(s) have already happened. You might be able to see this if you logged the switch devices along with the rule.
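To make the suspected sequence concrete, here is a minimal sketch in Python (not Hubitat or Rule Machine code). The `Switch` class and `check_then_subscribe` function are hypothetical names made up for illustration, and the sketch assumes the wait really does check the state first and subscribe second, which nobody in this thread has confirmed.

```python
class Switch:
    """Hypothetical stand-in for a switch device (not a Hubitat class)."""

    def __init__(self):
        self.state = "off"
        self.listeners = []

    def turn_on(self):
        self.state = "on"
        # Only callbacks registered before this point hear the event.
        for callback in list(self.listeners):
            callback(self.state)


def check_then_subscribe(switch, event_lands_in_gap):
    """Model a wait that checks state first and subscribes second.

    Returns True if the wait completes, False if it is left hanging.
    """
    seen = []
    if switch.state == "on":                  # time of check
        return True
    if event_lands_in_gap:
        switch.turn_on()                      # nobody is listening yet
    switch.listeners.append(seen.append)      # time of use: subscription is live
    if not event_lands_in_gap:
        switch.turn_on()                      # listener catches this one
    return "on" in seen


late_event = check_then_subscribe(Switch(), event_lands_in_gap=False)
gap_event = check_then_subscribe(Switch(), event_lands_in_gap=True)
```

In this model `late_event` is True and `gap_event` is False: the switch ends up on in both cases, but an event that lands between the check and the subscription is never delivered to the waiter.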

In this specific test, maybe.
The reason for this test is to prove the existence of a problem. In real life there is absolutely no guarantee of when independent external Events will happen or what the timing relationship between them will be. In my real rules there is a huge natural delay between these actions, but once in a while a rule fails in exactly the same way.

I can reproduce your results on my C7, @vitaliy_kh

The first run worked fine. On the second one, the two switches logged the command to on before the wait, but the wait did not see their state as on. I had to manually cycle the switches to off and then on to make the wait complete.

Of course. But sometimes showing the conditions under which the rule works, and then doesn't work, is useful to demonstrate the issue. Incidentally I tried it and a 100ms delay makes this specific rule "work", but it shouldn't be required, as you say.

The modified rule repeatedly tests the device attribute states and doesn't rely on event subscriptions, only a scheduled job (for the next rep), so if the underlying issue is time-of-check to time-of-use atomicity, it makes sense that your modified rule works.
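A rough Python sketch of why polling is immune to a missed event (again, not Rule Machine code). `poll_until`, `on_rep` and the dictionary used as a device are hypothetical names for illustration; the `on_rep` hook just stands in for whatever happens between repetitions.

```python
def poll_until(read_state, expected, max_reps, on_rep=None):
    """Re-read the state on every repetition instead of waiting for an event.

    Returns the repetition number at which the state matched, or None.
    """
    for rep in range(1, max_reps + 1):
        if read_state() == expected:
            return rep
        if on_rep is not None:
            on_rep(rep)  # stand-in for the gap before the next scheduled rep
    return None


switch = {"state": "off"}


def flip_during_second_gap(rep):
    # The switch turns on between checks, with no subscriber listening.
    if rep == 2:
        switch["state"] = "on"


result = poll_until(lambda: switch["state"], "on", max_reps=5,
                    on_rep=flip_during_second_gap)
```

Here the state change happens while nothing is subscribed, yet the next repetition still reads the attribute and finishes (`result` is 3). The cost is latency of up to one repeat interval, plus the extra scheduled jobs.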

This is a timing problem, a variation on a race condition. The Wait has to create event subscriptions and write state as the rule exits for the Wait, and the immediately preceding device on commands generate events that have to make it through the event queue. There is asynchrony happening, and non-deterministic relative timing.
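The relative timing can be put into a toy model, sketched here in Python. All three timings are invented numbers, and the model assumes a check-then-subscribe ordering inside the Wait, which is a guess rather than a description of RM internals.

```python
def wait_completes(event_latency_ms, check_at_ms, subscribe_setup_ms):
    """Return True if a check-then-subscribe wait would notice the 'on' event.

    event_latency_ms:   hypothetical time for the device event to clear the queue
    check_at_ms:        hypothetical time at which the wait first evaluates
    subscribe_setup_ms: hypothetical time needed to get the subscription live
    """
    subscribed_at_ms = check_at_ms + subscribe_setup_ms
    if event_latency_ms <= check_at_ms:
        return True   # state is already 'on' at the first evaluation
    if event_latency_ms >= subscribed_at_ms:
        return True   # subscription is live before the event is delivered
    return False      # event fell between the check and the subscription


no_delay = wait_completes(event_latency_ms=10, check_at_ms=0, subscribe_setup_ms=20)
with_delay = wait_completes(event_latency_ms=10, check_at_ms=100, subscribe_setup_ms=20)
slow_event = wait_completes(event_latency_ms=30, check_at_ms=0, subscribe_setup_ms=20)
```

With these made-up numbers only `no_delay` is False. That is consistent with the 100 ms delay making the test rule "work", and with real rules failing only once in a while: the outcome depends on where the event lands relative to a small window.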

The problem could be summarized as one involving close timing between device events and Waits for those events. I don't believe there is a code solution for this class of timing problem, at least not in RM. Possibly a critical state element for Wait could be moved to atomic state, although I'm not sure that would catch all cases of this problem.

I think it can be patched this way in RM (repeat interval would be application-specific)

It would be better to rethink the overall rule design than to resort to polling, imo. These problems arise from rules that load many functions into a single rule, instead of multiple rules. The OP resists using multiple rules for unrelated reasons.

Topic closed.