"Wait for expression" failure again - trying a different workaround

Alan_F · March 19, 2024, 11:49pm

In reference to what I posted in the now-closed thread here: Multi Hub Task Splitting - #30 by Alan_F

The rule failed again despite moving some of the logic to a helper rule and a hub variable. I think I'm giving up on 'Wait for expression' in any rule that has a required expression that can go false.

Unfortunately I only had Action logging on for the rules, as I turned off the event and trigger logging when I set up the helper rule, so I'm expecting this will again be dismissed without investigation, but just in case, here are the logs I have:

The primary rule is app 18. The helper rule is app 52.

The older entries at the bottom show the rules working correctly last night. At 9:01 PM the primary rule began to wait for the hub variable to change. At 12:04 AM the other rule set the hub variable to false and the primary rule logged the "Wait for expression over" before finishing its actions.

The primary rule triggered again at 6:22 PM. At 7:00 PM the helper rule set the variable to false. The primary rule didn't react.

Primary rule:

Helper rule:

I think the only approach left (other than moving this to Node-RED) is to use 3 rules: one to take the initial action (the first part of my current main rule), one to wait for the conditions (my current helper rule), and a third one to take the action(s) that need to happen after the "wait" is over.

The new main rule will be the first part of the current one, deleting the 'wait for expression' and the final action to reset the Powerwall reserve setting.

The helper rule will stay the same. It will trigger once the 'wait for' conditions are met and set the hub variable back to false.

The new third rule will trigger when the hub variable changes from true to false. It will set the Powerwall reserve level back to the correct level.

While using 3 rules instead of 1 seems a bit cumbersome, I think I prefer it to the repeating loop workaround proposed by @vitaliy_kh

vitaliy_kh · March 20, 2024, 12:27am

Of course the choice is purely yours, but the few rules I modified with the repeat loop now seem to be stable. My stress-simulation rule did not fail. But I will wait a bit longer before I update many other rules the same way.

hubitrep · March 20, 2024, 12:51am

Of course 1) I have no idea what the rest of your rules look like so this advice may not work for you, and 2) you may feel this suggestion totally misses the point you're trying to make, but looking just at what you showed here, it seems you could avoid the wait altogether by having the first rule put your powerwall in "grid" mode and exit, and the other rule set it back to normal (the second dim command would be in that rule). You already have a hub variable holding the "grid" state; it could be used by the second rule as part of the required expression.

I use this pattern extensively with good success and I struggle to remember if I have any waits at all in the dozens of rules I've written. YMMV.

BTW I notice your first rule isn't protected against re-triggering (wouldn't matter for the second). Not sure that's the problem because I think you would see additional action logs if it was, but having triggers and events logged as well as actions would help rule it out.

Alan_F · March 20, 2024, 1:15am

Thanks @hubitrep ... sometimes building one thing on another leads to needless complexity. I think you're right that there is no need for the third rule and I can just have the second/helper rule complete the actions. I have to think a bit on whether I still want to use the hub variable or whether I should use the private boolean on the second rule to keep it dormant until the first rule fires.

The first rule is protected from re-triggering because its required expression includes that the dimmer level of the Powerwall <> 100, and its actions include setting that dimmer level to 100. So every time it runs, it falsifies its own required expression. If it were working correctly, it would set that dimmer level to a lower value at the end of the rule, allowing it to run again in the future. Now it will depend on the second rule to take that action.

GuyMan · March 20, 2024, 2:24pm

I'm very much in favor of Hubitrep's approach, as that's what I would do to eliminate the internal WAIT in your primary rule as well.

And I would keep the global/HV, and I would also add a PB on the 1st rule to avoid retriggering. I understand the dimmer level is in the required expression, but again, I've seen time lags on I/O out to physical devices (performed async to the rule flow). So if you're OK with the 1st rule triggering once or twice, and that doesn't cause problems, then that's OK. But I wouldn't trust the change in the dimmer level to happen immediately (it depends on the network protocol, but likely 40-100 ms to complete), whereas the change to a PB/HV seems to happen just in memory and the DB and seems to execute synchronously with the code (aka, you can trust it has executed atomically before the next line of RM is executed). I find it's fairly easy to get tripped up by race conditions in RM, and I code defensively around that.

Bottom line, I trust HVs/PBs to update much more quickly and avoid some of those RM race conditions, given the event-driven and async nature of the hub. My "rules (pun intended) of thumb": avoid waits within rules; when timing matters, trust internal memory-resident variables over device I/O (or even virtual devices); and keep rules short and non-blocking.
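A toy model can make the timing argument concrete. The sketch below is not how Hubitat works internally; it assumes a dimmer whose reported level changes about 50 ms after a command (within the 40-100 ms range mentioned above) and a private boolean that changes the instant it is set. Triggers are processed strictly one after another, so it does not capture the truly simultaneous case where even a PB can lose. The names `Dimmer`, `runs_with_device_guard`, and `runs_with_flag_guard` are made up for illustration.

```python
# Toy model (not Hubitat internals): a device attribute updates only after an
# assumed latency, while a private boolean changes the moment it is set.
from typing import List, Tuple

DEVICE_LATENCY = 0.05  # assumed ~50 ms before the dimmer reports its new level


class Dimmer:
    """A device whose reported level changes only after DEVICE_LATENCY seconds."""

    def __init__(self, level: int) -> None:
        self._level = level
        self._pending: List[Tuple[float, int]] = []  # (time it takes effect, level)

    def set_level(self, now: float, level: int) -> None:
        self._pending.append((now + DEVICE_LATENCY, level))

    def level(self, now: float) -> int:
        # Apply, in time order, every command whose latency has elapsed by `now`.
        for effective, lv in sorted(self._pending):
            if effective <= now:
                self._level = lv
        self._pending = [(e, lv) for e, lv in self._pending if e > now]
        return self._level


def runs_with_device_guard(trigger_times: List[float]) -> int:
    """Required expression: dimmer level != 100. Returns how many runs fired."""
    dimmer, runs = Dimmer(level=20), 0
    for t in trigger_times:
        if dimmer.level(t) != 100:
            runs += 1
            dimmer.set_level(t, 100)
    return runs


def runs_with_flag_guard(trigger_times: List[float]) -> int:
    """Required expression: private boolean is False; it is set synchronously."""
    dimmer, flag, runs = Dimmer(level=20), False, 0
    for t in trigger_times:
        if not flag:
            flag = True  # visible to the very next evaluation
            runs += 1
            dimmer.set_level(t, 100)
    return runs


burst = [0.000, 0.005]  # two triggers 5 ms apart
print(runs_with_device_guard(burst))  # 2: the second run still sees the old level
print(runs_with_flag_guard(burst))    # 1
```

With the triggers spaced further apart than the assumed latency (say 100 ms), both guards allow only one run, which matches the experience that the dimmer guard usually works and only fails on near-simultaneous triggers.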

YMMV, but in summary, I would go with the approach @hubitrep suggests: two separate rules, synchronized with a global HV, with each rule separately controlling the dimmer level as appropriate.

This general design approach has worked well for me (and sounds like it works for @hubitrep as well), and has increased reliability significantly (as well as WAF). Also, IMHO, naming conventions are key to coordinating this, or it gets messy to track.

Best of luck!

PunchCardPgmr · March 20, 2024, 3:36pm

Hummm, now how to consolidate all these gems for easy discovery before folks get knee deep into using the constructs that are bound to cause problems eventually?

The Tips, Tricks, and no Trips in Hubitatland
or
The Rulemaker's Fall-off-a-Cliff Notes
or the
How to keep tripped rules from tripping over themselves video

Alan_F · March 20, 2024, 5:55pm

I ended up going with the PB instead of the hub variable just to reduce the number of moving parts. This way everything is contained in the two rules within RM. The first rule sets the PB on the second rule to true. When the rest of the conditions are met, the second rule triggers, sets the dimmer back, and sets its own PB to false. The second rule is then inactive until the first rule sets its PB to true again.
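The two-rule handoff described above can be sketched as two event handlers sharing a little state. This is a minimal sketch under stated assumptions, not RM's actual behavior: `NORMAL_RESERVE = 20` is a hypothetical "correct" reserve level, the handler names are made up, and each call to a handler stands for that rule's trigger firing.

```python
from dataclasses import dataclass, field
from typing import List

NORMAL_RESERVE = 20  # hypothetical "correct" Powerwall reserve level
GRID_RESERVE = 100   # level the first rule sets while the heavy load runs


@dataclass
class HubState:
    powerwall_reserve: int = NORMAL_RESERVE  # the virtual dimmer
    rule2_private_boolean: bool = False      # Rule 2's PB, armed by Rule 1
    log: List[str] = field(default_factory=list)


def rule1_triggered(hub: HubState) -> None:
    """Rule 1: high-load trigger; required expression: reserve != 100."""
    if hub.powerwall_reserve == GRID_RESERVE:
        return  # its own action made the required expression false
    hub.powerwall_reserve = GRID_RESERVE
    hub.rule2_private_boolean = True  # arm Rule 2, then exit: no wait
    hub.log.append("rule1: reserve -> 100, rule2 armed")


def rule2_triggered(hub: HubState) -> None:
    """Rule 2: 'load is over' trigger; required expression: its own PB is true."""
    if not hub.rule2_private_boolean:
        return  # dormant until Rule 1 arms it
    hub.powerwall_reserve = NORMAL_RESERVE
    hub.rule2_private_boolean = False  # disarm itself until next time
    hub.log.append("rule2: reserve restored, rule2 disarmed")


hub = HubState()
rule2_triggered(hub)  # ignored: Rule 2 is not armed yet
rule1_triggered(hub)
rule1_triggered(hub)  # ignored: reserve is already 100
rule2_triggered(hub)
print(hub.log)
```

Because this model updates the reserve synchronously, the repeated Rule 1 trigger is simply ignored; on a real hub it might run twice, which, as noted in the next paragraph, is harmless here since both runs do the same thing.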

You make some very good points about the speed of the dimmer updating vs a variable, but I don't think it matters in this case. There is no harm to the first rule triggering twice. In the edge case of the home power use going above 5 kW a few milliseconds before the dryer reports "drying", the rule will trigger twice, set the PB of the second rule to true twice, and set the dimmer to 100 twice.

Plus, as I found out elsewhere in RM, even using a PB or hub variable isn't a 100% guarantee against multiple triggers. If multiple triggers go off at essentially the same time, the PB won't be set fast enough, and other debouncing measures have to be taken, like the debounceContact.groovy example in the example-apps folder of the hubitat/HubitatPublic repository on GitHub. I had to use that in a case where all 4 of my tire sensors will report in within milliseconds of each other, and when the temperature changes, all four may be reporting low at once.
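The general debounce idea is to collapse a burst of near-simultaneous events into a single action. The sketch below is a generic leading-edge debounce, not a port of the linked debounceContact.groovy example (which may use a different strategy). The 1-second window, the `Debouncer` class, and the sensor names are hypothetical, and the check-and-set in `accept` is only atomic here because this Python code handles events one at a time.

```python
from typing import List, Optional, Tuple


class Debouncer:
    """Leading-edge debounce: act on the first event of a burst and ignore any
    further events that arrive within `window` seconds of the accepted one."""

    def __init__(self, window: float) -> None:
        self.window = window
        self._last_accepted: Optional[float] = None

    def accept(self, timestamp: float) -> bool:
        if self._last_accepted is not None and timestamp - self._last_accepted < self.window:
            return False  # part of the same burst: suppress
        self._last_accepted = timestamp
        return True


# Four tire sensors reporting "low" within a few milliseconds of each other
# (sensor names and timestamps are made up for illustration).
events: List[Tuple[str, float]] = [("FL", 10.000), ("FR", 10.002), ("RL", 10.003), ("RR", 10.007)]
debounce = Debouncer(window=1.0)
alerts = [sensor for sensor, t in events if debounce.accept(t)]
print(alerts)  # ['FL']
```

A trailing-edge variant (wait until the burst has been quiet for the window, then act once on the latest values) is the other common choice when the final reading matters more than the first.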

vitaliy_kh · March 20, 2024, 6:09pm

Basically this is a "State Machine" design approach, and it will create very solid behavior.
Unfortunately, designing a State Machine with RM requires many dedicated rules:
basically one rule for each State.

P.S.
I am an EE, and I use State Machines for nearly all logic design.
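A state machine reduces to a table of (state, event) pairs that are allowed to cause a transition; everything else is ignored. The sketch below is illustrative only: the two states and the event names `high_load` and `load_done` are hypothetical labels for the Powerwall case in this thread. In RM terms, each state would roughly correspond to one rule whose required expression checks that the current state (for example, held in a hub variable) matches.

```python
from enum import Enum, auto
from typing import Dict, List, Optional, Tuple


class State(Enum):
    NORMAL = auto()     # reserve at its usual level
    GRID_HOLD = auto()  # reserve held at 100 while the heavy load runs


# (current state, event) -> (next state, action). Pairs not listed are ignored,
# so repeated or out-of-order triggers cannot corrupt the state.
TRANSITIONS: Dict[Tuple[State, str], Tuple[State, str]] = {
    (State.NORMAL, "high_load"): (State.GRID_HOLD, "set reserve to 100"),
    (State.GRID_HOLD, "load_done"): (State.NORMAL, "restore reserve"),
}


def step(state: State, event: str) -> Tuple[State, Optional[str]]:
    """Return the next state and the action to take (None if the event is ignored)."""
    if (state, event) in TRANSITIONS:
        return TRANSITIONS[(state, event)]
    return state, None


state = State.NORMAL
actions: List[str] = []
for event in ["load_done", "high_load", "high_load", "load_done"]:
    state, action = step(state, event)
    if action is not None:
        actions.append(action)
print(state, actions)
```

The stray `load_done` at the start and the duplicate `high_load` produce no action, which is the "very solid behavior" a state machine buys.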

a.mcdear · March 20, 2024, 6:16pm

I have had great success making extensive use of waits within rules, mostly for action confirmation purposes.

For example, I send a command to lock a lock. I had found that sometimes (very seldom) the command would not be received and the lock would not lock, so I started adding confirmation steps. Basically, after the action to lock the lock, I add a wait-for-expression showing that the lock has locked. Once the lock reports locked, the wait ends and the rule exits. Generally, all of this only takes a few milliseconds to happen, but if there's no confirmation from the lock within 30 seconds, it will attempt to lock it again and then wait for confirmation again until success is achieved.
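The confirm-and-retry pattern above can be sketched with a simulated lock that silently drops some commands. Everything here is hypothetical: `FlakyLock` stands in for a real device, `wait_for_locked` stands in for the wait-for-expression with a 30-second timeout, and the `max_attempts` cap is an addition not in the original description (which retries until success), included so the loop cannot run forever.

```python
class FlakyLock:
    """Simulated lock that silently drops its first `drops` lock commands."""

    def __init__(self, drops: int, report_delay: float = 0.2) -> None:
        self.drops = drops
        self.report_delay = report_delay  # seconds before a received command is reported
        self.locked = False
        self.commands_sent = 0

    def lock(self) -> None:
        self.commands_sent += 1
        if self.drops > 0:
            self.drops -= 1  # command lost: the lock never hears it
            return
        self.locked = True

    def wait_for_locked(self, timeout: float) -> bool:
        """Stand-in for 'wait for expression: lock is locked' with a timeout."""
        return self.locked and self.report_delay <= timeout


def lock_with_confirmation(lock: FlakyLock, timeout: float = 30.0, max_attempts: int = 5) -> bool:
    """Send lock, wait for confirmation, and resend on timeout."""
    for _ in range(max_attempts):
        lock.lock()
        if lock.wait_for_locked(timeout):
            return True  # confirmed: the rule can exit
    return False  # hypothetical cap; e.g. send an alert here instead of looping forever


front_door = FlakyLock(drops=2)
print(lock_with_confirmation(front_door), front_door.commands_sent)
```

Here the wait is short and waits on a single device attribute, which is the kind of wait other posters in this thread report as reliable.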

vitaliy_kh · March 20, 2024, 6:33pm

Of course this is a very valuable option, but with one BIG IF: when it works.

a.mcdear · March 20, 2024, 6:34pm

Once I figured out what I was doing, I've never had it not work...

vitaliy_kh · March 20, 2024, 6:36pm

And what is the magic trick and/or secret?

GuyMan · March 20, 2024, 6:38pm

Agreed - it's not perfect, and if you're dealing with single-digit-ms triggers, things can still go astray. Having "real" semaphores, mutexes, or even a distributed lock manager for synchronization across multiple hubs would be a nice thing - not that I ever expect to see any of those happen.

As I've said before, there is a reason avionics and hard real-time stuff are developed on VxWorks, and not HE.

GuyMan · March 20, 2024, 6:46pm

Generally yes - I'm not religious about this stuff, and I leave logging on for a few weeks on new rules to see if I'll have issues - But yeah, I have hundreds of small rule snippets - and my experience is that things are more reliable with this approach (of course, YMMV - I also prefer Matter over local WiFi, versus Z-Wave, but that's another story for another thread)

To help manage "lots of small rules", I'm a great fan of: [Initial Release] Rule Machine Manager (New Rule Machine Interface) - combined with common naming conventions across devices, rules, and global hub variables, to know what's used where

Not saying my way is the best, or even better than others - just relaying what's worked for me over several years to work around limited concurrency controls, given HE's async behaviors.

GuyMan · March 20, 2024, 6:52pm

My experience is that short and simple wait expressions generally work - The longer the wait, and more complex the expression and items being "WAITED" on, the less reliable things become (IMHO). - I definitely don't like "long" WAITS (it's all relative), due to possible "interrupted" states over a reboot or update. - So I do have some WAITs in rules, but I generally go with the "two-rule" state-based sort of approach as preference, just based on where I've had pain points in the past (that others seem to be having as well).

So again, all my advice is YMMV, and "do what works reliably for you"...

vitaliy_kh · March 20, 2024, 7:36pm

For me it looks like Wait for Expression that waits for just a single device state change works OK. For a single device, Wait for Expression basically turns into Wait for Event, and this works very well. I cannot speak to the wait duration, but more complex logic is definitely a problem. My best guess: each involved device/variable needs to generate an event first, and then the logic is evaluated on each event. Surprisingly, when a rule was failing on the WAIT, the related expression was always TRUE, but the sleeping rule did not wake up.
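The guess above can be expressed as a small model: the wait is checked once when it starts and afterwards only when one of the watched devices emits an event. This is a hypothetical model of that guess, not Hubitat's actual implementation; `ToyHub`, `WaitForExpression`, and the two-part `dryer_finished` expression are made up, and `emit_event=False` simulates an event that never reaches the waiting rule.

```python
from typing import Any, Callable, Dict, List

Values = Dict[str, Any]


class ToyHub:
    """Minimal event bus: subscribers hear about a change only if an event is emitted."""

    def __init__(self) -> None:
        self.values: Values = {}
        self._subscribers: Dict[str, List[Callable[[], None]]] = {}

    def subscribe(self, name: str, callback: Callable[[], None]) -> None:
        self._subscribers.setdefault(name, []).append(callback)

    def set(self, name: str, value: Any, emit_event: bool = True) -> None:
        self.values[name] = value
        if emit_event:  # emit_event=False models an event that never arrives
            for callback in self._subscribers.get(name, []):
                callback()


class WaitForExpression:
    """Hypothetical model: check once at start, then only when a watched name emits an event."""

    def __init__(self, hub: ToyHub, watched: List[str], expression: Callable[[Values], bool]) -> None:
        self.hub = hub
        self.expression = expression
        self.done = expression(hub.values)
        for name in watched:
            hub.subscribe(name, self._on_event)

    def _on_event(self) -> None:
        if not self.done and self.expression(self.hub.values):
            self.done = True  # here the rest of the rule's actions would run


def dryer_finished(v: Values) -> bool:
    # Hypothetical two-part expression: load below 5 kW AND dryer not drying.
    return v["power_kw"] < 5 and v["dryer"] != "drying"


def run_scenario(dryer_event_delivered: bool) -> bool:
    hub = ToyHub()
    hub.values.update(power_kw=6.0, dryer="drying")
    wait = WaitForExpression(hub, ["power_kw", "dryer"], dryer_finished)
    hub.set("power_kw", 3.0)  # event arrives, expression re-checked: still drying
    hub.set("dryer", "idle", emit_event=dryer_event_delivered)
    assert dryer_finished(hub.values)  # the expression itself is now true either way
    return wait.done


print(run_scenario(True), run_scenario(False))
```

In the second scenario the expression is true but nothing re-evaluates it, so the model stays asleep, mirroring the "expression TRUE, rule did not wake up" symptom. More parts in the expression mean more events that have to be delivered and evaluated correctly.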

Alan_F · March 20, 2024, 8:30pm

In my experience, simple is more important than short. The other rule that I need to fix was waiting for a door to close after the door opened and it generally happened within 10 seconds. However, both rules had multi-part required expressions, and both rules performed an action that made their own required expression false. I suspect that the intermittent issue has more to do with the required expression than anything else.

GuyMan · March 20, 2024, 8:45pm

I generally agree with this in terms of "missed waits" that never get "completed"; complex expressions are the problem there.

LONG waits (aka >8 hrs) have been problematic for me in the past due to a reboot occurring during the "WAIT", and the "back half" of the rule then never getting fired after the hub restart.

You can't easily restore the application state of being "half-way" through the rule, back at the WAIT, on system restart if the initial triggers of the rule have changed. Having the "state" preserved in an HV or device status makes recovery from hub restarts simpler/easier, IMHO. Obviously, any state transitions that occur while the hub is down are missed in all cases, but there is a "newer" switch to re-evaluate required expressions on reboot, which makes recovery a bit cleaner in those cases.

My hub reboots "automagically" due to low-memory conditions (loosely every 3-4 weeks) via a nightly rule, and I travel a lot for work, so having morning routines fail for the better half every few months while I'm out of town is never well received. Hence my avoidance of "long" WAITs.

Given the specific use-case, this doesn't sound like a problem for you, but just wanted to pass the potential issue along (depends on the importance of the rule, the duration of the wait, and how often your hub is restarted)
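The restart concern can be sketched by comparing a rule that remembers "I'm halfway through" only in memory with a two-rule pair that records it in a persisted variable. This is a simplified model under stated assumptions: a reboot is simulated by creating a fresh rule instance, a plain dict stands in for hub variables that survive the reboot, and the names `WaitingRule`, `front_half`, `back_half`, and `override_active` are hypothetical.

```python
from typing import Dict, Optional


class WaitingRule:
    """One rule with a wait: 'halfway through' exists only in the running instance."""

    def __init__(self) -> None:
        self.waiting = False

    def trigger(self) -> None:
        self.waiting = True  # front half done, now waiting in memory

    def on_event(self, load_is_done: bool) -> Optional[str]:
        if self.waiting and load_is_done:
            self.waiting = False
            return "restore reserve"
        return None


def front_half(hub_vars: Dict[str, bool]) -> None:
    """Rule 1 of a two-rule pair: record the state in a (persisted) hub variable."""
    hub_vars["override_active"] = True


def back_half(hub_vars: Dict[str, bool], load_is_done: bool) -> Optional[str]:
    """Rule 2: act only if the persisted state says Rule 1 ran."""
    if hub_vars.get("override_active") and load_is_done:
        hub_vars["override_active"] = False
        return "restore reserve"
    return None


# Single rule with a wait: a restart throws away the in-memory "waiting" state.
rule = WaitingRule()
rule.trigger()
rule = WaitingRule()  # simulated reboot: a fresh instance with nothing pending
print(rule.on_event(load_is_done=True))  # None: the back half never runs

# Two rules sharing a hub variable: the dict stands in for state that survives a reboot.
hub_vars: Dict[str, bool] = {}
front_half(hub_vars)
# ... simulated reboot: hub_vars is reloaded unchanged ...
print(back_half(hub_vars, load_is_done=True))  # restore reserve
```

Calling `back_half` once at startup with the current conditions would be a rough analogue of re-evaluating required expressions on reboot, which covers transitions that happened while the hub was down.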

a.mcdear · March 20, 2024, 9:04pm

I can post some examples if you'd like.

Many of my rules are quite complicated, with multiple waits, repeats, and conditionals, and they all work all the time.

But mostly, the trick for me was extensively monitoring logs to discover why waits were failing. (The biggest reason turned out to be me: opening the rule and clicking "Done" when reviewing rule actions and things, which re-initializes a rule that may have already started, messing up PBs and other things that require the rule to run to its completion.)

joshlobe · March 20, 2024, 9:30pm

Thank you very much for the recognition. Please do let me know if you have any suggestions for improvements. I'm always welcoming new ideas.