Wait for Expression timeout never occurs

user5985 · January 17, 2024, 8:10pm

Something may have changed, but I know this worked before. I have a simple Wait for Expression with a timeout. If the expression becomes true BEFORE the timeout, the wait wakes up. If the expression does not become true within the timeout, the rule gets stuck waiting. Once it is in that state, making the expression true doesn't wake the wait either; it remains stuck. Any ideas?

Here's the simple RULE:

Set AC-ACTION--Poll--DONE to false
Set boolWaitTimeoutOccured to false
Wait for Expression: Variable AC-ACTION--Poll--DONE(false) = true(F) [FALSE] 
 --> timeout: 0:00:30
IF (Variable AC-ACTION--Poll--DONE(false) = true(F) [FALSE]) THEN
	Notify Pushover Normal Alert: 'AC-ACTION--Poll--DONE was successful'
ELSE
	Notify Pushover Normal Alert: 'AC-ACTION--Poll--DONE failed (timeout occurred)'
	Set boolWaitTimeoutOccured to true
END-IF
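For comparison, here is the behavior the rule expects from a wait with a timeout, sketched in Python with `threading.Event`. `Event.wait(timeout)` returns either when the flag is set or when the timeout elapses, so execution always reaches the IF/ELSE. This is only an analogy: it is not how Rule Machine implements waits, and `run_rule` is a hypothetical stand-in for the rule above.

```python
import threading

def run_rule(done: threading.Event, timeout_s: float) -> str:
    """Toy model of the rule: wait for `done`, at most `timeout_s` seconds.

    Event.wait() returns when the flag is set or when the timeout
    elapses; in both cases execution continues to the IF/ELSE below.
    """
    done.wait(timeout=timeout_s)
    # Re-check the "variable" after the wait, as the rule's IF does.
    if done.is_set():
        return "AC-ACTION--Poll--DONE was successful"
    return "AC-ACTION--Poll--DONE failed (timeout occurred)"

# Nothing ever sets the flag, so the 0.1 s timeout fires and the ELSE runs.
print(run_rule(threading.Event(), 0.1))
# prints "AC-ACTION--Poll--DONE failed (timeout occurred)"

# Another thread sets the flag after 50 ms, well before the 2 s timeout.
flag = threading.Event()
threading.Timer(0.05, flag.set).start()
print(run_rule(flag, 2.0))
# prints "AC-ACTION--Poll--DONE was successful"
```

In the case described here, neither branch was ever reached. That is consistent with the timeout never being armed, rather than with a problem in the expression itself.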

Here's the LOG output:

dev:82  2024-01-17 02:54:11.340 PM  info  AC-PollDone was turned off
dev:82  2024-01-17 02:54:08.954 PM  info  AC-PollDone was turned on
dev:82  2024-01-17 02:54:06.595 PM  info  AC-PollDone was turned off
dev:82  2024-01-17 02:53:53.851 PM  info  AC-PollDone is on
dev:82  2024-01-17 02:53:09.959 PM  info  AC-PollDone was turned on
dev:82  2024-01-17 02:52:39.876 PM  info  AC-PollDone was turned off
dev:82  2024-01-17 02:52:38.408 PM  info  AC-PollDone was turned on
app:118  2024-01-17 02:51:38.702 PM  info  Action: Wait for Expression: Variable AC-ACTION--Poll--DONE(false) = true(F) [FALSE] --> timeout: 0:00:30
app:118  2024-01-17 02:51:38.585 PM  info  Action: Set boolWaitTimeoutOccured to false
app:118  2024-01-17 02:51:38.443 PM  info  Action: Set AC-ACTION--Poll--DONE to false

pseudonym · January 17, 2024, 8:44pm

Not sure what's going on but I would delete and recreate the Wait.

user5985 · January 17, 2024, 9:11pm

Actually, this is a new rule I made for testing. This behavior happens for other rules I created, so I wanted to see if a simple rule would also behave this way, and it does. I rebooted using the option to "Rebuild database on reboot", but it did not fix it.

I'm thinking my hub has gotten into a bad state. A few times, a rule I was creating grew so big that it caused an "unexpected error" and I could no longer edit it, so I had to revert to a backup. This may have been due to low memory; I don't know. I split the rule into two and it seemed OK.

Will a soft reset be any different from using the reboot option to rebuild the database? Any other ideas for resetting the hub's wait states, since reboots don't fix it?

pseudonym · January 17, 2024, 10:24pm

Is the timeout actually getting scheduled?

user5985 · January 17, 2024, 11:21pm

I'm not sure where to find info on whether the timeout is being scheduled. In the logs I do see this error in "Hub events":

schedulerError Scheduler error: Failure occured during job recovery.

pseudonym · January 18, 2024, 2:46am

In the upper right corner of the rule you'll see the gear icon.

Select it, and at the bottom of the page you'll find Scheduled Jobs. A pending timeout is listed there as a scheduled job.

Use the rule's Run Actions and see if the timeout is actually getting scheduled.

hubitrep · January 18, 2024, 2:47am

See this thread: "None of the basic rules apps working - they use to, but not anymore" (#16 by bahree).

Although the simplest workaround may be to recreate the rule.

user5985 · January 18, 2024, 4:39pm

Thanks, everyone. The comments had me go back and search for errors around the time I thought the scheduler stopped working, because it turns out that ALL timers had stopped working. That is, rules that had nothing to do with my new rules, such as those triggered by sunset or a specific time of day, were no longer working either.

So here is the error I noticed:
error  org.quartz.ObjectAlreadyExistsException: Unable to store Trigger with name: 'doRepeatR' and group: 'app138Once', because one already exists with this identification. on line 9287 (method allHandler)

It was generated by a watchdog rule I had created a while ago: a repeating loop that saved the current date-time every minute, so that when the hub initialized it would send a notification that included this last-known timestamp. For some reason that rule threw this error, and I'm guessing it is what prevented any other timer from being scheduled.
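The exception says that a trigger with the same name and group ('doRepeatR' / 'app138Once') was already stored, so storing a second one was refused. A toy Python model of a trigger store keyed by (group, name) shows that rule. Everything below is hypothetical: `TriggerStore`, `replace_existing`, and `delete_group` are illustrative names, not Quartz's or Hubitat's actual code, and whether Hubitat overwrites or rejects duplicates internally is an assumption.

```python
class ObjectAlreadyExists(Exception):
    """Stand-in for org.quartz.ObjectAlreadyExistsException (hypothetical)."""

class TriggerStore:
    """Toy trigger store keyed by (group, name), loosely modeled on Quartz."""

    def __init__(self):
        self._triggers = {}

    def store(self, name, group, fire_at, replace_existing=False):
        key = (group, name)
        # A second trigger with the same key is refused unless replacing.
        if key in self._triggers and not replace_existing:
            raise ObjectAlreadyExists(
                f"Unable to store Trigger with name: '{name}' and group: "
                f"'{group}', because one already exists with this identification."
            )
        self._triggers[key] = fire_at

    def delete_group(self, group):
        # Roughly what clearing one app's jobs does: drop every trigger in its group.
        for key in [k for k in self._triggers if k[0] == group]:
            del self._triggers[key]

store = TriggerStore()
store.store("doRepeatR", "app138Once", 60)
try:
    store.store("doRepeatR", "app138Once", 120)  # same key: rejected
except ObjectAlreadyExists as err:
    print(err)
store.store("doRepeatR", "app138Once", 120, replace_existing=True)  # overwrite works
store.delete_group("app138Once")
store.store("doRepeatR", "app138Once", 60)  # clean slate after deleting the group
```

Overwriting an existing schedule, or deleting the app's triggers first, avoids the duplicate-key error. The model does not explain why one failed store would block every other timer on the hub.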

After I deleted the rule, deleted its job ID using "deleteAppJobs" (which succeeded), and rebooted, everything worked as it should.

Thanks.

user5985 · January 18, 2024, 9:27pm

The question is: why did this happen, and why wouldn't a reboot simply clear any stuck app job? Do jobs persist across reboots rather than being cleared?
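One plausible explanation, offered only as an assumption: if the hub keeps scheduled jobs in its database, a reboot reloads them instead of discarding them. The "Failure occured during job recovery" hub event hints that jobs are recovered at startup. The Python toy below models that idea with a JSON file standing in for the database. `PersistentJobStore` and the job ID string are hypothetical and do not reflect Hubitat's implementation.

```python
import json
import os
import tempfile

class PersistentJobStore:
    """Toy job store that writes every change to disk (hypothetical)."""

    def __init__(self, path):
        self.path = path
        self.jobs = {}
        # "Job recovery": reload whatever was persisted before the restart.
        if os.path.exists(path):
            with open(path) as f:
                self.jobs = json.load(f)

    def _save(self):
        with open(self.path, "w") as f:
            json.dump(self.jobs, f)

    def schedule(self, job_id, fire_at):
        self.jobs[job_id] = fire_at
        self._save()

    def delete(self, job_id):
        self.jobs.pop(job_id, None)
        self._save()

path = os.path.join(tempfile.mkdtemp(), "jobs.json")
hub = PersistentJobStore(path)
hub.schedule("app138Once.doRepeatR", 60)

hub = PersistentJobStore(path)  # simulated reboot: a new instance reads the same file
print(hub.jobs)                 # {'app138Once.doRepeatR': 60}

hub.delete("app138Once.doRepeatR")
hub = PersistentJobStore(path)  # reboot again, after an explicit delete
print(hub.jobs)                 # {}
```

Under this model, only an explicit delete (as with "deleteAppJobs") removes the job. Restarting alone brings it back.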