Diagnosing Full Event Queue

Over the last few weeks I've been getting the dreaded "com.hubitat.app.exception.LimitExceededException: Event Queue is Full" message in my log. It happens every day or two, and causes havoc with automations. It also appears to be the cause of a cloud connection drop which takes a reboot to solve. This is my main "server" hub that receives all ~350 devices for from four other "radio" hubs for the purpose of (1) dashboards; and (2) any automation that requires cross-hub logic.

What is the best way to find the source of this problem?

Things I've tried unsuccessfully:

  • looking in the device logs, but I think I understand they don't show events unless they change device state. Nothing of note there.
  • investigating power devices -- but eliminated them as a cause by removing them (problem kept happening)
  • looking in the main log for clues, but either (a) most of my devices have logging turned off, in which case there's limited diagnostic info; or (b) most of my devices have logging turned on in which case I never can see more than a couple hours of history
  • reached out to Hubitat support, but they say they can't help because I have custom code. Ironically, I keep all my custom code on a different hub. The only thing that's "custom" (I guess) on this hub is Hubconnect and TileMaster. I guess HPM is custom too.

Appreciate any ideas. Cuz I'm out of 'em.

You would need to turn logging on for suspected (yeh, which are those?) devices. Even though it's not being saved in the log, the events still happen in the background.

It should be easy to see which one is flooding the logs once you turn the right one on.

Yeah, maybe I need to just suck it up, but the prospects of turning on logging for 300-400 devices is given me the shakes. And, per the above, when I do have lots of logging turned on, I only get like an hour of history in the log. So, by the time the event queue gets full, it's too late for me to see what happened. Maddening.

Are they all using the same custom code? You may need to just turn on one to see if it's flooding the logs.

If you are patient, wait until 2.2.4 comes out. There are supposed to be some major changes in that update.

You could also try hub stats and see if anything stands out. Lewis has some helpful posts about this.

Definitely not. Of the ~350 devices, the vast vast majority are just standard devices or their Hubconnect virtual counterparts.

I can definitely be patient because I'm really struggling with this. I'll check out @lewis.heidrick's hub stats as well. Thanks for the pointer.

The event que full usually comes from rules that trigger a ton of crap at once. Once it's borked up sometimes it doesn't recover without a reboot. Well in earlier versions at least. I don't know all the changes under the hood in 2.2.4.132. A failed pair, weak mesh, or several other things can cause it too. Post a screenshot of all your devices in settings, zwave details and I'll see if I can help.

1 Like

That's the thing. I don't have any zwave devices on the hub. I use the hub in question as a server hub to gather devices from 4 other "client" hubs--one for cloud and custom apps, the other 3 for zwave and zigbee radio devices. The server then does (1) dashboards; and (2) the subset of automations that require devices from multiple hubs to process.

This hub is the only one I have the problem with.

I've seen dashboards do it when you don't have them restricted to specific devices per dashboard.

1 Like

A great idea. Unfortunately (or fortunately), I discovered and corrected that problem quite a while ago.

Can you post a screenshot of your server hub stats?

Here it is. (and TIA)

Summary

Device Stats enabled: true
Device stats start time: 1604726345478
Device stats total run time: 436443
device id 5682 runcount 487 total runtime 58382 average run time 119.8809034908
device id 2161 runcount 1642 total runtime 137959 average run time 84.0188794153
device id 5681 runcount 71 total runtime 11806 average run time 166.2816901408
device id 878 runcount 157 total runtime 5704 average run time 36.3312101911
device id 5582 runcount 127 total runtime 5196 average run time 40.9133858268
device id 390 runcount 5 total runtime 189 average run time 37.8
device id 1235 runcount 127 total runtime 5847 average run time 46.0393700787
device id 4075 runcount 5 total runtime 119 average run time 23.8
device id 393 runcount 138 total runtime 5185 average run time 37.5724637681
device id 5683 runcount 10 total runtime 401 average run time 40.1
device id 5774 runcount 7 total runtime 741 average run time 105.8571428571
device id 1449 runcount 14 total runtime 375 average run time 26.7857142857
device id 5775 runcount 7 total runtime 1118 average run time 159.7142857143
device id 1227 runcount 2 total runtime 1025 average run time 512.5
device id 4140 runcount 1 total runtime 58 average run time 58
device id 391 runcount 32 total runtime 1459 average run time 45.59375
device id 368 runcount 9 total runtime 364 average run time 40.4444444444
device id 4749 runcount 2 total runtime 15 average run time 7.5
device id 1776 runcount 1 total runtime 52 average run time 52
device id 2527 runcount 1 total runtime 14 average run time 14
device id 5581 runcount 3 total runtime 185 average run time 61.6666666667
device id 4754 runcount 1 total runtime 3 average run time 3
device id 5553 runcount 6 total runtime 185 average run time 30.8333333333
device id 4460 runcount 4 total runtime 180 average run time 45
device id 4526 runcount 1 total runtime 27 average run time 27
device id 4719 runcount 1 total runtime 22 average run time 22
device id 595 runcount 7 total runtime 1669 average run time 238.4285714286
device id 389 runcount 3 total runtime 176 average run time 58.6666666667
device id 2089 runcount 1 total runtime 11 average run time 11
device id 364 runcount 1 total runtime 95 average run time 95
device id 365 runcount 1 total runtime 46 average run time 46
device id 385 runcount 1 total runtime 27 average run time 27
device id 386 runcount 1 total runtime 14 average run time 14
device id 387 runcount 1 total runtime 25 average run time 25
device id 399 runcount 1 total runtime 80 average run time 80
device id 400 runcount 1 total runtime 74 average run time 74
device id 401 runcount 1 total runtime 44 average run time 44
device id 388 runcount 1 total runtime 49 average run time 49
device id 4462 runcount 2 total runtime 120 average run time 60
device id 392 runcount 1 total runtime 63 average run time 63
device id 394 runcount 1 total runtime 385 average run time 385
device id 395 runcount 1 total runtime 44 average run time 44
device id 396 runcount 1 total runtime 47 average run time 47
device id 397 runcount 1 total runtime 23 average run time 23
device id 398 runcount 1 total runtime 459 average run time 459
device id 370 runcount 1 total runtime 20 average run time 20
device id 371 runcount 1 total runtime 58 average run time 58
device id 380 runcount 1 total runtime 63 average run time 63
device id 381 runcount 1 total runtime 4 average run time 4
device id 382 runcount 1 total runtime 24 average run time 24
device id 383 runcount 1 total runtime 69 average run time 69
device id 384 runcount 1 total runtime 43 average run time 43
App Stats enabled: true
App stats start time: 1604726345479
App stats total run time: 436454
app id 11 runcount 325 total runtime 138320 average run time 425.6
app id 1609 runcount 160 total runtime 21840 average run time 136.5
app id 1576 runcount 200 total runtime 36458 average run time 182.29
app id 1670 runcount 365 total runtime 62886 average run time 172.2904109589
app id 2148 runcount 74 total runtime 9271 average run time 125.2837837838
app id 172 runcount 309 total runtime 36870 average run time 119.3203883495
app id 2788 runcount 157 total runtime 137061 average run time 873
app id 1069 runcount 148 total runtime 189757 average run time 1282.1418918919
app id 4133 runcount 127 total runtime 5420940 average run time 42684.5669291339
app id 225 runcount 366 total runtime 44075 average run time 120.4234972678
app id 2116 runcount 2 total runtime 608 average run time 304
app id 4314 runcount 16 total runtime 1012 average run time 63.25
app id 1829 runcount 2 total runtime 758 average run time 379
app id 3209 runcount 1 total runtime 519 average run time 519
app id 222 runcount 36 total runtime 4350 average run time 120.8333333333
app id 3557 runcount 3 total runtime 726 average run time 242
app id 425 runcount 16 total runtime 2336 average run time 146
app id 221 runcount 1 total runtime 82 average run time 82
app id 1452 runcount 1 total runtime 373 average run time 373
app id 2034 runcount 2 total runtime 408 average run time 204
app id 4132 runcount 3 total runtime 99155 average run time 33051.6666666667
app id 3876 runcount 5 total runtime 1200 average run time 240
app id 3731 runcount 5 total runtime 8050 average run time 1610
app id 3558 runcount 1 total runtime 24565 average run time 24565
app id 226 runcount 16 total runtime 3274 average run time 204.625
app id 224 runcount 6 total runtime 772 average run time 128.6666666667
app id 223 runcount 20 total runtime 2780 average run time 139
app id 227 runcount 12 total runtime 1199 average run time 99.9166666667
app id 4101 runcount 6 total runtime 191408 average run time 31901.3333333333
app id 2038 runcount 1 total runtime 129 average run time 129
app id 1473 runcount 1 total runtime 110 average run time 110
app id 1475 runcount 1 total runtime 533 average run time 533
app id 3333 runcount 4 total runtime 144257 average run time 36064.25
app id 3397 runcount 1 total runtime 42685 average run time 42685
app id 3525 runcount 1 total runtime 46431 average run time 46431
app id 391 runcount 1 total runtime 1788 average run time 1788
app id 5 runcount 3 total runtime 657 average run time 219
app id 3335 runcount 2 total runtime 52908 average run time 26454

Edit: How does the hub stats process determine which hub to evaluate since the hub IP isn't in the URL?

What are those :arrow_up: , they sure look suspicious to me.

app id 11 runcount 325 total runtime 138320 average run time 425.6
app id 4133 runcount 127 total runtime 5420940 average run time 42684.5669291339
app id 4132 runcount 3 total runtime 99155 average run time 33051.6666666667

:open_mouth:

Lots of very slow apps in there.

Yeah I had the same reaction. Those apps are all Tile Master 2 child apps--an awesome app from @bptworld that allows you to combine multiple devices on a single dashboard tile. The weird thing is that I build those combo tiles awhile ago and barely ever touch them. They've always worked great on the dashboard--and they're quick. My event queue problems just started happening in the last couple weeks.

A couple of things stand out to me. First being the total runtime of all your big apps. This hub is busy as hell. Second being the total runcount of some of your devices for the pretty short collection window. I'd look through all your high average runtimes as well and see if there is maybe a better way to do it.

1 Like

I disabled the offensive apps, and I haven't gotten the event queue error for the last twenty-four hours. However, the hub still dropped the cloud connection (specifically, it shows as unavailable from the mobile app and doesn't appear in the HE portal. I went through the logs and saw one error that I don't recognize:

java.lang.Exception: Unable to aquire (sic) lock to schedule run once for delayedActCancel on line 6918.

It's for a RM4 app that monitors a dozen motion sensors to set a virtual motion for "all things quiet" in the house. One of the first apps I set up a couple years ago. Has always worked without an issue.

Something weird is going on. I don't know what to do to get Hubitat support to help, but I'm dead in the water.

Tagging in @gopher.ny I can't zero in on it and my daughter is having surgery tomorrow.

1 Like

what is device 1621 because 1600 executions in that short time frame seems like alot. it seems chatty and that many runs could be filling up your queue.

A good place to look. Device 2161 is a hubconnect remote server, so I imagine that reflect the combination of all ~25 devices coming from (or maybe it's going to?) that hub. I'm going to analyze all the chatty devices and apps tomorrow.

It's hard to manage an event ceiling when I have no idea what that ceiling is. Is there some way to know what the max is? I wonder if the new hub mesh service that was discussed on the recent Hubitat Live would essentially eliminate these "events" as there only purpose is to keep the shadow device in sync with the physical one.