Hub stops working every 24 hours - still happening!

Hi everyone,

First post, and I'll apologize in advance for making it a long one!

I'm a recent Wink convert, purchased and initially set up my Hubitat in May 2020.

Here's my setup:
Hubitat Elevation® Platform Version 2.2.2.129, Hardware Version Rev C-5
~15 ZWave swtiches/Dimmers
1 Schlage lock
Castea Pro Bridge (w/ 3 dimmers + 3 Picos)

Apps I use (all native):
Amazon Echo Skill
Hubitat Dashboard
Lutron Integrator
Rule Machine 4.0 (7 rules)

Starting nearly two weeks ago, I noticed that my automation rules weren't activating every day (turning lights on in the evening). Then Alexa voice commands would time out. Then I noticed that I couldn't pull up the hub web page or dashboard. A hard reset (pull the power cable) got it working again. A day or two later, same thing happens. After about a week I noticed that it seems like I need to reset daily to keep it working. Not good.

So, the other day while I had it in this state, I fired up Wireshark to try to sniff out what's going on. (I'm a software engineer, can't help it.) I can share the pcap file if anyone wants to see it. What it shows is that my browser is able to establish a TCP connection on port 80, so it's somewhat functional through at least the session/transport layers. TCP keep-alives are even successful, keeping the connection active. But the Hub won't respond to an HTTP GET request, leaving the browser spinning and spinning. It's clearly hung at the application layer.

After resetting, I was able to pull the logs, and see a trend that the errors (SQL exceptions, Groovy-lang exceptions) that seem to start 20-24 hours after each reset.

For example, here's the most recent lockup which occurred on 8/8. Today I reset the hub on 8/12 to recover it and grab these logs. During the lockup period, no entries were recorded for any physical switch activations. (newest logs first!)
dev:192020-08-12 07:54:37.009 pm infoTelnet connection to Lutron interface established

<Edit: ........LOCKUP PERIOD......>

dev:192020-08-08 07:30:56.381 pm errorjava.sql.SQLException: Connections could not be acquired from the underlying database! (heartbeat)

dev:192020-08-08 07:30:56.370 pm errorjava.sql.SQLException: Connections could not be acquired from the underlying database! (parse)

dev:192020-08-08 07:23:40.264 pm errororg.h2.jdbc.JdbcSQLException: The database has been closed [90098-197] (heartbeat)

dev:192020-08-08 07:23:33.068 pm errororg.h2.jdbc.JdbcSQLException: The database has been closed [90098-197] (heartbeat)

app:222020-08-08 07:23:14.956 pm errororg.h2.jdbc.JdbcSQLException: The database has been closed [90098-197] on line 154 (LutronHandler)

dev:192020-08-08 07:22:59.161 pm errororg.h2.jdbc.JdbcSQLException: The database has been closed [90098-197] (heartbeat)

dev:192020-08-08 07:22:59.161 pm errororg.h2.jdbc.JdbcSQLException: The database has been closed [90098-197] (heartbeat)

dev:22020-08-08 07:22:45.829 pm errorgroovy.lang.MissingMethodException: No signature of method: static org.codehaus.groovy.runtime.DefaultGroovyMethods.asType() is applicable for argument types: (java.lang.String, java.lang.Class) values: [00 00, class java.lang.Object] Possible solutions: asType(java.lang.Class), any(), grep() (parse)

dev:22020-08-08 07:22:45.819 pm errorgroovy.lang.MissingMethodException: No signature of method: static org.codehaus.groovy.runtime.DefaultGroovyMethods.asType() is applicable for argument types: (java.lang.String, java.lang.Class) values: [00 00, class java.lang.Object] Possible solutions: asType(java.lang.Class), any(), grep() (parse)

dev:192020-08-08 07:05:36.899 pm inforcvd: OUTPUT,6,1,100.00

app:42020-08-08 03:51:07.354 pm errorError making Call to refresh Token: Remote host closed connection during handshake

dev:102020-08-08 02:00:56.311 pm infoCloset is off [physical]

dev:102020-08-08 02:00:36.921 pm infoCloset is off [physical]

dev:102020-08-08 01:51:50.462 pm infoCloset was turned off [physical]

dev:102020-08-08 01:46:55.588 pm infoCloset was turned on [physical]

dev:212020-08-08 01:19:42.184 pm infoMasterBedroom was turned off

dev:192020-08-08 01:19:38.312 pm inforcvd: OUTPUT,6,1,0.00

dev:152020-08-08 01:07:47.080 pm infoEvening was turned off

dev:92020-08-08 12:01:20.083 pm infoPatio is off [physical]

Previous powerup timestamp was about 21 hours prior:
dev:192020-08-07 10:13:14.730 pm infoTelnet connection to Lutron interface established

Previous lockup was with this error:
dev:22020-08-07 05:33:38.584 pm errorgroovy.lang.MissingMethodException: No signature of method: static org.codehaus.groovy.runtime.DefaultGroovyMethods.asType() is applicable for argument types: (java.lang.String, java.lang.Class) values: [00 00, class java.lang.Object] Possible solutions: asType(java.lang.String, java.lang.Class), asType([Ljava.lang.Object;, java.lang.Class), asType(groovy.lang.Closure, java.lang.Class), asType(groovy.lang.GString, java.lang.Class), asType(java.io.File, java.lang.Class), asType(java.lang.CharSequence, java.lang.Class) (parse)

app:42020-08-07 04:07:42.159 pm errorError making Call to refresh Token: Remote host closed connection during handshake

Older crashes on 8/6
dev:192020-08-06 04:12:03.267 pm errorgroovy.lang.MissingMethodException: No signature of method: groovy.sql.Sql.executeUpdate() is applicable for argument types: (java.lang.String, java.util.ArrayList) values: [UPDATE DEVICE D SET D.STATE = ? WHERE D.ID = ?, [{"lastHeartbeat":1596744469849,"reTryCount":0}, ...]] (heartbeat)

dev:192020-08-06 04:12:03.248 pm errorgroovy.lang.MissingMethodException: No signature of method: groovy.sql.Sql.executeUpdate() is applicable for argument types: (java.lang.String, java.util.ArrayList) values: [UPDATE DEVICE D SET D.STATE = ? WHERE D.ID = ?, [{"lastHeartbeat":1596744469849,"reTryCount":0}, ...]] (heartbeat)

dev:22020-08-06 04:12:03.243 pm errorgroovy.lang.MissingMethodException: No signature of method: groovy.sql.Sql.executeUpdate() is applicable for argument types: (java.lang.String, java.util.ArrayList) values: [UPDATE DEVICE D SET D.STATE = ? WHERE D.ID = ?, [{"polled":0,"bin":1,"flashing":false}, ...]] (parse)

dev:192020-08-06 04:12:03.230 pm errorgroovy.lang.MissingMethodException: No signature of method: groovy.sql.Sql.firstRow() is applicable for argument types: (java.lang.String, java.util.ArrayList) values: [SELECT D.STATE FROM DEVICE D WHERE D.ID = ?, [19]] (heartbeat)

dev:22020-08-06 04:12:03.230 pm errorgroovy.lang.MissingMethodException: No signature of method: groovy.sql.Sql.firstRow() is applicable for argument types: (java.lang.String, java.util.ArrayList) values: [SELECT D.STATE FROM DEVICE D WHERE D.ID = ?, [2]] (parse)

Before that on 8/4
dev:22020-08-04 03:36:46.696 pm errorgroovy.lang.MissingMethodException: No signature of method: static org.codehaus.groovy.runtime.DefaultGroovyMethods.asType() is applicable for argument types: (java.lang.String, java.lang.Class) values: [00 00, class java.lang.Object] Possible solutions: asType(java.lang.String, java.lang.Class), asType([Ljava.lang.Object;, java.lang.Class), asType(groovy.lang.Closure, java.lang.Class), asType(groovy.lang.GString, java.lang.Class), asType(java.io.File, java.lang.Class), asType(java.lang.CharSequence, java.lang.Class) (parse)

dev:22020-08-04 03:36:46.697 pm errorgroovy.lang.MissingMethodException: No signature of method: static org.codehaus.groovy.runtime.DefaultGroovyMethods.asType() is applicable for argument types: (java.lang.String, java.lang.Class) values: [00 00, class java.lang.Object] Possible solutions: asType(java.lang.String, java.lang.Class), asType([Ljava.lang.Object;, java.lang.Class), asType(groovy.lang.Closure, java.lang.Class), asType(groovy.lang.GString, java.lang.Class), asType(java.io.File, java.lang.Class), asType(java.lang.CharSequence, java.lang.Class) (parse)

If I go back and search old logs for "error", here are the oldest errors on the device....
dev:22020-08-02 11:20:44.979 pm errorjava.sql.SQLException: An SQLException was provoked by the following failure: com.mchange.v2.resourcepool.ResourcePoolException: A ResourcePool cannot acquire a new resource -- the factory or source appears to be down. (parse)

dev:22020-08-02 11:17:15.870 pm errorjava.sql.SQLException: Connections could not be acquired from the underlying database! (parse)

dev:22020-08-02 11:17:12.420 pm errorjava.sql.SQLException: An SQLException was provoked by the following failure: com.mchange.v2.resourcepool.ResourcePoolException: A ResourcePool cannot acquire a new resource -- the factory or source appears to be down. (parse)

dev:22020-08-02 11:15:35.509 pm errorjava.sql.SQLException: Connections could not be acquired from the underlying database! (parse)

dev:22020-08-02 11:15:32.062 pm errorjava.sql.SQLException: An SQLException was provoked by the following failure: com.mchange.v2.resourcepool.ResourcePoolException: A ResourcePool cannot acquire a new resource -- the factory or source appears to be down. (parse)

dev:22020-08-02 11:13:43.790 pm errorjava.sql.SQLException: Connections could not be acquired from the underlying database! (parse)

dev:22020-08-02 11:11:58.662 pm errorjava.sql.SQLException: Connections could not be acquired from the underlying database! (parse)

dev:22020-08-02 11:10:12.865 pm errorjava.sql.SQLException: Connections could not be acquired from the underlying database! (parse)

dev:22020-08-02 11:04:57.831 pm errorjava.sql.SQLException: Connections could not be acquired from the underlying database! (parse)

dev:192020-08-02 11:04:54.868 pm errorjava.sql.SQLException: An SQLException was provoked by the following failure: com.mchange.v2.resourcepool.ResourcePoolException: A ResourcePool cannot acquire a new resource -- the factory or source appears to be down. (parse)

dev:22020-08-02 11:03:25.283 pm errorjava.sql.SQLException: Connections could not be acquired from the underlying database! (parse)

dev:22020-08-02 11:03:25.234 pm errorjava.sql.SQLException: Connections could not be acquired from the underlying database! (parse)

dev:192020-08-02 11:01:41.971 pm errororg.h2.jdbc.JdbcSQLException: The database has been closed [90098-197] (heartbeat)

dev:192020-08-02 11:01:41.977 pm errororg.h2.jdbc.JdbcSQLException: The database has been closed [90098-197] (heartbeat)

dev:22020-08-02 11:01:41.971 pm errororg.h2.jdbc.JdbcSQLException: The database has been closed [90098-197] (parse)

dev:192020-08-02 11:01:41.941 pm errororg.h2.jdbc.JdbcSQLException: The database has been closed [90098-197] (heartbeat)

dev:192020-08-02 11:01:41.942 pm errororg.h2.jdbc.JdbcSQLException: The database has been closed [90098-197] (heartbeat)

dev:22020-08-02 11:01:41.944 pm errororg.h2.jdbc.JdbcSQLException: The database has been closed [90098-197] (parse)

dev:192020-08-02 11:01:41.943 pm errororg.h2.jdbc.JdbcSQLException: The database has been closed [90098-197] (heartbeat)

app:42020-08-01 01:16:05.966 pm errorError making Call to refresh Token: Remote host closed connection during handshake

app:42020-07-31 02:24:22.670 pm errorError making Call to refresh Token: Remote host closed connection during handshake

The vast majority of errors seem to be on dev:2. Dev:19 shows up a few times, and dev:21 and dev:22 even less. I also see app:4, which is the Alexa interface, but that failure may just be a symptom of the crash.

Dev:2 appears to be a Leviton Z-Wave dimmer "Kitchen", although the profile lists it as a generic Z-Wave dimmer. I just updated it to be a Leviton, and turned on debug logs. Dimmed it, turned it off, turned it back on using a virtual device (that controls multiple lights via the rule manager). Working fine, nothing interesting in the debug logs.

Ok, so where do I go from here? I expect I'll be back tomorrow evening with more data to share. But I'm really hoping this doesn't turn into a "factory reset", where I need to exclude and reconnect all of my devices to clean out the database. I'm embarassed to admit that I don't have a database backup before this issue began, as I haven't fully integrated all of the devices the way I want them. And by the time I realized I had a serious issue, I was well past the 5 days of automatic backups... Ugh.

Should I remove and re-include the offending device(s)?

Welcome to the community!

The backup is a DB dump and will not contain most of the types of corruptions that will make it impossible to restore functionality after a SOFT reset.
I would recommend downloading all your current backups, run a SOFT reset (doesn't reset the radios) and then restore a backup.
This would be a good start and may be enough.

EDIT: One more thing, if the hub locks up and you need to reboot when the main interface doesn't respond, try going to port 8081 and reboot from there primarily. Pulling the power is a very fast way of getting back to DB corruption.

3 Likes

Did you ever have a power outage, or do a "dirty" shutdown by pulling the plug? That can cause corruption.

If the restore like Markus suggested doesn't fix it, then I would try removing that one Leviton device as a test. Did this Leviton ever get a firmware update? I think there were some issues that firmware seems to have fixed?

This thread is slightly outdated, but the basic info is good. Maybe you can update yours?

1 Like

Thanks, I didn't even realize there was a diagnostic port at 8081. I'll use that tomorrow if it locks up again and restore a prior database.

If you decide to restore a previous database backup, I would follow the Hubitat Soft Reset procedure.

https://docs.hubitat.com/index.php?title=Soft_Reset

6 Likes

Yes, absolutely. Easily a half dozen times I pulled the plug. Until just now, I didn't know there was a safe way to recover the device when it was "stuck".

No, I've never attempted to update the firmware on any of my dimmers/switches. I'll look into that. Thanks!

Done. I'm cautiously optimistic at this point. After restoring, I ran the "bedtime" routine, and it was near instant. Like when the hub was new. Lately I'd been seeing odd delays and occasional Alexa timeouts on voice commands.

2 Likes

:see_no_evil:

Ouch! The hub is no different than having a desktop running windows and pulling the plug.

Some have their hub on a UPS or a backup battery system, there are some inexpensive ones recommended in a recent thread using an inexpensive circuit board and 18650 battery.

Me, I live dangerously, and don't have a backup battery. But I really should get one. I am sure it will bite me someday.

2 Likes

That's actually a little disappointing to hear. I work in embedded software, and the communication devices I work with can have the battery pulled at any time without risk of corruption. It does come at a cost.... you need a vastly more robust file system and flash memory management.

The hub is in the same room as my PC, so there's a UPS available, so that's easily remedied.

  1. Most communication devices cost a lot more than $129
  2. Most communication devices don't have a full embedded database in them

But your point is taken, and I agree it would be better if it wasn't something we needed to worry about at all, but it is what it is - so put it on a UPS. :slight_smile:

3 Likes

Just save a backup after you make a change and then who cares. I do like UPS's though so I don't have interruptions if the power blips.

Ok.... here's where we stand.

Did a soft reset last night, and restored the database.

I've connected to the hub a few times throughout the day, no issues. Just checked, and it's locked up again. The primary web interface is down, and Alexa can't reach the hub either. Diagnostic interface is working. Doing a "reboot" then I'll see what information is in the logs...

Again, seems related to device 2 in the logs, but a different error this time.

dev:22020-08-13 03:51:02.606 pm errorjava.sql.SQLException: An SQLException was provoked by the following failure: com.mchange.v2.resourcepool.ResourcePoolException: A ResourcePool cannot acquire a new resource -- the factory or source appears to be down. (parse)

dev:22020-08-13 03:51:02.597 pm errorjava.sql.SQLException: An SQLException was provoked by the following failure: com.mchange.v2.resourcepool.ResourcePoolException: A ResourcePool cannot acquire a new resource -- the factory or source appears to be down. (parse)

dev:22020-08-13 03:50:59.398 pm errorjava.sql.SQLException: An SQLException was provoked by the following failure: com.mchange.v2.resourcepool.ResourcePoolException: A ResourcePool cannot acquire a new resource -- the factory or source appears to be down. (parse)

dev:22020-08-13 03:50:59.388 pm errorjava.sql.SQLException: An SQLException was provoked by the following failure: com.mchange.v2.resourcepool.ResourcePoolException: A ResourcePool cannot acquire a new resource -- the factory or source appears to be down. (parse)

dev:22020-08-13 03:50:55.522 pm errorjava.sql.SQLException: Connections could not be acquired from the underlying database! (parse)

dev:22020-08-13 03:50:52.091 pm errorjava.sql.SQLException: An SQLException was provoked by the following failure: com.mchange.v2.resourcepool.ResourcePoolException: A ResourcePool cannot acquire a new resource -- the factory or source appears to be down. (parse)

dev:22020-08-13 03:50:52.070 pm errorjava.sql.SQLException: Connections could not be acquired from the underlying database! (parse)

dev:22020-08-13 03:50:48.266 pm errorjava.sql.SQLException: Connections could not be acquired from the underlying database! (parse)

dev:22020-08-13 03:49:15.224 pm errororg.h2.jdbc.JdbcSQLException: The database has been closed [90098-197] (parse)

dev:22020-08-13 03:49:15.214 pm errororg.h2.jdbc.JdbcSQLException: The database has been closed [90098-197] (parse)

dev:192020-08-13 03:49:07.408 pm errororg.h2.jdbc.JdbcSQLException: The database has been closed [90098-197] (heartbeat)

dev:192020-08-13 03:49:07.343 pm errororg.h2.jdbc.JdbcSQLException: The database has been closed [90098-197] (heartbeat)

dev:22020-08-13 03:49:03.792 pm errororg.h2.jdbc.JdbcSQLException: The database has been closed [90098-197] (parse)

dev:22020-08-13 03:49:03.767 pm errororg.h2.jdbc.JdbcSQLException: The database has been closed [90098-197] (parse)

what device is that? I got things like (mostly from echo speaks)
dev:772020-08-13 07:56:21.059 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 1424 (recoveryEvent)
dev:772020-08-13 07:54:21.066 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 1424 (recoveryEvent)
dev:772020-08-13 07:52:21.052 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 1424 (recoveryEvent)
dev:772020-08-13 07:46:43.062 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 1424 (checkPresence)
dev:772020-08-13 07:34:21.051 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 1424 (recoveryEvent)
dev:772020-08-13 07:32:21.055 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 1424 (recoveryEvent)
dev:772020-08-13 07:30:21.095 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 1424 (recoveryEvent)
dev:772020-08-13 07:28:32.277 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 1444 (checkEventInterval)

dev:392020-08-13 07:55:20.449 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 635 (refreshData)
dev:392020-08-13 07:54:20.409 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 635 (refreshData)
dev:392020-08-13 07:53:20.539 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 635 (refreshData)
dev:392020-08-13 07:52:21.442 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 635 (refreshData)
dev:392020-08-13 07:51:20.559 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 635 (refreshData)
dev:392020-08-13 07:50:20.463 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 635 (refreshData)
dev:392020-08-13 07:49:20.544 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 635 (refreshData)
dev:392020-08-13 07:48:20.447 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 635 (refreshData)
dev:392020-08-13 07:47:20.518 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 635 (refreshData)
dev:392020-08-13 07:46:20.464 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 635 (refreshData)
dev:392020-08-13 07:45:20.801 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 635 (refreshData)
dev:392020-08-13 07:44:20.524 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 635 (refreshData)
dev:392020-08-13 07:43:20.452 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 635 (refreshData)
dev:392020-08-13 07:42:20.536 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 635 (refreshData)
dev:392020-08-13 07:41:20.656 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 635 (refreshData)
dev:392020-08-13 07:40:20.465 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 635 (refreshData)
dev:392020-08-13 07:39:20.421 am errorcom.hubitat.app.exception.LimitExceededException: Event Queue is Full on line 635 (refreshData)

Probably wouldn’t hurt to exclude the device and see if the hub still locks up with the device completely disconnected from the hub. Although would be pretty interesting if a single device is able to completely lock up the hub.

It may not help, but seems like a useful troubleshooting step to see if the fact the errors are attributed to that device are just coincidental.

1 Like

It's a Leviton ZWave dimmer

@bcopeland, any thoughts about @jeffbrig 's errors? Not sure if it is a device issue, or something else? Lots of weird database errors though.

That looks kinda like a looping rule. Any chance you could post a screenshot of your 7 rm4 rules?

Jeff,

Could you please try v2.2.3? It should be available under Settings - Check for updates - Update hub.
The error you're getting pretty much fills the entire downloadable log within an hour, so I can't see how it starts. But 2.2.3 has some code changes that might help, and it can always be rolled back by going to http://your.hubs.ip.address:8081/ and picking Restore previous version. The entire update should take under 10 minutes on a C5 hub.

All hubs run embedded database in the same process as apps and hub UI, so having an error saying the database is closed indicates something's gone wildly wrong. If update doesn't help, perhaps we catch the problem when it just started, when logs are likely to offer more clues.

1 Like

This is one of my drivers for the part with recoveryEvent (device 39 is something else, probably same platform problem though, downgrading to 2.2.2 should fix it for drivers that don't have a fix yet), it is due to a platform change/bug in 2.2.3. I just released a version of my drivers which will disable this feature in my drivers when on 2.2.3. It will unfortunately mean that devices are much less likely to come back after dropping :frowning: