Zigbee channel changed itself ! Didn't even know this was possible-Could this be the missing link to bad zigbee performance?

Caution-some ranting ahead....
I'm on the latest 2.2.3.148 and it's been rough. Zigbee is constantly going offline. A soft reset last week produced a 500 error and had to soft reset 3 times before it took. Initially started as an SQL error, Now the freaking zigbee channel changed itself after the zigbee network going offline repeatedly over the last week. Ticket is in, but wondering if this is a bug or a feature? Should the zigbee channel change itself?
And how many soft resets is the magic number? I'm going on number 4 shortly.

Does anyone else feel like they're part of a bad beta?(Actually the whole HA scene is a real beta, not just HE)

In the latest, now the hub will not allow me to change the zigbee channel and let it stick, it keeps going back to 11, and the stupid HE is picking the channel used by my lightify gateway-11.

WAIT WAIT. I shutdown the hub, pulled the plug and the hub came back up on my preferred and previously set channel 25 without me changing anything. Thankfully the PAN ID stuck
Paging @markus, who I call the Zigbee Jedi

This is what started it all:
dev:19282020-09-19 07:37:13.528 pm infoAlarm 4 Buttons 🔘 has departed

dev:19282020-09-19 07:36:12.758 pm errorjava.lang.Exception: Unable to aquire lock to schedule run once for isNotPresent (parse)

dev:19282020-09-19 07:36:12.571 pm errorjava.lang.Exception: Unable to aquire lock to schedule run once for isNotPresent (parse)

dev:19282020-09-19 07:36:11.366 pm errorjava.lang.Exception: Unable to aquire lock to schedule run once for isNotPresent (parse)

dev:19282020-09-19 07:36:11.172 pm errorjava.lang.Exception: Unable to aquire lock to schedule run once for isNotPresent (parse)

dev:19282020-09-19 06:57:24.618 pm errorjava.lang.Exception: Unable to aquire lock to schedule run once for isNotPresent (parse)

dev:19282020-09-19 06:56:47.718 pm errorjava.lang.Exception: Unable to aquire lock to schedule run once for isNotPresent (parse)

dev:19282020-09-19 06:56:47.547 pm errorjava.lang.Exception: Unable to aquire lock to schedule run once for isNotPresent (parse)
Later on this happened:
ev:25062020-09-27 01:45:51.867 pm errorjava.lang.RuntimeException: java.sql.SQLException: Connections could not be acquired from the underlying database! (parse)

dev:25062020-09-27 01:45:20.317 pm errorjava.lang.RuntimeException: java.sql.SQLException: The database has been closed [90098-197] Query: UPDATE DEVICE D SET D.STATE = ? WHERE D.ID = ? Parameters: [{"energyTime":1566346499475}, 2506] (parse)

dev:25062020-09-27 01:45:08.741 pm errorjava.lang.RuntimeException: java.sql.SQLException: General error: "java.lang.IllegalStateException: Reading from nio:/home/hub/hubitatdb.mv.db failed; file length -1 read length 768 at 5006593 [1.4.197/1]"; SQL statement: UPDATE DEVICE D SET D.STATE = ? WHERE D.ID = ? [50000-197] Query: UPDATE DEVICE D SET D.STATE = ? WHERE D.ID = ? Parameters: [{"energyTime":1566346499475}, 2506] (parse)

dev:25062020-09-27 01:45:08.771 pm errorjava.lang.RuntimeException: java.sql.SQLException: General error: "java.lang.IllegalStateException: Reading from nio:/home/hub/hubitatdb.mv.db failed; file length -1 read length 1024 at 16737554 [1.4.197/1]" [50000-197] Query: UPDATE DEVICE D SET D.STATE = ? WHERE D.ID = ? Parameters: [{"energyTime":1566346499475}, 2506] (parse)

dev:25062020-09-27 01:15:33.590 pm infoPlug Bsmnt Pwr Mprice energy is 58.570 kWh

dev:25062020-09-27 01:15:33.583 pm infoPlug Bsmnt Pwr Mprice energyDuration is 403.71 Days

And below the same 4 button Iris FOB I had prior issue with

dev:19282020-09-27 02:22:38.793 pm errororg.quartz.JobPersistenceException: Failed to obtain DB connection from data source 'myDS': java.sql.SQLException: An SQLException was provoked by the following failure: com.mchange.v2.resourcepool.ResourcePoolException: A ResourcePool cannot acquire a new resource -- the factory or source appears to be down. (parse)

dev:19282020-09-27 02:21:33.622 pm errororg.quartz.JobPersistenceException: Failed to obtain DB connection from data source 'myDS': java.sql.SQLException: Connections could not be acquired from the underlying database! (parse)

dev:19282020-09-27 02:19:25.962 pm errororg.quartz.JobPersistenceException: Failed to obtain DB connection from data source 'myDS': java.sql.SQLException: Connections could not be acquired from the underlying database! (parse)

dev:19282020-09-27 02:18:16.974 pm errororg.quartz.JobPersistenceException: Failed to obtain DB connection from data source 'myDS': java.sql.SQLException: An SQLException was provoked by the following failure: com.mchange.v2.resourcepool.ResourcePoolException: A ResourcePool cannot acquire a new resource -- the factory or source appears to be down. (parse)

dev:19282020-09-27 02:17:13.021 pm errororg.quartz.JobPersistenceException: Failed to obtain DB connection from data source 'myDS': java.sql.SQLException: Connections could not be acquired from the underlying database! (parse)

dev:19282020-09-27 02:16:10.205 pm errororg.quartz.JobPersistenceException: Failed to obtain DB connection from data source 'myDS': java.sql.SQLException: Connections could not be acquired from the underlying database! (parse)

dev:19282020-09-27 02:16:10.175 pm errororg.quartz.JobPersistenceException: Failed to obtain DB connection from data source 'myDS': java.sql.SQLException: An SQLException was provoked by the following failure: com.mchange.v2.resourcepool.ResourcePoolException: A ResourcePool cannot acquire a new resource -- the factory or source appears to be down. (parse)

Support is ghosting me.....

1 Like

Think there’s a chance the hardware is failing? Seems pretty strange that it took 3 soft resets for it to stick.

Even those errors seem to be pointing to something very wrong with the dB, at least to my eyes. Those errors that include a file path to the dB seem particularly bad, maybe the underlying storage is failing?

(Sorry I’m not actually helpful other than speculation)

I do not know the history of your hub so just throwing out some suggestions, have you tried another usb power supply and cable. Do you have your HE hub connected to a UPS.

I know this isn't helpful to your situation but for another data point, I have never experienced the SQL corruption. I have my hub connected to a mini ups to prevent power outages causing corruption.

Not sure but isn't it a Zigbee feature that the it can change frequency if encountering bad network performance to improve reliability? All the devices will follow a frequency change by the hub (over time). I am not sure if setting the frequency is supposed to stop that.

Thanks guys. Seems reset #4 helped. The hub is on battery backup and I never pull the plug without shutting down. Seemed the DB errors started the whole thing, then zigbee went wacky. And I still can't believe I tried to change the channel 4 times, and it wouldn't stick. And still don't know what changed it in the 1st place. The last time I tried to change it, it still said channel 11, I shut down, pulled power for 30 seconds, then booted back up and it was back to 25.
I was so baffled I took a screenshot to convince myself

Looking at those errors I'm wondering, did you ever set your hub to use the external DB? That was never an official feature and is supposed to be removed completely at some point, if not already done.

HE is not supposed to change Zigbee channel and have not done so for me even when I tested under conflict circumstances where other controllers do. It is from my understanding an optional "feature" which could be implemented/activated in a Zigbee controller or not, it is up to the implementation.

Which driver is that?

1 Like

That’s unusual in my experience.

Do you mean you opened a ticket for this isue, support acknowledged, but then stopped responding (i.e. the common definition of ghosting)?

2 Likes

It' a stock driver Iris 4 button FOB driver. And I have no idea how to use an external DB, and didn't even know it was possible. How would I change it?

Yes, correct. When issue started I got an initial response, then nothing. This is a continuation of the original issue. My hub was -mostly- fine for months, I hope the device isn't going south.

EDIT- just got another error 500 when going to load apps page, this is not looking good...

If you did not specifically enable using a remote/external database, then the hub is still using the embedded database. This is fine because @markus indicates that switching to a remote database was beta-tested but never released.

1 Like

This really doesn't look right, after a soft reset and restore it would "normally" work. At this point you'd either have to find what it is that is thrashing your DB by removing all non-native drivers and apps (I know, crazy if you have a lot of that) or hope that support has a way to diagnose it. @bobbyD this is a recurring DB corruption even after soft reset, do you have any further insight?

That you couldn't change channel was probably a DB issue, not a Zigbee one.

As @aaiyar is saying, if you don't know what the external db is, then you don't have it active and don't need to worry about it since it is supposed to be removed and was never officially announced.

1 Like

Back when I had to reboot my hub daily, if I created a manual backup, sometimes the Zigbee radio would crash and say the channel was 11. I don’t think it really was, probably just a default when there isn’t any response from the radio. Usually if I disabled the radio and reenabled it, I would get control back. This problem went away when I bought a second hub and split the zigbee devices (also put the peanut plugs in a drawer as they were constantly changing routes).

1 Like

Around a year ago I started a thread about my C3 hub.
The zigbee radio would go offline and the channel would show as being 11.
This would happen 1 in 5 times when doing a backup.
I put this down to heavy load when doing a backup.
Disabling and renabling the zigbee radio would bring it back online ok.
I never got an answer to the thread but in fairness I did not raise a ticket.
Could your hub be experiencing heavy load when this happens?
Just speculating.

BTW this issue has stopped now, the hub is in my draw as a spare.

1 Like

Not sure if this helps, but I was writing some code a few weeks ago and seeing the 500 error. I had mistakenly created an endless loop in my code. When I would pull the plug and restart I thankfully had enough time to initiate a restore from a backup taken before I made the code change and performance then went back to normal.

At the very least it may be worth downloading the backups that are on the hub if you don't do that already, just in case it happens again in the next few days. I keep meaning to set that up myself...

1 Like

Not really, but will investigate. You nailed it:

Soft Reset gets your hub back, but if measures are not taken to remedy what caused the database to get corrupted in the first place, then it's only a matter of time before the database locks up again.

Another Soft Rest, would resolve the problem, then figuring out what's causing the problem is imperative to avoid the hub locking up again. Screening Logs after reset should reveal what the problem is.

2 Likes

Maybe your initial ticket was closed if the issue resolved for months.

Could consider opening a new ticket and referencing the old one, if it’s related.

1 Like

Thank You, it wasn't months, but about 2 days. I guess I can open another ticket
And the main ticket issue- DB corruption, seems to have started the downward spiral, as I was mostly okay for months before that. I also noticed after my 4th soft reset, before restoring a backup, hub navigation was very fast from page to page, but as soon as I restored the backup I noticed a much slower response.

I guess this reinforces @markus' suggestion (also made by @bobbyD); since the database restore brings back any custom apps and drivers, the onerous task of determining what is hammering your hub lies ahead.

One of my hubitats experienced chronic slowdowns (due to the database being hammered) several months ago. In the end, the issue (discovered by @bobbyD) turned out to be stranded/ghost z-wave devices ..... sadly, each situation is unique.

4 Likes

Yeah it really stinks. I regularly check for ghost z-wave devices. Prior with previous firmware the hub took care of zigbee devices as adding them as unknown, but alas that great feature is gone.
Now this is becoming a daily occurence, zigbee offline

I know it doesn't resolve your issue but does disabling/enabling bring it back online?
Might be a quicker way of getting things working.