C5 Hub hangs one day after upgrading off of 2.2.4.158

rwriddle · September 1, 2021, 9:21pm

I have a C5 hub that's been running for about 2 years pretty much without issues, with a bit more than a dozen hardware devices, all Z-Wave. The C5 is on a 750VA UPS so it doesn't see any power failures. It updated several times without issues from its original v2.1.4.130 all the way to its current v2.2.4.158.

But when I bought a C7 running 2.2.5.131, I tried to upgrade the C5 to 2.2.5.131 as well. It then hung repeatedly, generally after about one day of normal operation following a reset/restore. Each time it did, I backed it down to 2.2.4.158 with a soft reset/restore and it went back to being reliable. Each time there was nothing in the logs that gave me any hint as to why it was hanging.

So I've been content just running that C5 on 2.2.4.158, waiting until Hub Protect / Migration was ready before trying again.

After reading good things about migration this last week, I decided to try again. I updated the C7 to 2.2.8.156 as required for Hub Protect migration. I also updated the C5 directly to 2.2.8.156 and saved a new backup. Two days later I noticed that the C5 was hung again. I did a soft reset and restored the C5 from its 2.2.8.156 backup. The next day it was hung again. So I've again reverted it to 2.2.4.158 and would appreciate any suggestions on how to solve this issue.

I'm very hopeful of avoiding a full manual migration: dropping all code and devices from the C5 and starting from scratch on the C7.

(Side question: I've got a lot of time invested in Rule Machine and At Home Simulator child app setup. If I do have to go the full manual migration route, is there any way to export and move the apps as configured, rather than writing every setting down and manually reproducing them on the C7?)

gopher.ny · September 2, 2021, 11:32am

When it's hung, is the diagnostics tool on port 8081 reachable, or is it hung completely and not responding to anything? Also, does the light remain green?

rwriddle · September 2, 2021, 4:10pm

No, when hung it doesn't reply in any way to attempts to access via port 8081 and also doesn't respond to a ping. (I have it on a static IP based on MAC so its address never changes).

Yes, the light remains green when hung.
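
For anyone wanting to pin down when a hang like this starts, one option is to poll the hub's diagnostics port from another machine and log the result with a timestamp. This is only a rough sketch under my own assumptions: the class name `HubProbe`, the address `192.168.1.50`, the 2-second timeout and the 60-second interval are placeholders, not anything Hubitat provides.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Instant;

public class HubProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean portOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unreachable: treat all as "not reachable".
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder address; pass the hub's real static IP as the first argument.
        String hub = args.length > 0 ? args[0] : "192.168.1.50";
        while (true) {
            boolean up = portOpen(hub, 8081, 2000);
            System.out.println(Instant.now() + " port 8081 reachable: " + up);
            Thread.sleep(60_000); // poll once a minute
        }
    }
}
```

The first line that reports `false` gives an approximate hang time that can be compared against the last entries in the hub's logs.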

rwriddle · September 2, 2021, 4:26pm

I get a little frustrated because if it was Java with a standard JVM, I could just take a JVM heap dump to at least see what the hub was doing at the time of the hang to get some idea how to proceed. But I was told there is and will be no access to that level of debugging on this hub. And there doesn't seem to be any kind of virtual environment (like SmartThings' Simulator or a Windows- or Linux-hosted VM) where post-crash conditions can be studied.

If Hubitat is concerned that general access to such memory dumps might be misused to extract trade secrets or to aid malware, perhaps they could consider a feature that produces a dump or other state information as AES-encrypted output that could be submitted to them for analysis. That way they need not fear that anyone other than them could misuse such access.

I love this hub but I think there would be many more awesome and stable new apps for it if there were more provisions to aid development and problem resolution.

gopher.ny · September 2, 2021, 5:35pm

It's very unlikely to be a JVM issue. Port 8081 would still be accessible, its service is running in a separate process. Still looking for clues...

rwriddle · September 3, 2021, 1:59am

I agree. Note I am not saying that the hang is a JVM issue; just that it would be nice if we could get a JVM heap dump because we could then get info like the thread stacks and lock lists. That might allow us to see things like deadlocks, threads waiting on an "OS" (JVM-level) service request to complete, timer to expire, "dead" threads, remaining heap memory in case this is an OOM error, etc. There ought to be some kind of artifact to provide at least a starting point to find this kind of hang.
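
To make concrete what I mean, here is roughly how that information can be collected on a standard JVM using the `java.lang.management` API. This is a sketch only: it runs against the JVM it is started in, the class name `ThreadSnapshot` is made up for illustration, and nothing here implies the hub exposes this kind of access.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadSnapshot {
    /** Builds a text report: deadlock count, every live thread's state and stack, heap usage. */
    public static String capture() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();

        // findDeadlockedThreads() returns null when no deadlock is detected.
        long[] deadlocked = mx.findDeadlockedThreads();
        out.append("deadlocked threads: ")
           .append(deadlocked == null ? 0 : deadlocked.length).append('\n');

        // Thread name, state, the lock it is blocked on (if any), and its stack.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" waiting on ").append(info.getLockName());
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }

        // Rough heap figures, useful when an out-of-memory condition is suspected.
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        long maxMb = rt.maxMemory() / (1024 * 1024);
        out.append("heap used: ").append(usedMb).append(" MB of ")
           .append(maxMb).append(" MB max\n");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(capture());
    }
}
```

Even a report this small would show whether threads are deadlocked, blocked on a lock, or sleeping on a timer at the moment of the hang.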

Does anyone know if there were any breaking changes between 2.2.4 and 2.2.5 that my app configurations might somehow have violated?

djw1191 · September 3, 2021, 2:14am

Just to be that guy, I really doubt a heap dump is going to be at all meaningful to us, the consumers. Without knowledge of the underlying code and systems, it's going to be pretty useless.

It’s also not really a reasonable request to make of a consumer product. We’re not talking about Home Assistant or some other open source product, where that sort of low-level information would be more useful.

rwriddle · September 3, 2021, 4:38pm

re: "I really doubt a heap dump is going to be at all meaningful to us, the consumers"

I agree this type of capability probably would not be meaningful to what you're terming "the consumers" (by which I take it you mean those who will never try to actually write or modify code for a Hubitat app). But it definitely would be meaningful to most developers serious about developing applications for this hub. And allowing developers to create and maintain stable apps for Hubitat more easily and more quickly would itself benefit those consumers.

There are few major platforms around for which this kind of information is not available to developers. I know this is a very different and lower margin market sector but the information is likely there under the covers.

I would like to write for this hub and need to resolve this to free up my C-5 for that work. I'm just saying an improvement in available debugging instrumentation / information would be greatly appreciated.

Having a completely stable set of apps on one firmware that consistently hangs on a newer version is uncommon but not that unusual. Having no way to even attempt to identify the problem is.

csteele · September 3, 2021, 4:59pm

As a Developer that collaborated on one of the larger Apps found here on Hubitat... HubConnect, with its many Apps and drivers, I've never once wished I could see a JVM heap dump. If I never saw a heap dump again in my life, I would not miss it.

I would posit that HubConnect was and remains a very stable set of apps and drivers, without the need for even a single heap dump. I feel strange to say it. Because I'd be in any line formed for "more info" but from a practical standpoint... I've got proof it's not necessary.

rwriddle · September 3, 2021, 8:53pm

I bow to your experience on this platform, then. ... But I'd still like to know why my C5 is hanging! My Java experience is mostly corporate (running under WebSphere, WebLogic, Tomcat, etc.), so I tend to think of debugging issues in terms of those environments.

I am trying to understand this environment to decide how to work through my problem.

Nearly all of my Hubitat usage is either Simple Automation or Rule Machine-initiated At Home Simulator. I have no idea what the underlying architecture is on Hubitat or even how a user "app" like AHS actually executes in terms of code flow and exception handling. Could I simply edit the AHS parent and child app code on that hub and add a defaultUncaughtExceptionHandler to each, with some debug error logging, to see if that sheds some light on my hangs? Or would that disturb the normal exception handler flow?
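
For reference, this is the pattern I have in mind as it would look on a standard JVM. It is a sketch under assumptions: the class name `UncaughtLogger` and the in-memory `LOG` list are hypothetical, and I don't know whether Hubitat's app environment permits calls on `Thread` at all. To avoid disturbing the normal flow, the handler logs first and then passes the error on to whatever handler was installed before it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UncaughtLogger {
    // Hypothetical in-memory log; a real app would write to its own logging facility.
    static final List<String> LOG = Collections.synchronizedList(new ArrayList<>());

    /** Installs a logging handler that chains to the previously installed one, if any. */
    public static void install() {
        Thread.UncaughtExceptionHandler previous =
                Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            LOG.add("uncaught in " + thread.getName() + ": " + error);
            if (previous != null) {
                previous.uncaughtException(thread, error); // keep the existing behavior
            }
        });
    }

    /** Runs a thread that fails on purpose and returns what the handler logged. */
    public static String demo() {
        install();
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("boom");
        }, "worker-1");
        worker.start();
        try {
            worker.join(); // the handler runs on the dying thread before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return String.join("\n", LOG);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that a default handler only sees exceptions that escape a thread entirely; anything the platform catches and logs itself would never reach it, so it may reveal nothing about a hang.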