How to find the cause of hard crashes?

I've got a C5 that was really stable until a firmware update sometime late last year or early this year. Since then, it locks up fairly consistently every 3-4 weeks.

Symptoms of lock up/hard crash:

  • Green LED stays on
  • Hubitat stops responding to pings
  • Hubitat is unreachable on any port, including 8081
  • All automations and rules stop working
  • I have to unplug the USB power source, wait a few seconds, then plug it in again to bring the Hubitat back up

What I've tried so far, one change at a time after each crash to isolate the cause:

  • Installed a rebooter app, configured to restart the Hubitat process
  • Configured the rebooter app to completely reboot the device
  • Configured a Zooz double plug to be less chatty, since I saw a topic about it causing problems
  • Removed the Zooz double-plug altogether
  • Configured Peanut plugs to be less chatty
  • Removed all Peanut plugs, as some people say they've caused problems

I have network monitoring software that pings the Hubitat and sends an email when it stops responding, so I have exact timestamps of the crashes. Using those timestamps, I've checked the Hubitat's past logs and found nothing immediately before or around the time of a crash that looks suspicious or out of the ordinary. Often there is nothing in the logs at all for the 5-20 minutes preceding a crash, and I don't see any patterns in the logs before crashes either. The Hubitat has been sitting in the same place it always has, with plenty of airflow and no heat sources nearby. I keep the firmware current as well.
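The outage-timestamp approach described above can be sketched in Python for anyone without monitoring software. ICMP ping usually needs raw-socket privileges, so this sketch swaps in a TCP connection attempt to port 8081 as the health check; since the hub also stops answering on 8081 during these lockups, a failed connection there is a reasonable proxy. The host name is a placeholder and the one-minute interval is an arbitrary choice.

```python
import socket
import time
from datetime import datetime


def hub_reachable(host: str, port: int = 8081, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host unreachable
        return False


def outage_starts(samples):
    """Given (timestamp, reachable) samples in time order, return the
    timestamps at which the hub went from reachable to unreachable."""
    starts = []
    previous = True  # assume the hub was up before the first sample
    for ts, up in samples:
        if previous and not up:
            starts.append(ts)
        previous = up
    return starts


def monitor(host: str, interval_s: float = 60.0) -> None:
    """Poll forever, printing a timestamp each time the hub stops answering."""
    was_up = True
    while True:
        up = hub_reachable(host)
        if was_up and not up:
            print(f"{datetime.now().isoformat(timespec='seconds')} {host} stopped responding")
        was_up = up
        time.sleep(interval_s)


if __name__ == "__main__":
    monitor("hubitat-local-ip")  # placeholder: use your hub's IP address
```

Only the transition from up to down is reported, so a hub that stays frozen for hours produces one timestamp rather than one per poll.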

I've tried emailing Hubitat support about this twice now, waited weeks/months for a response and have yet to hear back from them.

Are there more detailed logs somewhere on the Hubitat? This has been driving me nuts, especially as I have safety related devices (water sensors, smoke/carbon monoxide sensors) that I really need to be able to rely on.

Any help would be much appreciated.

One thing that has happened to a few of us is that the cable, the Ethernet port on the switch, or the entire switch turned out to be the cause. Try a different cable and port. If you have another switch to plug it into, you can try that as well. Bad power might also be the issue: try a different power source and cable, and make sure the power adapter supplies enough current.

Yeah - I've been having random reboots (not lockups) for a while and I can't track it down either. I have a couple of Zooz plugs (also on a C-5) and I too have minimized the "chatter" from those. Nothing in the logs at all. All my automations are in Node-RED, so all the automations in HE have been paused and disabled. I have a weekly reboot, and from the logging I have running, the HE hub vitals (memory, temperature, etc.) look normal. I do have some community apps/drivers, but they were all there before this started happening.

I have changed out the power supply, the Ethernet cable, and even the power strip. It's plugged into a UPS as well. From a "switch" perspective, it is connected to an Amplifi HD in RAMP mode, and I have changed the port it is connected to.

For now, I'm just "living" with it - making weekly backups (local and cloud) and keeping my fingers crossed!

Hmm, I've had the same cable/port/switch the entire time I've owned the Hubitat, but I guess it's possible one of those is causing the problem. I'll move the Hubitat to a different switch with a different cable and see what happens.

I've used the same power source the entire time as well: 5V/1A, plugged into a UPS. I do have a spare 5.3V/2A power source lying around. If that's not too much power for the Hubitat, I can try it if the hub still locks up after the networking change.

I appreciate the troubleshooting ideas and hope one of them will solve the problem. Unfortunately, I'm still quite frustrated by the lack of logging and by being completely ghosted by Hubitat support.

It's possible your database is corrupt. Create a new backup and download it to your PC. Do a soft reset, then restore the backup you saved to your PC. This will clean the database and at least eliminate that possibility.


I've tried the following suggestions from this thread (one at a time to try and isolate the cause) and am still getting the same hard crashes:

  • Moved the Hubitat to a different switch with a different cable
  • Performed a soft reset and restored a saved backup

The timing of these crashes remains fairly consistent, usually every 3-4 weeks, despite trying these suggestions.

The only remaining suggestion was a different USB power source and cable, which I've now switched to.

Unless anyone has any other suggestions for me to try, I may need to move to a different platform for home automation. I don't really want to do this as Hubitat has been in the sweet spot of ease-of-use and power, but the complete lack of response from Hubitat support has been quite discouraging.

The only thing I can think of to try from here is the scorched-earth approach: a hard reset and adding all devices back again. This would be quite unpleasant, as I have a fair number of devices, and I don't have any reason to think it would help.

Have you tracked the available free memory and the database size at the time of the crashes?

I have not. I've just checked http://hubitat-local-ip/hub/advanced/freeOSMemoryHistory, and it only has history going back to the most recent boot, so data from the time of the crash isn't there. It looks like I need the Hub Information driver to see the database size? I'll get that installed.

I turned off the nightly reboots after the soft reset, but these crashes were happening even with nightly reboots before. Do you still think it could be a memory issue?

I track three values in InfluxDB (CPU load, free memory, and temperature); a problem with any of these could cause a crash. I should probably also track DB size, but it's normally pretty stable, so I just have an alert set on it.

Edit: If you only want to do a point in time for the database: http://<yourHubIP>/hub/advanced/databaseSize
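For anyone wanting to replicate the InfluxDB tracking of CPU load, free memory, and temperature, each sample has to be written in InfluxDB's line protocol: measurement and comma-separated tags, a space, comma-separated fields, a space, and a nanosecond timestamp. A minimal sketch of building one line; the measurement name `hub_vitals`, the tag and field names, and the units are hypothetical, and how you collect the values and send the line (HTTP write API or a client library) is left to your setup.

```python
import time


def escape_tag(value: str) -> str:
    """Escape the characters line protocol treats specially in tag values."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def to_line_protocol(hub: str, cpu_load: float, free_mem_kb: int,
                     temp_c: float, ts_ns: int) -> str:
    """Build one line-protocol record. Names and units are hypothetical."""
    fields = (
        f"cpu_load={float(cpu_load)},"
        f"free_mem_kb={int(free_mem_kb)}i,"  # trailing 'i' marks an integer field
        f"temp_c={float(temp_c)}"
    )
    return f"hub_vitals,hub={escape_tag(hub)} {fields} {int(ts_ns)}"


if __name__ == "__main__":
    print(to_line_protocol("c5", 0.5, 300000, 45.2, time.time_ns()))
```

Keeping the hub name as a tag rather than a field lets you graph a C5 and a C7 side by side from the same measurement.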

@bobbyd or @gopher.ny , any suggestions?

Do you mind if I put you on the beta program? 2.2.9 has a number of changes, including a low-level watchdog. If there's any OS-level activity still happening, it should recognize the frozen state and reboot the hub. I'd also like to see whether other core changes make a difference and prevent these lockups from occurring in the first place.


@gopher.ny I'd love to try out the 2.2.9 beta. I'm unsure how much OS level activity is present if it's not responding to pings and unreachable on any port but I've certainly got nothing to lose in trying it. Please let me know the details! Thanks.

Thanks for the DB size endpoint. I'll get some basic logging going for free memory and DB size. I don't know how much the DB size can change over time and across reboots, but at the moment mine is 10 MB.
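Because the hub's own history only goes back to the last boot, logging the database size externally keeps the record across crashes. A minimal sketch that reads the `/hub/advanced/databaseSize` endpoint mentioned above and appends a timestamped row to a CSV file, meant to be run periodically (for example from cron). It assumes the endpoint returns a bare number in MB, based on the "10 MB" figure above; check what your hub actually returns. The hub address, file name, and 100 MB alert threshold are hypothetical.

```python
import csv
import urllib.request
from datetime import datetime

# Placeholder: replace with your hub's IP address.
HUB_URL = "http://hubitat-local-ip"


def parse_db_size(body: str) -> float:
    """Assumption: the endpoint body is a bare number such as "10"."""
    return float(body.strip())


def fetch_db_size(hub_url: str = HUB_URL, timeout: float = 10.0) -> float:
    """Read the current database size from the hub."""
    url = f"{hub_url}/hub/advanced/databaseSize"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_db_size(resp.read().decode("utf-8"))


def over_threshold(size_mb: float, threshold_mb: float = 100.0) -> bool:
    """Hypothetical alert rule: flag sizes above threshold_mb."""
    return size_mb > threshold_mb


def log_sample(path: str, size_mb: float) -> None:
    """Append a timestamped sample so the history survives hub reboots."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([datetime.now().isoformat(timespec="seconds"), size_mb])


if __name__ == "__main__":
    size = fetch_db_size()
    log_sample("hub_db_size.csv", size)
    if over_threshold(size):
        print(f"database size {size} MB is above the alert threshold")
```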

I haven't seen this issue for quite some time even with the newer firmware version. It's better, but it still crashes. I have a reboot schedule, and sometimes the hub doesn't make it a full week before it falls over. For example, today I found my C5 completely unresponsive. I tried every avenue I could think of to get into it without unplugging it, but in the end I had to unplug the device and plug it back in. This is a very frustrating situation and my biggest complaint about the product.

@jsarcone

Start with the basics. With all that plugging and unplugging you may have a corrupted database.

Go to Settings >> Backup and Restore. Click the download button and save the backup to your PC (leave the onboard backups alone).

Go to yourhubip:8081 and do a soft reset. After the reboot, restore the file you saved to your PC. Then post a screenshot of your Z-Wave details page. I haven't rebooted my C5 or C7 in a long time, except for platform upgrades.

Thank you for the suggestions. I have done that quite often, but it hasn't resolved my issue. The DB used to be a problem, but with the newer firmware it hasn't been. Now the problem is memory consumption that grows over time until the hub falls over.
This happens on both my C5 and my C7. All automations, apps, and about 90% of the devices are on the C5; the C7 has 10 devices and almost no apps. The C7 stays up only a few days longer than the C5.
Others have reported memory consumption issues (memory leaks), and there have been firmware changes to try to address them.
So, as of today, I'm rebooting the C5 twice a week.
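Since the symptom here is free memory shrinking over time until the hub falls over, one rough way to choose a reboot interval is to fit a straight line to logged free-memory samples and extrapolate. A minimal sketch, assuming you already have (hours since boot, free memory in KB) pairs from your own logging; the function names and the floor value are hypothetical, and real memory use is rarely perfectly linear, so treat the result as a ballpark rather than a prediction.

```python
def leak_rate(samples):
    """Least-squares slope of free memory (KB) against time (hours).
    A negative slope means free memory is shrinking."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def hours_until(samples, floor_kb):
    """Extrapolate from the last sample to estimate hours until free memory
    reaches floor_kb. Returns None if memory is not decreasing."""
    slope = leak_rate(samples)
    if slope >= 0:
        return None
    _, last_m = samples[-1]
    return max(0.0, (last_m - floor_kb) / -slope)


if __name__ == "__main__":
    # Illustrative numbers only, not real hub data.
    readings = [(0, 500000), (24, 476000), (48, 452000)]
    print(leak_rate(readings), hours_until(readings, floor_kb=100000))
```

If the estimate comes out well beyond the current reboot interval, the lockups are probably not driven by a simple steady leak.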

Are you running Maker API?

Yes, but only on the C5.

HubConnect too, on the C5 only, and Hub Mesh on both the C5 and C7.

I ask only because there have been a lot of issues like that with Maker API involved...

Like I said though, post both of your Z-Wave settings pages and let's take a look from there.