Trying to Debug Hub Lockups

Ryan780 · February 2, 2019, 5:27pm

That was one of the reasons I didn't implement InfluxDB. (It's also REALLY f*ing confusing, so that was a big part too.) This is also why I asked the question the other day about having logging in place for so many rules and whatnot. I know the act of logging a thing can sometimes be more labor-intensive than the thing itself. I've seen that in some of the systems I work on at my day job. Sometimes the tools you use to track down a small problem end up causing larger problems rather than helping you fix the small one.

I've removed Device Monitor and we'll see if that has any impact. Only problem, I've got to wait a good 3 weeks before I'll know for sure.

gavincampbell · February 2, 2019, 5:51pm

Just looking at the code for that, it subscribes to all the events of all your selected devices and calls eventCheck every time one of them is triggered (look at the subscribeDevices method). That could potentially add up to a lot of processing. For example, if you had a group of 5 devices that turn on at once, it's going to process each "on" command while your hub is also trying to do other things. In my "good night routine" I could see this being a problem, as it goes through and turns off a lot of stuff.

Just throwing that out there.
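
To make the fan-out concrete, here is a minimal Python sketch. It is a toy stand-in, not the Device Monitor source: the `Hub` class and its methods are hypothetical, and only the names `subscribe_devices` and `event_check` echo the methods mentioned above. It assumes one handler call per device event, which is the behavior described.

```python
from collections import defaultdict


class Hub:
    """Toy event dispatcher (hypothetical, not Hubitat's real API)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # device name -> handlers

    def subscribe(self, device, handler):
        self.subscribers[device].append(handler)

    def send_event(self, device, name, value):
        # Every subscribed handler runs once for every event.
        for handler in self.subscribers[device]:
            handler(device, name, value)


calls = []


def event_check(device, name, value):
    # Stand-in for the monitor's per-event work.
    calls.append((device, name, value))


def subscribe_devices(hub, devices):
    # Subscribe the same handler to every selected device.
    for device in devices:
        hub.subscribe(device, event_check)


hub = Hub()
group = [f"light-{i}" for i in range(1, 6)]
subscribe_devices(hub, group)

# A "good night" routine switching the whole group off at once.
for device in group:
    hub.send_event(device, "switch", "off")

print(len(calls))  # one handler run per device event
```

With 5 devices the handler runs 5 times for a single routine, so the cost grows with the number of selected devices and how chatty they are.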

Ryan780 · February 2, 2019, 5:58pm

I had that function (logging every time a device triggers) turned off. That would be WAAAYY too much going on. It was only logging based on time and I had it set for 30 minutes. I wasn't even logging battery level.

ogiewon · February 2, 2019, 5:58pm

I just removed "Device Monitor" from my hub, in the hope that it might be causing issues. My Prod hub does not have "Alexa TTS" installed on it, as I thought it was an obvious suspect weeks ago. I too have been using chromecast for now.

My custom Apps are:

ABC-Advanced Button Controller
InfluxDB Logger
SmartTiles

My Hubitat Apps are:

Amazon Echo Skill
Button Controllers
Chromecast Integration Beta
Google Home
Groups and Scenes
Hubitat Dashboard
Hubitat Safety Monitor (water leak detection only)
Hubitat Simple Lighting
IFTTT Integration
Life360 Connector
Lutron Integrator
Mode Manager
Motion Lighting Apps
Nest Integration
Rule Machine

If we can find the overlapping apps, we might be able to pinpoint the problem one(s).

Ryan780 · February 2, 2019, 6:02pm

I overlap on almost all of your Hubitat apps. Mine are:
Amazon Echo Skill
Button Controllers
Chromecast Beta
Google Home
Groups and Scenes
Dashboard
HSM - Security + Battery + Smoke
Hue
IFTTT
LCM
Lutron Integrator
Maker API
Rule Machine (147 rules)
SharpTools
Z-Wave Poller (for 1 Z-Wave device)

See, my first thought was Ecobee. But I see you use Nest. Plus, I have logs for all the issues with the ecobee servers being kaput all week. And when the hub locks up, there's no logging going on.

homeauto2112 · February 2, 2019, 6:15pm

This may be of help to the logging question farther up in the thread. Obviously if the hub locks up, the logging will stop, but it might give a place to start. You'll need a RPI or another machine to run the nodejs on.

stephack · February 2, 2019, 6:17pm

SmartTiles was one of my suspected apps a while back... again because of the excess logging and event subs. The only reason I haven't removed SharpTools yet is because it's such a PITA to set up.

BorrisTheCat · February 2, 2019, 6:19pm

Me too. I have the exact same issue, but I don't have WC installed any more (it happens with or without it).

stephack · February 2, 2019, 6:26pm

Custom:
ABC
Boot Me Up Scottie
EspIR Manager
InfluxDb
Konnected
My Custom Occupancy Lighting
My Custom Sonos Preset Control

BuiltIn:
Amazon Echo Skill
Chromecast Integration
Google Home
Groups and Scenes
HSM - battery check, water leak and intrusion
Hubitat Simple Lighting
Hue Bridge
Lock Code Manager
Lutron Integrator
Maker Api
Mode Manager
Motion Lighting
Rachio Integration
RM
SharpTools
Sonos Integration
Zone Motion Controllers

BorrisTheCat · February 2, 2019, 6:29pm

This is what I believe too, as it seems to coincide with my issues.

Ryan780 · February 2, 2019, 6:41pm

Mine did happen in the morning but I wasn't even home at the time of the last 2 so no "good morning" routine was running or anything.

BorrisTheCat · February 2, 2019, 7:03pm

Yeah, I had the same. It happened on Christmas Day, when I hadn't been there for a few days and was 2 hours away.

system apps:

Button controllers
chromecast integration
Google home
Groups and scenes
Hubitat Dashboard
HSM
IFTTT
Life360
Mode manager
Motion lighting apps
Nest integration
Rule Machine (41)
SONOS Integration

User apps

Cobra's App (check open contacts, message central, modes plus and presence central)
Welcome Home

Custom Drivers (currently disabled for checking)

Fibaro UBS (no other driver, but I have a lot)
Aeotec Water Sensor 6 (currently using built-in driver)
Dual Relay Driver (nothing else about)
Virtual Container (can't see this being an issue, so not disabled)
Fibaro Dimmer 2 (currently using the built-in one, but it doesn't do scenes, so my lights don't work)
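
For anyone wanting to compare the app lists posted above by ogiewon, Ryan780, stephack and BorrisTheCat, a rough Python sketch follows. The app names are normalized by hand so they match across posts (for example "RM" becomes "Rule Machine" and "HSM" becomes "Hubitat Safety Monitor"); that normalization is an assumption of this sketch, and a shared app is only a candidate, not a proven cause.

```python
from collections import Counter

# App lists as posted in this thread, with names normalized by hand.
installed = {
    "ogiewon": {
        "ABC", "InfluxDB Logger", "SmartTiles",
        "Amazon Echo Skill", "Button Controllers", "Chromecast Integration",
        "Google Home", "Groups and Scenes", "Hubitat Dashboard",
        "Hubitat Safety Monitor", "Hubitat Simple Lighting",
        "IFTTT Integration", "Life360 Connector", "Lutron Integrator",
        "Mode Manager", "Motion Lighting", "Nest Integration", "Rule Machine",
    },
    "Ryan780": {
        "Amazon Echo Skill", "Button Controllers", "Chromecast Integration",
        "Google Home", "Groups and Scenes", "Hubitat Dashboard",
        "Hubitat Safety Monitor", "Hue Bridge", "IFTTT Integration",
        "Lock Code Manager", "Lutron Integrator", "Maker API",
        "Rule Machine", "SharpTools", "Z-Wave Poller",
    },
    "stephack": {
        "ABC", "Boot Me Up Scottie", "EspIR Manager", "InfluxDB Logger",
        "Konnected", "Amazon Echo Skill", "Chromecast Integration",
        "Google Home", "Groups and Scenes", "Hubitat Safety Monitor",
        "Hubitat Simple Lighting", "Hue Bridge", "Lock Code Manager",
        "Lutron Integrator", "Maker API", "Mode Manager", "Motion Lighting",
        "Rachio Integration", "Rule Machine", "SharpTools",
        "Sonos Integration", "Zone Motion Controllers",
    },
    "BorrisTheCat": {
        "Cobra's Apps", "Welcome Home", "Button Controllers",
        "Chromecast Integration", "Google Home", "Groups and Scenes",
        "Hubitat Dashboard", "Hubitat Safety Monitor", "IFTTT Integration",
        "Life360 Connector", "Mode Manager", "Motion Lighting",
        "Nest Integration", "Rule Machine", "Sonos Integration",
    },
}

# Count how many hubs each app appears on.
counts = Counter(app for apps in installed.values() for app in apps)

# Apps present on every listed hub.
on_every_hub = sorted(app for app, n in counts.items() if n == len(installed))
print(on_every_hub)
```

On these four lists, only Chromecast Integration, Google Home, Groups and Scenes, Hubitat Safety Monitor and Rule Machine appear on every hub. `counts.most_common()` gives the full ranking if a near-overlap (3 of 4 hubs) is also of interest.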

srwhite · February 2, 2019, 8:10pm

When I was having this issue I had both my Aeon HEM paired securely and AlexaTTS installed. I reset the hub, the HEM is now on a second hub, paired insecurely, and I ordered a dedicated hub for AlexaTTS and some other cloud integrations.

I’ve not had a return of the crashes. I believe it was due to having the HEM paired securely. When the hub arrives next week I’ll give AlexaTTS a shot.

There is one huge difference to consider between SmartThings and Hubitat when it comes to apps. I do not believe that Hubitat puts any kind of restrictions on app execution time or resources. SmartThings has several throttles, namely the 20 second rule that kills app and driver execution. It is absolutely possible for CoRE (or any ported app) to have a bug that would never occur because of these “guard rails”.

I’m not pointing any blame to those apps, just throwing out something to consider.
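
As an illustration of the "guard rail" idea, here is a minimal Python sketch of an execution-time budget. It is not how SmartThings or Hubitat implement anything; `run_with_budget` and both handlers are hypothetical names, and the 20-second limit is scaled down to 0.1 seconds so the example finishes quickly. Python cannot forcibly stop a thread, so this sketch only detects an overrun rather than killing the handler.

```python
import threading
import time


def run_with_budget(handler, budget_seconds):
    """Run handler in a worker thread and report whether it finished in time.

    Returns (finished_in_time, seconds_waited). The worker is a daemon
    thread, so a runaway handler does not keep the program alive.
    """
    worker = threading.Thread(target=handler, daemon=True)
    started = time.monotonic()
    worker.start()
    worker.join(timeout=budget_seconds)  # wait at most the budget
    waited = time.monotonic() - started
    return (not worker.is_alive()), waited


def quick_handler():
    # Well-behaved handler: returns immediately.
    pass


def runaway_handler():
    # Stands in for a handler stuck in a long loop.
    time.sleep(0.5)


quick_ok, _ = run_with_budget(quick_handler, 0.1)
runaway_ok, _ = run_with_budget(runaway_handler, 0.1)
print(quick_ok, runaway_ok)
```

Without a check like this, the runaway handler simply keeps consuming resources, which is the scenario being raised for ported apps.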

kamransiddiqi1998 · February 3, 2019, 9:10am

The only user code I have is Homebridge, and I have experienced two lockups requiring a physical reboot.

JulesT · February 3, 2019, 12:06pm

OK. And I've just had another hub lockup as well.

Looking across the logs - last entry was at 10:54, and everything past that was post reboot.

I'm not running WebCore, or similar, but AM running a bunch of other stuff. I suspect what we REALLY need is to be able to export the system logs via SNMP or similar - given that we fail HARD... my guess is a memory leak or similar is causing the oom-killer to randomly start eating processes, but we'll never get there from in-application logging.

HE guys - I'm happy to run SNMP receivers (already AM for other stuff, to be honest)... but I can't help but echo the points above - it's entirely possible that I'm the architect of my own downfall on some of the drivers that I've written... but there's no instrumentation that would help me deduce if that's so.

-- Jules

PS. Logs from the system below:

dev:482 2019-02-03 12:01:34.950 pm debug initialize...

dev:283 2019-02-03 12:01:34.680 pm debug Device Initialized: (Nest Eventstream)...

app:86 2019-02-03 10:54:45.171 am warn poll- force:false, type:null, isPollAllowed:true

app:162 2019-02-03 10:54:05.027 am debug Sending DEVICE Event (Living Room Speaker | UDN: uuid:856f546c-795b-1848-0080-0005cd5113a0) to Homebridge at (192.168.2.103:8005)

sys:1 2019-02-03 10:53:05.540 am warn Received data from 192.168.2.146, no matching device found for 192.168.2.146, C0A80292:E4A1, 0005CD3D02BC or C0A80292.

sys:1 2019-02-03 10:53:05.354 am warn Received data from 192.168.2.146, no matching device found for 192.168.2.146, C0A80292:E4A0, 0005CD3D02BC or C0A80292.

app:162 2019-02-03 10:52:00.575 am debug Sending DEVICE Event (Hallway Speaker | UDN: uuid:52b2dfaa-91b3-1f90-0080-0005cd71ca34) to Homebridge at (192.168.2.103:8005)

app:391 2019-02-03 10:51:56.829 am info postToInfluxDB(): Posting data to InfluxDB: Host: 192.168.2.247, Port: 8086, Database: Hubitat, Data: [water,deviceId=386,deviceName=Boiler\ Leak\ Sensor,groupId=null,groupName=Home,hubId=1,hubName=C4:4E:AC:1D:92:E4,locationId=1,locationName=Office\ Hubitat,unit=water value="dry",valueBinary=0i]

Interestingly... the last log entry was from Nest Integration before it fell over.
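
If anyone wants to find the "last entry before the hub went quiet" in a longer log dump, a rough Python sketch is below. It assumes each line has the shape seen in this post (source and id, date, 12-hour time, level, message); the regex and the function names are this sketch's own, and it tolerates fields that run together without spaces, which can happen when log lines are copied as plain text.

```python
import re
from datetime import datetime

# source:id, date, 12-hour time with milliseconds, am/pm, level, message.
LINE = re.compile(
    r"^(dev|app|sys):(\d+?)\s*"
    r"(\d{4})-(\d{2})-(\d{2}) (\d{1,2}):(\d{2}):(\d{2})\.(\d{3}) (am|pm)\s*"
    r"(trace|debug|info|warn|error)\s*(.*)$"
)


def parse(line):
    """Return (timestamp, source, level, message), or None if unparseable."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    kind, ident, y, mo, d, h, mi, s, ms, ampm, level, msg = m.groups()
    hour = int(h) % 12 + (12 if ampm == "pm" else 0)  # 12-hour -> 24-hour
    when = datetime(int(y), int(mo), int(d), hour, int(mi), int(s), int(ms) * 1000)
    return when, f"{kind}:{ident}", level, msg


def largest_gap(lines):
    """Return (entry_before, entry_after, gap) for the longest silence."""
    entries = sorted(
        (e for e in map(parse, lines) if e is not None), key=lambda e: e[0]
    )
    pairs = zip(entries, entries[1:])
    before, after = max(pairs, key=lambda p: p[1][0] - p[0][0])
    return before, after, after[0] - before[0]


sample = [
    "dev:482 2019-02-03 12:01:34.950 pm debug initialize...",
    "dev:2832019-02-03 12:01:34.680 pm debugDevice Initialized: (Nest Eventstream)...",
    "app:862019-02-03 10:54:45.171 am warnpoll- force:false, type:null, isPollAllowed:true",
    "app:162 2019-02-03 10:52:00.575 am debug Sending DEVICE Event",
]

before, after, gap = largest_gap(sample)
print(before[1], "->", after[1], gap)
```

On the sample lines, the longest silence runs from the app:86 entry at 10:54:45 am to the dev:283 entry at 12:01:34 pm, a little under 1 hour 7 minutes.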

mpoole32 · February 3, 2019, 3:01pm

The last time I experienced a hub lockup was back when an update included the ability to pair non-secure Z-Wave devices. And then this morning I had this complete lockup. Not sure if these Dashboard errors are a result of the lockup or if they somehow caused the lockup. These are the only errors recorded in past logs.

Edit
Just noticed the automated backup happened around this time. Normally that backup happens + or - 3:00am.

Ryan780 · February 8, 2019, 11:40pm

So, @ogiewon, I think we might have found our culprit. I was gone for 3 days this week with no activity at home and no lockup on my hub. I think Device Monitor was the offender. How about you? Anything since you removed it?

cuboy29 · February 8, 2019, 11:42pm

I woke up this morning to an extremely sluggish hub. Painfully slow to navigate the menus, but I was able to go in and disable InfluxDB, and then the hub became responsive again.

I was running the latest version of InfluxDB too. Would really love to get to the bottom of this.

Ryan780 · February 8, 2019, 11:45pm

One of the reasons I've been avoiding installing that one. It's gotta be a huge resource hog.

ogiewon · February 8, 2019, 11:51pm

I removed Simple Device Monitor a while back, but yesterday my hub stopped running automations. I could still log into it, though. I contacted Support and they say my hub ran out of memory. So I must have an app that is killing it. Right now, InfluxDB is still my prime suspect. I use InfluxDB on my Development Hub as well, and it has never locked up. The mystery continues.