Slow Hub - better tools to diagnose slow-running devices/apps

I'm following up on the thread below. I realize Hubitat's stance is to disable devices and apps one at a time until the culprit is found; however, this process does not scale. I have hundreds of devices and apps running, so it's like finding a needle in a haystack. Why is Hubitat unable to create a tool that pinpoints apps or devices that take up too much CPU or memory? Just think how many support tickets would be prevented. What am I missing here? Why can't this be done?


I recently put in a ticket about a crashing hub and went through the process of finding the culprit (it was a pain). I made the exact same complaint about the lack of better tooling.

Here is an excerpt from the ticket response:

Advanced features geared toward power users and developers are on our radar, along with better debugging tools.

So they are hearing us. As for timelines, they don't commit to them publicly, so we'll have to wait. My experience has been that while I might not like the timeline, they do come through.


Thanks. It would be great if we could correlate CPU and memory consumption with individual devices and apps.


Wouldn't that slow the hub down?
:grin::+1::wink::grinning:

So my understanding (and please, anyone, jump in if I'm wrong) is that Hubitat is a single Java process that runs everything, so high CPU usage is only ever going to show up against that one process. But regardless, better logging would be way more helpful. For example, I had an issue with SQL resource exhaustion and there was very little in the logs. Likewise, I assume it's running multithreaded, and better logs on what each thread is doing would be more helpful.

At any rate, I found my problematic code (a magichome driver), and since removing it my hub has been stable. I would love to deep dive and figure out why it was causing an issue (given the SQL errors, I assume it was very chatty with its queries in some way).
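For illustration, the JVM can already report per-thread CPU time through the standard ThreadMXBean API, so something like the standalone sketch below is the kind of per-thread view better tooling could surface. It's plain Java you would run in your own JVM, not anything you can run on the hub, and it can't map threads back to specific apps or drivers (that mapping lives inside the platform).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Minimal sketch: dump per-thread CPU time for the current JVM.
// This is the kind of data a hub-side profiler could expose; on its own
// it cannot attribute threads to specific apps or drivers.
public class ThreadCpuSnapshot {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadCpuTimeSupported()) {
            System.out.println("Per-thread CPU time not supported on this JVM");
            return;
        }
        for (long id : threads.getAllThreadIds()) {
            ThreadInfo info = threads.getThreadInfo(id);
            long cpuNanos = threads.getThreadCpuTime(id);   // -1 if unavailable
            if (info != null && cpuNanos >= 0) {
                System.out.printf("%-40s %10.1f ms CPU%n",
                        info.getThreadName(), cpuNanos / 1_000_000.0);
            }
        }
    }
}
```

Run repeatedly and diffed over time, that kind of output makes a runaway thread obvious even before you know which app owns it.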


Should Hubitat be re-engineered to be multi-threaded so that we can more easily identify the offending process?

Nothing so drastic. You would be amazed at the software that runs as a single process; it just gets debugged in other ways. And it probably is multi-threaded already, just a single process spawning the threads, meaning it still shows up as a single consumer of CPU and memory when you look at it from the OS.
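To make that concrete, the standard JVM management beans only hand back process-wide (or host-wide) figures, which is exactly why watching the hub from the outside can't tell you which app or driver is at fault. A minimal standalone sketch (not hub code):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;

// Sketch: the standard JVM management beans report whole-process (or
// whole-host) numbers, never per-app or per-driver numbers.
public class ProcessWideView {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();

        // Load average covers the whole host; -1.0 means it is not available.
        System.out.println("system load average: " + os.getSystemLoadAverage());
        // Heap usage covers the whole JVM process (getMax() can be -1 if no cap is set).
        System.out.printf("heap used: %d MB of %d MB max%n",
                heap.getUsed() / (1024 * 1024), heap.getMax() / (1024 * 1024));
    }
}
```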


I completely agree. This is the #1 reason I'm thinking about switching to another piece of software. I have hundreds of devices and 20 rules. I ended up removing all of my custom integrations to diagnose a slowness issue and, well, it's still there. I'm now looking at removing devices one by one, and I'm not willing to do it. If I'm going to go through that much work, I might as well just rewrite my integrations for Home Assistant.


The best method I've seen for handling this is adding hubs, since they can be chained together. It's actually a pretty good approach because it's modular and you can move troublesome devices to their own hub. Some of the most serious users here have 2-3 hubs.

Is it a scam to promote buying more hubs? :slight_smile:

Nah. The overwhelming majority of users only use one hub.

I definitely wish the hub had more resources and was more resilient, though. My Home Assistant install never dies unless I do something catastrophic to it, which does happen on occasion due to its complexity...

But I don't think that's because their software is better; I think it's because the machine it runs on has way more CPU cycles, IOPS, and memory than the HE hub, so it can absorb some goofs more easily.

That doesn't mean I'm switching to HA wholesale, though. It is a lot more work to implement logic in and to maintain, and device support is spotty in some cases.


In more ways than one. I'd love to just have access to historical data from the logs we do see. It would be very helpful when debugging a misbehaving rule or device if I could export the logs for a half-hour or one-hour window around the time my family tells me it was acting up, and then use other text-manipulation tools to review and filter them.
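If an export like that ever arrives, trimming it to a window is only a few lines with ordinary tools. As a sketch, assuming a hypothetical plain-text export where every line starts with an ISO-8601 timestamp (the file name and format are my invention, not anything the hub produces today):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.stream.Stream;

// Sketch: trim an exported log to a window around the reported trouble time.
// Assumes a hypothetical export ("hub-export.log") where each line starts with
// an ISO-8601 timestamp such as "2023-04-01T18:32:05.123".
public class LogWindow {
    public static void main(String[] args) throws IOException {
        LocalDateTime center = LocalDateTime.parse("2023-04-01T18:30:00");
        Duration halfWindow = Duration.ofMinutes(30);

        try (Stream<String> lines = Files.lines(Path.of("hub-export.log"))) {
            lines.filter(line -> {
                     try {
                         LocalDateTime ts = LocalDateTime.parse(line.substring(0, 23));
                         return !ts.isBefore(center.minus(halfWindow))
                             && !ts.isAfter(center.plus(halfWindow));
                     } catch (RuntimeException e) {
                         return false; // skip lines without a leading timestamp
                     }
                 })
                 .forEach(System.out::println);
        }
    }
}
```

From there, any text tool (grep, a spreadsheet, an editor) can do the review/filter step described above.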

Idea:

Hubitat makes the platform open source, with a Linux or Windows installer as the brain and a USB radio stick.

WiFi- or CAT5-based Z-Wave/Zigbee extenders could be installed as remote relays where the Z-device mesh fails due to interference.

Can you imagine running the Hubitat hub software on an Intel quad-core with 64GB of memory? :smile: Goodbye, slowdowns!

While that sounds great... if they open-sourced it, where would their income come from to support development? A better idea would be for them to make a "pro" line featuring a beefier CPU, more RAM, etc. I would gladly pay a premium for a more powerful system when I need more power.


64GB is 8 times the current memory, which just delays the inevitable by 8 times. Crash in a day? Instead, crash in 8 days. It's better, significantly better, but far from 'good enough'. What's good enough for me is 2 years.

I've virtually* stopped configuring my hubs. No new devices in a couple of months, no new rules in almost the same time frame. Because I'm "Done" -- they are doing all that I want and I wants no mores. :slight_smile:

Virtually stopped* == I refresh HubConnect quite often, but that's an app that is already installed. Occasionally a report of a problem surfaces and I will try to reconfigure my development hubs to track it down.

My point is that I'm at the point where I can imagine not touching my production hubs for a year or more. So far it's imagination only, but it's not so very far away (I hope). I'd like the three hubs to stay up the entire time, no reboots. I have a fair chance, because so far I don't see the slowdowns and certainly never a crash like those being reported. Luck, not skill. Unless there's a skill in splitting my system into three hubs. :slight_smile: And since I'm not the only one with three hubs, I'm back to luck, not skill. :astonished:

Therefore, 64GB of memory (or faster processors, or more cores) is not where I'm pinning my hopes for stability. It's true that there's no such thing as too much memory, CPU, or gigs, but for what I'm asking of these hubs, they should be good for years. Right now, though, there's a FLAW in the platform. More likely there are 3-4 flaws that all yield the same symptom. Certainly we know that the 100/full problem does. There's evidence to suggest that the 'vanishing ethernet' (no response on port 80 or 8081) is a singular problem that will be tracked down, largely because there's some hint that it's repeatable.

If it's repeatable, it's solvable. The biggest problem we face is that the majority of "slow" problems aren't repeatable... Anyone got a repeatable way to get logs full of SQL errors? Because that's certainly another symptom that quickly leads to a slowdown or crash.


Obviously, I don't know much about SQL.

However, @srwhite, who clearly knows a lot about database structure, has suggested that the platform move to a more fault-tolerant database. While I know jack-all about databases, a long time ago I decided to move all my nucleic acid sequence data into a MySQL database. Not a good decision for that kind of data.


What didn't work with it? I envision a lot of large blobs of data in that kind of information, which is one area where MySQL struggles.


It worked well (more or less) when I was the only person querying the database and retrieving data from it.

My lab has never been large, but when we had ~4-5 people all querying the database (or adding data), we had repeated failures: database crashes, or incomplete data retrieval (timeouts). I had two choices: find an alternative, or put a copy on each person's Mac and sync them by some means. I chose the latter.


The table engine makes all the difference in MySQL...

I also misspoke: in the other post I stated that the hub used SQLite. That was incorrect; it's actually using an H2 database. I've no experience with this particular one to gauge whether it's good or not, but I did find numerous discussions of corruption occurring after abrupt power loss or computer crashes.
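For context, H2 is an embedded, file-based Java SQL database accessed over plain JDBC, which fits those corruption-after-power-loss reports: everything lives in local files that can be caught mid-write. Below is a minimal sketch of embedded H2 usage; the file path, table, and columns are hypothetical, the hub's real schema and settings aren't public, and it needs the H2 driver jar on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal sketch of embedded H2 usage (hypothetical file path, table, and
// columns; nothing here reflects the hub's actual database layout).
public class H2Sketch {
    public static void main(String[] args) throws Exception {
        // "jdbc:h2:./data/exampledb" creates/opens a database stored in local
        // files; an unclean shutdown mid-write is the classic corruption
        // scenario reported for embedded databases like this.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./data/exampledb", "sa", "");
             Statement st = conn.createStatement()) {
            st.execute("CREATE TABLE IF NOT EXISTS device_event (device_name VARCHAR(64), event_text VARCHAR(255))");
            st.execute("INSERT INTO device_event VALUES ('kitchen switch', 'on')");
            try (ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM device_event")) {
                rs.next();
                System.out.println("rows: " + rs.getLong(1));
            }
        }
    }
}
```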


So far, from what I'm reading, it seems I've been lucky... Any problems I have had could be traced back to a misbehaving device or bad code.

I do regular maintenance (a Z-Wave repair every time I add or move a powered Z-Wave device, checking the logs periodically for errors, etc.).

I have only had to reboot my hub when Zigbee devices were misbehaving, but since I replaced all my Zigbee bulbs with Z-Wave bulbs it's been rock solid. And I am only on one hub right now.