Performance Monitor

potts.mike · November 7, 2018, 12:56pm

Has there been any thought given to creating a performance monitor for the hub? An app that shows CPU usage, memory utilization, running processes, etc. could prove useful for debugging purposes.

Ryan780 · November 9, 2018, 7:15am

How about logging which app caused a particular event? That would be REALLY helpful in debugging.

andrew.rowbottom · February 16, 2019, 11:42am

I know this is an old thread, but as a professional performance/tuning person I find the lack of any "performance" stats really awkward.

I recently got my hub, and after adding a whole bunch of integrations it has started to feel slow.
But there seems to be no easy way of finding the cause!

arnb · February 23, 2019, 5:28pm

Start out simple with CPU utilization for the system as a whole, and per app and device. This would be very helpful.

andrew.rowbottom · February 25, 2019, 2:11pm

Yup!

destructure00 · February 25, 2019, 2:22pm

This has been asked and answered many times. Here's the best explanation I've seen about why this info, if made available, would not be useful.

If you want more info, search for threads on slow and/or crashed hubs, resource monitor requests, etc. There's plenty to choose from, and it's the same request and the same answer each time.

andrew.rowbottom · February 26, 2019, 7:45am

Understood.
Apart from the urge to monitor nearly everything that my job gives me, I'm just grumpy because my new hub lost its database after a few days of running slowly (with many long pauses of 20+ seconds).
With nothing in the logs to give me a clue as to which of the (way too) many apps and devices might have been causing the slowdowns (NO WebCore), it sparked an itch.

JasonJoelOld · February 26, 2019, 1:56pm

I also wish there were a more data-driven way to troubleshoot slowdowns. Trial and error kind of sucks on a production hub with a ton of things going on. Yes, you start with "What was the last thing (or things) I did before it broke?", but that doesn't always help.

In any case, I'm sure the Hubitat team would expose some diagnostic info if it were useful. As they've said a few times before, because of the way the Java code works, it is difficult to expose meaningful status data (my words, paraphrasing theirs).

I don't think for a second they are withholding valuable information. If it were easy to expose, even if only in some 'advanced' menu, I'm sure they would.

jpoeppelman1 · October 18, 2019, 11:51am

Any update on this? We need basic performance monitoring for each app instance (parent/child apps & composite parent/child devices). We need to know when a specific app or device is slowing down the entire hub.

eaton.blumenstein · February 3, 2020, 6:04pm

I agree. I would like to know why I currently have to reboot my hub every day or so.

At this point I just have it reboot itself, but that is not what I want, at least not every day. Knowing what I am doing wrong would help a lot!

jpoeppelman1 · February 6, 2020, 12:10pm

Any updates regarding better monitoring tools?

lpakula · February 19, 2020, 5:48pm

I'd love to see something as well. I'm finding that it works flawlessly for about 3 days, and then on day 4 there is a 3-second delay between a motion trigger and a light turning on. Restart the hub, and it's flawless for about 3 days again.

I've also had issues editing my large number of RM4 rules for all my light switches. At the beginning all is good, but by the time I reach rule #20, it lags to the point that each change takes 3 seconds.

In both situations, it just feels like the hub is out of RAM (memory leak, fragmentation, whatever). Only a restart "fixes" it.

I made a rule to restart it twice a week at 4 AM as a "fix", but it's still a band-aid. The catch-22 is that during a restart (or even a database export), you lose all the past history of an existing rule (it gets reset).

jwetzel1492 · February 19, 2020, 6:20pm

I've read through @bravenel's responses on this before, and I agree that a simple top-down profiler seems unlikely to provide useful info, due to the architecture of the system. It seems like a bottom-up approach could work, though it would have to be coded into individual components. For example, I could put profiling into my own app code, and let my apps profile themselves.
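As a rough illustration of the bottom-up idea, an app author could wrap their own event handlers in a timer. This sketch is in Python rather than the Groovy that Hubitat apps are written in, and everything in it (`profiled`, `timings`, `average_ms`, `motion_handler`, the 0.5 s warning threshold) is hypothetical; it only shows the pattern of an app measuring itself, not any Hubitat API.

```python
import functools
import time
from collections import defaultdict

# Hypothetical per-app store: handler name -> list of run durations in seconds.
timings = defaultdict(list)


def profiled(warn_after_s=0.5):
    """Decorator that times every call to the wrapped event handler."""
    def wrap(handler):
        @functools.wraps(handler)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return handler(*args, **kwargs)
            finally:
                # Record the duration even if the handler raised.
                elapsed = time.perf_counter() - start
                timings[handler.__name__].append(elapsed)
                if elapsed > warn_after_s:
                    print(f"WARN {handler.__name__} took {elapsed:.3f} s")
        return inner
    return wrap


def average_ms():
    """Average run time per handler, in milliseconds."""
    return {name: 1000 * sum(runs) / len(runs)
            for name, runs in timings.items() if runs}


@profiled(warn_after_s=0.5)
def motion_handler(evt):
    # Hypothetical handler: a real app would turn a light on here.
    return f"light on for {evt}"


print(motion_handler("motion active"))  # prints "light on for motion active"
print(average_ms())
```

The measurement lives inside the app, so it costs nothing for apps that don't opt in, but it also can't see time spent waiting before the handler was called.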

Unfortunately, I imagine this would be very hard or near impossible to do systematically. (And what I mean by that is, having the Hubitat firmware/framework systematically profile all apps from outside them.) Since the system is event driven, it's not like an app is a runaway process in Windows consuming 100% of the CPU. How would we even define our metrics?

Just brainstorming as a software engineer here:

Could Hubitat framework profile the "instantiation" time for apps and drivers? Maybe a problem app is just a giant blob of groovy that takes a while for the interpreter to work through each time an event fires.
Could Hubitat profile the rate at which an app or driver is being instantiated and receiving events? My guess would be that a significant number of the issues people have are due to code/devices that are DoS'ing the hub.
Could Hubitat profile outgoing synchronous HTTP calls? Some reporting along the lines of "Your custom app ABC is making synchronous calls at this high rate X/minute. The average response time is Y seconds. Other systems that are being put on hold because of this include: etc., etc."
For the case of Rule Machine specifically, could rules profile themselves? Some warning like "When this rule is triggered, it is taking an average of X seconds to fully run. It is being triggered Y times/minute."
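A hedged sketch of the bookkeeping the rate and run-time ideas above imply: keep a rolling window of runs per component and turn it into a warning like the one suggested for Rule Machine. The class, thresholds, and rule name are hypothetical, timestamps are plain seconds supplied by the caller, and nothing here reflects how Hubitat's firmware actually works.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional, Tuple


@dataclass
class ComponentStats:
    """Rolling run statistics for one app, driver, or rule (hypothetical)."""
    window_s: float = 60.0
    # (started_at, duration_s) for each run inside the window
    runs: Deque[Tuple[float, float]] = field(default_factory=deque)

    def record(self, started_at: float, duration_s: float) -> None:
        self.runs.append((started_at, duration_s))
        # Keep only runs that started within window_s of the newest run.
        while self.runs and self.runs[0][0] < started_at - self.window_s:
            self.runs.popleft()

    def rate_per_minute(self) -> float:
        return len(self.runs) * 60.0 / self.window_s

    def avg_duration_s(self) -> float:
        if not self.runs:
            return 0.0
        return sum(d for _, d in self.runs) / len(self.runs)


def report(name: str, stats: ComponentStats,
           max_avg_s: float = 1.0, max_rate: float = 30.0) -> Optional[str]:
    """Return a warning string if the component looks too slow or too busy."""
    avg, rate = stats.avg_duration_s(), stats.rate_per_minute()
    if avg > max_avg_s or rate > max_rate:
        return (f"When {name} is triggered, it is taking an average of {avg:.2f} s "
                f"to fully run. It is being triggered {rate:.0f} times/minute.")
    return None


# Example: 40 triggers, 1 s apart, each taking 1.5 s to run.
stats = ComponentStats(window_s=60.0)
for i in range(40):
    stats.record(started_at=float(i), duration_s=1.5)
print(report("Hallway Motion", stats))
# When Hallway Motion is triggered, it is taking an average of 1.50 s to fully run. It is being triggered 40 times/minute.
```

The same record-and-summarize approach could in principle be applied to outgoing synchronous HTTP calls (recording each call's response time), which would give the "X calls/minute, average response Y seconds" style of report described above.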

My hub doesn't have slowdown issues, but the only custom code I run is my own. So if something goes wrong, I look straight at my last change. And I apply the KISS principle liberally in my RM4 rules. So I'm just a satisfied but brainstorming customer here.

lpakula · February 20, 2020, 1:09am

As a software engineer as well, I'd love to see something that would help troubleshoot this. I have a lot of rules and a lot of devices. Disabling one, waiting 3 days for a failure, and then repeating would take a year to evaluate.

It would be nice to know if there is a particular device, rule, app, queue, etc. that I could target for debugging. Right now I'm 100% blind in trying to find the root cause. Some sort of debug metric would be absolutely invaluable.

ChubChub · February 27, 2020, 5:13pm

This seems easy enough to implement, and I believe it would help with finding a renegade rule/app/etc.; it's basically how I track processing speed for every VBA script I write.

I assume the Hubitat hub processes rules linearly, meaning that when it gets triggers, it processes the rules in the order the triggers arrived. If the hub would just, on ANY trigger, mark the time it started, then mark the end time when the rule finishes, and give you the option to have it show up in the Event viewer as "processing time = blah", that alone would help.
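The start/end timestamp idea can be sketched as follows, under the post's own assumption that triggers are handled one at a time in arrival order (whether Hubitat's scheduler actually does this is not established in this thread). Logging both the queue wait and the processing time shows when one slow rule delays everything queued behind it. `process_triggers`, `FakeClock`, and the rule names are hypothetical; the fake clock just makes the output deterministic.

```python
import time
from collections import deque


class FakeClock:
    """Deterministic clock for the example: each call advances 0.25 s."""
    def __init__(self):
        self.t = 0.0

    def __call__(self):
        self.t += 0.25
        return self.t


def process_triggers(triggers, rules, clock=time.perf_counter):
    """Handle (trigger_name, arrived_at) pairs one at a time, oldest first.

    `rules` maps a trigger name to a list of (rule_name, action) pairs.
    `arrived_at` must come from the same clock. Returns event-log lines.
    """
    queue = deque(triggers)
    log = []
    while queue:
        trigger, arrived_at = queue.popleft()
        for rule_name, action in rules.get(trigger, []):
            start = clock()
            action()
            end = clock()
            log.append(f"{rule_name}: trigger={trigger} "
                       f"wait={start - arrived_at:.3f}s "
                       f"processing time={end - start:.3f}s")
    return log


rules = {
    "motion": [("Hall light", lambda: None)],
    "door": [("Porch light", lambda: None)],
}
for line in process_triggers([("motion", 0.0), ("door", 0.0)], rules, clock=FakeClock()):
    print(line)
# Hall light: trigger=motion wait=0.250s processing time=0.250s
# Porch light: trigger=door wait=0.750s processing time=0.250s
```

In the example output, both rules take 0.25 s to run, but the door trigger waits 0.75 s because it sat behind the motion trigger; the wait column is what would point at a renegade rule.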

Obviously, exposing the output of top or whatever would be amazing as well. I'm currently using a smart plug or Kill A Watt to monitor CPU usage, which is so kludgey, and it doesn't give me any IO/RAM/etc. data, which would definitely help.

jacobgraf · May 19, 2020, 11:56am

This. I got my hub a few weeks ago and began configuring it. It's working great with multiple apps and 70+ devices, but the UI just feels like it's getting slower and slower, and there is no way to find out why. Is the CPU maxed out? Is it out of RAM? Which apps are using the most resources (e.g., something like Linux top)? I'd love to be able to see some of those details. Thanks!

JasonJoel · May 19, 2020, 12:58pm

Obviously top wouldn't help in this case, as it is all just one Java blob, but I know what you're saying.

It would be nice to see more of what is happening in the java blob.

eric10 · October 29, 2020, 2:50pm

Just CPU % and memory % would be amazing. Am I missing where this is? I'm told the Zigbee radio dies if memory is exhausted (it did for me last night; only pulling the power revived it, a reboot did not). It would be nice to watch memory while I'm troubleshooting rogue apps.