Runtime Stats

mavrrick58 · May 9, 2021, 5:52pm

Ok, so I have been asking questions about performance considerations. Part of my concern relates to a few Wi-Fi based apps that I see consuming considerable CPU, or at least I think I do based on the "Runtime stats" for the associated devices. Specifically, I am using the Hubitat Ping Driver from @thebearmay to test whether a network resource is up.

Just for giggles I set up 2 devices with that driver and set them to check every 5 seconds. This driver spends a significant amount of its time just waiting for Hubitat to run the new ping API that was added with 2.2.7. With these two devices running pings every 5 seconds, the runtime stats make it appear that the box is using over 100% of its CPU time. That doesn't make much sense.

Now I also use @thebearmay's Hub Info Driver to collect a few things for hub stats, one of which is the 5-minute CPU load. After setting up that test earlier I decided to go check what it had collected. Simply put, it shows the 5-minute CPU load as a much more reasonable number.

My guess is that the runtime stats are not accounting for idle or wait time in some way. That makes it hard to know how good any of the stats are. Reading the threads for the Hub Info Driver, all of his math makes sense, so I feel pretty confident in that driver's CPU usage number. The fact that the runtime stats show over 100% makes me question them.

thebearmay · May 9, 2021, 6:09pm

If the hub has a lot of processes that are waiting for I/O or for async responses, it is theoretically possible for the load to exceed 1.0 per CPU. So the load equation is something like

load = runningProcesses + awaitingIO + awaitingAsyncResponse
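
As a rough illustration, taking the equation above at face value, here is a minimal Python sketch. The process counts and the CPU count of 4 are made-up values for the example, not measurements from a hub, and the function names are hypothetical.

```python
def load(running, awaiting_io, awaiting_async):
    """Load as described above: every process that is running or waiting counts."""
    return running + awaiting_io + awaiting_async


def load_per_cpu(total_load, cpu_count):
    """Normalise the load by the number of CPUs."""
    return total_load / cpu_count


# Hypothetical snapshot: 1 process running, 2 waiting on I/O, 3 waiting on async replies.
total = load(running=1, awaiting_io=2, awaiting_async=3)
per_cpu = load_per_cpu(total, cpu_count=4)  # cpu_count=4 is an assumption
print(f"load={total}, per CPU={per_cpu}")
```

With these numbers the per-CPU figure is above 1.0 even though only one process is actually executing, which is the situation described above.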

mavrrick58 · May 9, 2021, 6:34pm

Understood.

I actually trust the numbers from the Hub Info Driver a lot more right now than I do the internal statistics.

I am actually the performance SME for my company for IBM i infrastructure. I manage several different performance tools on that platform, where everything is measured by CPU states in milliseconds. That seems similar to what their runtime stats page shows, or at least to how they are attempting to calculate usage.

With that said, when I look at the devices that are using the hub ping driver, they seem to have an average runtime of around 2.5-2.6 seconds. Most of that is idle time and doesn't seem to actually be reflected in the load. The runtime stats, though, show this as time consumed and relate it to the total time available as a % used. If that number shows more than 100%, then clearly something is off, because you can't use more than 60 seconds of CPU in one minute.
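
The arithmetic behind that figure can be sketched in a few lines of Python. This assumes the stats page simply sums each device's measured runtime and divides by the elapsed time, and it uses 2.55 seconds as the midpoint of the 2.5-2.6 second range above. The function name and the 60-second window are illustrative choices, not taken from the hub.

```python
def reported_busy_percent(devices, interval_s, avg_runtime_s, window_s=60.0):
    """Percent of the window that summed per-run runtimes appear to consume."""
    runs_per_device = window_s / interval_s
    total_runtime_s = devices * runs_per_device * avg_runtime_s
    return 100.0 * total_runtime_s / window_s


# Two ping devices, each polling every 5 seconds, about 2.55 s measured per run.
pct = reported_busy_percent(devices=2, interval_s=5.0, avg_runtime_s=2.55)
print(f"{pct:.0f}% of the window")
```

Each device runs 12 times per minute, so the two devices together log about 61.2 seconds of "runtime" in a 60-second window, which comes out at roughly 102% even though nearly all of that time is spent waiting.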

gopher.ny · May 10, 2021, 12:05pm

The counter takes start time and end time of a method, subtracts the former from the latter, and takes it as a run time. It doesn't know that the method spends much of its time waiting for input. Running a proper profiler reduces hub performance by ~3x. Our approach, while flawed, has minimal performance impact.

I'll add some code to subtract waiting time for pings. There's already code like that for HTTP calls.
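
A minimal Python sketch of the idea, not the hub's actual code: the counter still takes end time minus start time, but any time the method explicitly marks as waiting is accumulated and subtracted. `RunTimer`, `measure`, and `fake_ping` are hypothetical names, and `time.sleep` stands in for waiting on a ping reply.

```python
import time
from contextlib import contextmanager


class RunTimer:
    """Accumulates time a method spends in sections it marks as waiting."""

    def __init__(self):
        self.waited_s = 0.0

    @contextmanager
    def waiting(self):
        start = time.monotonic()
        try:
            yield
        finally:
            self.waited_s += time.monotonic() - start


def measure(method):
    """Return (elapsed, busy) in seconds, where busy = elapsed minus marked wait time."""
    timer = RunTimer()
    start = time.monotonic()
    method(timer)
    elapsed_s = time.monotonic() - start
    return elapsed_s, max(0.0, elapsed_s - timer.waited_s)


def fake_ping(timer):
    # Hypothetical stand-in for a driver method that blocks on a ping reply.
    with timer.waiting():
        time.sleep(0.2)


elapsed_s, busy_s = measure(fake_ping)
print(f"elapsed={elapsed_s:.3f}s busy={busy_s:.3f}s")
```

The plain start/end subtraction would report the full elapsed time as run time, while the adjusted figure is close to zero for a method that does little besides wait. The cost is one extra pair of clock reads around each wait, far cheaper than a full profiler.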

mavrrick58 · May 10, 2021, 3:16pm

Thanks for the update. That certainly explains it, and I understand the concern about the performance impact of data collection. I look forward to testing the update that removes the wait time.