I have been playing with the Rule Machine the past couple of weeks. I am slowly getting better. A couple times I created poorly written rules that put a big load on the hub. I was able to determine this by the slow response from the hub.
It would be a great help if there was a way to determine the hub system load, similar to what Linux or windows provides so that I can see the load after I write a new rule.
I don't use RM but you could probably setup rule where you store a start time and then do a loop for maybe a 1000 and stuff a variable at each iteration once the loop completes you could compare the start time vs a stop time to see how long it takes. Once you figure out how long it takes when the hub is freshly booted you could compare it to the time it takes after the hub has been running over several days. My automation does this once per day at a specific time (4 AM) and if the execution time is excessively long I call a device driver named "HTTP Momentary Switch" to reboot the hub.
It doesn't tell you what the hub performance is but rather a comparison from a fresh reboot vs when the hub is running slow.
Many people are having hub slowdowns after the hub has been running for several days, mine included.
There are some very clear responses in their from staff. Basically, this is not a good indicator of hub performance and they have no plan to implement any such metrics.
What I find odd is, they should basically be free from the JVM. I have the same question, I've added ~50 some zwave devices, and a dozen zigbee devices and a couple dozen rules. How busy is the hub? Dunno. Really shouldn't be given things are happening on 1s/10s/1 min type time frames...but?
Seems the JVM and system are already keeping 3 important stats:
CPU stats
JVM Young Collection Count (or time)
JVM Old collection count (or time)
If you're churning through memory and causing a lot of old, or stop the world garbage collections that will cause rules to run sporadically and slowly as the JVM reclaims ram.
Ideally you shouldn't have to expose them, but given we're writing custom rules, adding plug in apps/drivers, it would sure seem helpful.
Shame tho, if we (coomunity) had that info I'm fairly certain we'd have isolated the cause of the many slowdowns customers are experiencing. Granted there may be various reasons, but seems many solve their slowdowns with reboots, which is just a "bandaid" , not a resolution.
Not really -- any clue would be in the processes/programs running that could be seen via top or htop. Without that all the user would know is the average loads.
I added power monitoring and it is reporting a lot how much headroom do I have?
Who could know? even a basic load average would let me know rough impact of configuration changes.
The “make a change and see if it broke” is a debugging method but it won’t let you know about non-breaking impacts. So I can’t tell if this change will mean some future small change will break it. I agree that a per driver/app metric would be great.
On the “single process” issue: is there any place in the docs that describe what execution environment drivers and apps are in? Is this using asynchronous processing and drivers are blocking core execution while they run? Or is there some number of threads?
We could atleast see the thread listing and per thread cpu use if it is a thread system.
If it is not thread based perhaps a latency metric for each driver/app when it has execution.
Another easy java option would be to turn on Jager github jaegertracing/jaeger-client-java and let us optionally get that tracing information.
Well that's a great pity, since the use I would have made of it would be to check each time I add an app to see if any of them are being particularly greedy with a view to knowing what to get rid of first if there is a performance or space problem.