Some items that I have run into:
I added power monitoring and it is reporting a lot. How much headroom do I have?
Who could know? Even a basic load average would let me know the rough impact of configuration changes.
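To illustrate how little it would take: assuming the hub runs on a standard JVM (my assumption, I have not seen this confirmed), the load average is already available from the platform MXBean. This is only a sketch, and the class name `LoadAverageProbe` is made up for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper, not part of any hub API.
final class LoadAverageProbe {

    /** 1-minute system load average; negative if the platform does not report it. */
    static double systemLoadAverage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return os.getSystemLoadAverage();
    }

    /** Number of processors visible to the JVM, to put the load figure in context. */
    static int availableProcessors() {
        return ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors();
    }

    public static void main(String[] args) {
        double load = systemLoadAverage();
        int cpus = availableProcessors();
        if (load < 0) {
            System.out.println("Load average is not available on this platform");
        } else {
            System.out.printf("load=%.2f across %d processor(s)%n", load, cpus);
        }
    }
}
```

A load that stays near or above the processor count would be a rough sign that headroom is running out.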
The “make a change and see if it broke” approach is a debugging method, but it won’t tell you about non-breaking impacts. So I can’t tell whether this change means some small future change will break it. I agree that a per-driver/app metric would be great.
On the “single process” issue: is there any place in the docs that describes what execution environment drivers and apps run in? Is this using asynchronous processing, with drivers blocking core execution while they run? Or is there some number of threads?
We could at least see the thread listing and per-thread CPU use if it is a thread-based system.
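As a rough sketch of what I mean, again assuming a standard JVM underneath: the thread listing and per-thread CPU time can be read from `ThreadMXBean`. The class name `ThreadCpuReport` is hypothetical, and CPU time is only reported where the JVM supports and enables it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any hub API.
final class ThreadCpuReport {

    /** One line per live thread: name, state, and cumulative CPU time if available. */
    static List<String> snapshot() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuAvailable = threads.isThreadCpuTimeSupported()
                && threads.isThreadCpuTimeEnabled();
        List<String> lines = new ArrayList<>();
        for (long id : threads.getAllThreadIds()) {
            ThreadInfo info = threads.getThreadInfo(id);
            if (info == null) {
                continue; // thread exited between the two calls
            }
            // getThreadCpuTime returns -1 if the thread is gone or timing is disabled
            long cpuNanos = cpuAvailable ? threads.getThreadCpuTime(id) : -1L;
            String cpu = cpuNanos >= 0 ? (cpuNanos / 1_000_000) + " ms" : "n/a";
            lines.add(String.format("%-40s %-15s cpu=%s",
                    info.getThreadName(), info.getThreadState(), cpu));
        }
        return lines;
    }

    public static void main(String[] args) {
        snapshot().forEach(System.out::println);
    }
}
```

Taking two snapshots a few seconds apart and comparing the CPU figures would show which threads are actually busy.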
If it is not thread-based, perhaps there could be a latency metric for each driver/app, recorded each time it executes.
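A minimal sketch of such a latency metric, assuming the platform has some single place where it invokes driver/app handlers (I do not know whether it does). Everything here, including `HandlerLatency` and `timed`, is made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-driver/app latency recorder, not an existing hub API.
final class HandlerLatency {

    /** Running totals for one driver or app. */
    static final class Stats {
        private long calls;
        private long totalNanos;
        private long maxNanos;

        synchronized void add(long nanos) {
            calls++;
            totalNanos += nanos;
            maxNanos = Math.max(maxNanos, nanos);
        }

        synchronized long calls() { return calls; }
        synchronized long maxNanos() { return maxNanos; }
        synchronized double meanMillis() {
            return calls == 0 ? 0.0 : totalNanos / 1e6 / calls;
        }
    }

    private final Map<String, Stats> byName = new ConcurrentHashMap<>();

    /** Runs the handler and records how long it held execution, even if it throws. */
    void timed(String name, Runnable handler) {
        long start = System.nanoTime();
        try {
            handler.run();
        } finally {
            byName.computeIfAbsent(name, k -> new Stats()).add(System.nanoTime() - start);
        }
    }

    /** Stats for the given name, or null if it has never run. */
    Stats stats(String name) {
        return byName.get(name);
    }

    public static void main(String[] args) {
        HandlerLatency latency = new HandlerLatency();
        latency.timed("power-monitor-driver", () -> Math.sqrt(42.0)); // stand-in for real work
        Stats s = latency.stats("power-monitor-driver");
        System.out.printf("calls=%d mean=%.3f ms max=%d ns%n",
                s.calls(), s.meanMillis(), s.maxNanos());
    }
}
```

Even just the call count, mean, and max per driver/app would show which ones hold execution the longest.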
Another easy Java option would be to turn on Jaeger (GitHub: jaegertracing/jaeger-client-java) and let us optionally get that tracing information.