Howdy,
I’m trying to set up alerts for when our machines hit high CPU usage. After setting up the Grafana integration, I see two metrics:
render_service_cpu_time_seconds
render_service_cpu_limit_seconds
If I wanted to alert when a machine hits 85% CPU, I would’ve thought that taking
cpu_time / cpu_limit
would do the trick, but it doesn’t really.
So my questions are:
- What unit is cpu_limit_seconds? Is that seconds per minute, per hour?
- If the node has multiple cores, how does that affect cpu_time and cpu_limit? E.g. I would expect a 4-core system with one core pegged at 100% to come out at 0.25 (1X/4X).
- I assume cpu_time_seconds ticks up by one for every second a core on the machine is busy. Is that right?
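
To make my assumptions concrete, here is a minimal Python sketch of the calculation I have in mind. It assumes (unconfirmed) that render_service_cpu_time_seconds is a cumulative counter of CPU-seconds and that the limit can be expressed as a number of cores. The function and parameter names are my own, not anything from Render or Grafana.

```python
def cpu_utilization(cpu_time_start, cpu_time_end, interval_seconds, limit_cores):
    """Fraction of the CPU limit used between two samples of the counter.

    Assumption: cpu_time_* are readings of a cumulative CPU-seconds counter,
    and limit_cores is how many CPU-seconds are available per wall-clock second.
    """
    if interval_seconds <= 0 or limit_cores <= 0:
        raise ValueError("interval_seconds and limit_cores must be positive")
    cpu_seconds_used = cpu_time_end - cpu_time_start
    cpu_seconds_available = interval_seconds * limit_cores
    return cpu_seconds_used / cpu_seconds_available


def should_alert(utilization, threshold=0.85):
    """True when utilization is at or above the alert threshold."""
    return utilization >= threshold


# 4-core machine, one core pegged for a 60-second window:
# the counter would grow by 60 CPU-seconds out of 240 available.
usage = cpu_utilization(1000.0, 1060.0, 60.0, 4.0)
print(usage, should_alert(usage))
```

If that model is right, the 4-core example above comes out at 0.25, and the alert would need the change in cpu_time over a window rather than its raw value.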
Thank you