Deriving CPU Utilization from Exported Metrics

Howdy,

I’m trying to set up alarms for when our machines hit high CPU usage. After setting up an integration with Grafana, I see two metrics:
render_service_cpu_time_seconds
render_service_cpu_limit_seconds

To alarm when a machine hits 85% CPU, I would’ve thought that taking

cpu_time / cpu_limit

would do the trick, but it doesn’t seem to.

So my questions are

  1. What unit is cpu_limit_seconds? Is that seconds per minute, per hour?
  2. If the node has multiple cores, how does that affect cpu_time and cpu_limit? E.g. I would expect a 4-core system with one core pegged at 100% to read 0.25 (1 core / 4 cores).
  3. I assume cpu_time_seconds ticks up for every second a core on the machine is busy. Is that right?
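
To make my mental model concrete, here is a minimal Python sketch of the calculation I'd expect to work. It rests on two unconfirmed assumptions: that cpu_time_seconds is a cumulative counter of CPU-seconds consumed, and that cpu_limit_seconds is the CPU-seconds available per wall-clock second (i.e. the core count). The function name and sample values are made up for illustration.

```python
def cpu_utilization(t0, cpu0, t1, cpu1, limit_cores):
    """Fraction of the CPU limit used between two samples.

    ASSUMPTIONS (not confirmed by any docs):
      - cpu0/cpu1 are readings of a cumulative CPU-seconds counter,
        taken at wall-clock times t0/t1 (in seconds).
      - limit_cores is CPU-seconds available per wall-clock second.
    """
    elapsed = t1 - t0
    if elapsed <= 0 or limit_cores <= 0:
        raise ValueError("need t1 > t0 and a positive limit")
    used = cpu1 - cpu0
    if used < 0:
        # Counter went backwards: assume it reset to zero in between.
        used = cpu1
    return used / (elapsed * limit_cores)


# Hypothetical 4-core box with one core pegged for 60 seconds:
# the counter advances by 60 CPU-seconds out of 240 available.
print(cpu_utilization(0.0, 1000.0, 60.0, 1060.0, 4.0))  # 0.25
```

If that model is right, the alarm would fire when this value exceeds 0.85. If these are Prometheus-style counters, I'd guess the query equivalent is something like rate(render_service_cpu_time_seconds[5m]) / render_service_cpu_limit_seconds, but I haven't confirmed that either.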

Thank you