I’m trying to understand how GPU utilization is defined and interpreted.
According to NVIDIA support, GPU utilization is:
“Percent of time over the past sample period during which one or more kernels was executing on the GPU. The sample period may be between 1 second and 1/6 second depending on the product.”
On my RTX 5080, for various deep learning workloads (small and large models), I consistently see the GPU utilization reported by nvidia-smi and dcgmi dmon -e 203 fluctuate between ~0% and ~100%, almost like a binary signal. My understanding is that this happens because the metric only checks whether at least one kernel is running during the sampling window.
I have two questions:
1. Is my interpretation correct?
2. Is there a more meaningful GPU utilization metric that reflects actual compute usage, e.g., Utilization = (achieved compute throughput) / (peak compute throughput)?
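As a minimal sketch of the ratio in the second question, the snippet below computes achieved throughput over peak throughput for a single matrix multiply. The function names, the measured time, and the peak figure are all hypothetical placeholders chosen for illustration; they are not RTX 5080 specifications or measurements.

```python
def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs for an (m x k) @ (k x n) matmul: one multiply and one add per term."""
    return 2 * m * n * k


def compute_utilization(flops_executed: float, elapsed_s: float,
                        peak_flops_per_s: float) -> float:
    """Achieved throughput divided by peak throughput, as a fraction."""
    if elapsed_s <= 0 or peak_flops_per_s <= 0:
        raise ValueError("elapsed_s and peak_flops_per_s must be positive")
    achieved_flops_per_s = flops_executed / elapsed_s
    return achieved_flops_per_s / peak_flops_per_s


# Hypothetical inputs: replace with a measured kernel time and the
# datasheet peak for the precision actually used.
flops = matmul_flops(4096, 4096, 4096)
elapsed_s = 0.010            # assumed measured kernel time, in seconds
peak_flops_per_s = 50e12     # assumed peak, NOT a real spec value

util = compute_utilization(flops, elapsed_s, peak_flops_per_s)
print(f"compute utilization: {util:.1%}")
```

This only works when the FLOP count of the workload is known in advance, which is why a hardware counter based metric is usually more practical for whole training runs.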
That metric is, for a system administrator, defined as GPU utilization, when in fact it is a high-level view of graphics engine activity (compute or graphics pipe active). The compute pipe can be 100% active even if only a single thread is running on 1 SM (out of all SMs).
There are several more accurate methods of measuring GPU utilization:
DCGM Profiling Metrics
- PROF_GR_ENGINE_ACTIVE is the number of cycles in the sample period during which the compute pipe or graphics pipe is active, so it closely matches DCGM_FI_DEV_GPU_UTIL in definition.
- PROF_SM_ACTIVE is the number of cycles in the sample period during which an SM had at least 1 warp active, summed across all SMs.
These metrics can be collected at up to 10 Hz.
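To make the difference between the two definitions concrete, here is a toy model in Python. It does not call DCGM; the `Sample` class, the cycle counts, and the SM count are hypothetical, and dividing the summed SM cycles by the number of SMs to get a 0 to 1 ratio is an assumption made for readability.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """Hypothetical counter values for one sample period."""
    elapsed_cycles: int          # cycles in the sample period
    gr_active_cycles: int        # cycles the compute or graphics pipe was active
    sm_active_cycles: List[int]  # per-SM cycles with at least 1 warp active


def gr_engine_active(sample: Sample) -> float:
    """Fraction of the period in which the engine was active at all."""
    return sample.gr_active_cycles / sample.elapsed_cycles


def sm_active(sample: Sample) -> float:
    """Per-SM active cycles summed over all SMs, normalized to [0, 1]."""
    total = sum(sample.sm_active_cycles)
    return total / (len(sample.sm_active_cycles) * sample.elapsed_cycles)


NUM_SMS = 80          # hypothetical SM count
PERIOD = 1_000_000    # hypothetical cycles per sample period

# One long-running kernel that occupies a single SM for the whole period.
one_sm_busy = Sample(
    elapsed_cycles=PERIOD,
    gr_active_cycles=PERIOD,
    sm_active_cycles=[PERIOD] + [0] * (NUM_SMS - 1),
)

print(f"GR engine active: {gr_engine_active(one_sm_busy):.2%}")
print(f"SM active:        {sm_active(one_sm_busy):.2%}")
```

In this toy case the engine-level number reads fully busy while the SM-level number shows that only one SM out of the assumed 80 did any work.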
Nsight Systems GPU Metrics
- GR_ACTIVE matches the definition of PROF_GR_ENGINE_ACTIVE.
- SM_ACTIVE matches the definition of PROF_SM_ACTIVE.
These metrics can be collected at up to 200 kHz. On some GPUs this rate cannot be maintained.
Both DCGM Profiling and Nsight Systems support additional SM and memory system metrics that report utilization of the major SM pipelines (FP32, FP64, TENSOR, …), the memory subsystem, and interconnects.