GPU utilization

I’m trying to understand how GPU utilization is defined and interpreted.

According to NVIDIA support, GPU utilization is:
“Percent of time over the past sample period during which one or more kernels was executing on the GPU. The sample period may be between 1 second and 1/6 second depending on the product.”

On my RTX 5080, for various deep learning workloads (small and large models), the GPU utilization reported by nvidia-smi and dcgmi dmon -e 203 consistently fluctuates between ~0% and ~100%, almost like a binary signal. My understanding is that this happens because the metric only checks whether at least one kernel is running during the sampling window.

I have two questions:

  1. Is my interpretation correct?

  2. Is there a more meaningful GPU utilization metric that reflects actual compute usage, e.g., Utilization = (achieved compute throughput) / (peak compute throughput)?

Both nvidia-smi and dcgmi dmon -e 203 report the same field: DCGM_FI_DEV_GPU_UTIL (field ID 203), documented as “GPU Utilization”.

System administrators treat this value as GPU utilization, but it is really a high-level indicator of whether the graphics engine (the compute or graphics pipe) is active. The compute pipe can read 100% active even if only a single thread is running on 1 SM (out of all SMs).
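To make the quoted definition concrete, here is a minimal Python sketch of "percent of time over the sample period during which one or more kernels was executing". It is only an illustration, not how the driver computes the value: the function name `kernel_active_percent` and the kernel timestamps are made up for the example.

```python
def kernel_active_percent(intervals, window_start, window_end):
    """Percent of [window_start, window_end) in which >= 1 kernel was running.

    `intervals` is a list of (start, end) kernel execution times. These are
    hypothetical inputs; the real counter is maintained by the driver.
    """
    # Clip every kernel interval to the sample window and drop empty ones.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in intervals
        if min(e, window_end) > max(s, window_start)
    )
    busy = 0.0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this kernel: close out the previous busy span.
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or back-to-back kernels count only once.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        busy += cur_end - cur_start
    return 100.0 * busy / (window_end - window_start)


# A single kernel spanning the window reads as fully utilized,
# regardless of how many SMs it actually occupies.
print(kernel_active_percent([(0.0, 1.0)], 0.0, 1.0))              # 100.0
# Two overlapping kernels followed by an idle half window.
print(kernel_active_percent([(0.0, 0.3), (0.2, 0.5)], 0.0, 1.0))  # 50.0
```

Note that nothing in this calculation depends on how much of the GPU each kernel uses, which is why the reading tends to sit near 0% or 100%.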

There are several more accurate methods of measuring GPU utilization:

  1. DCGM Profiling Metrics
    PROF_GR_ENGINE_ACTIVE is the number of cycles in the sample period during which the compute pipe or graphics pipe is active, so it closely matches DCGM_FI_DEV_GPU_UTIL in definition.
    PROF_SM_ACTIVE is the number of cycles in the sample period during which an SM had at least 1 warp active, summed across all SMs.
    These metrics can be collected at up to 10 Hz.

  2. Nsight Systems GPU Metrics
    GR_ACTIVE matches the definition of PROF_GR_ENGINE_ACTIVE.
    SM_ACTIVE matches the definition of PROF_SM_ACTIVE.
    These metrics can be collected at up to 200 kHz, although some GPUs cannot sustain this rate.
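The difference between the engine-active and SM-active definitions above can be sketched with a toy model in Python. This is an illustration under stated assumptions, not the profiler's implementation: the sample period is split into equal steps, `active_sms_per_step` (a hypothetical input) gives the number of SMs with at least one warp resident at each step, both results are normalized to a fraction in [0, 1], and `NUM_SMS = 84` is just an example value.

```python
def engine_and_sm_active(active_sms_per_step, num_sms):
    """Return (gr_active, sm_active) as fractions in [0, 1] for a toy timeline."""
    steps = len(active_sms_per_step)
    if steps == 0:
        return 0.0, 0.0
    # Engine active: any step in which at least one SM has work.
    gr_active = sum(1 for n in active_sms_per_step if n > 0) / steps
    # SM active: active SM-steps summed over all SMs, over the maximum possible.
    sm_active = sum(active_sms_per_step) / (steps * num_sms)
    return gr_active, sm_active


NUM_SMS = 84  # assumed SM count, for illustration only

# One SM busy for the whole sample period: the engine looks fully busy,
# while SM activity is only 1/NUM_SMS.
print(engine_and_sm_active([1] * 10, NUM_SMS))

# Every SM busy for half of the period: both metrics agree at 0.5.
print(engine_and_sm_active([NUM_SMS] * 5 + [0] * 5, NUM_SMS))
```

In this model the two metrics agree only when kernels fill every SM whenever they run; small kernels pull SM activity far below engine activity.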

Both DCGM Profiling and Nsight Systems support additional SM and memory system metrics that report the utilization of the major SM pipelines (FP32, FP64, TENSOR, …), the memory subsystem, and interconnects.
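For a single well-understood operation, the throughput ratio from question 2 can also be estimated by hand. The Python sketch below does this for a dense matrix multiply, using the standard 2·M·N·K operation count. The elapsed time and the peak throughput are placeholder values, not measurements or specifications of any particular GPU.

```python
def matmul_flops(m, n, k):
    """Floating-point operations for an (m x k) @ (k x n) matrix multiply."""
    # Each of the m*n outputs needs k multiplies and k adds.
    return 2 * m * n * k


def compute_utilization(flops, elapsed_s, peak_flops_per_s):
    """Achieved compute throughput divided by peak compute throughput."""
    achieved = flops / elapsed_s
    return achieved / peak_flops_per_s


# Placeholder numbers: substitute a measured kernel time and the peak
# throughput for the precision and pipeline actually in use.
PEAK_FLOPS_PER_S = 50e12   # hypothetical peak
ELAPSED_S = 0.005          # hypothetical measured time

flops = matmul_flops(4096, 4096, 4096)
print(f"{compute_utilization(flops, ELAPSED_S, PEAK_FLOPS_PER_S):.1%}")
```

This only works when the operation count is known in advance; for whole workloads, the per-pipeline metrics mentioned above are the more practical route.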