How to get real-time SM occupancy


When I run inference in PyTorch and measure GPU utilization using `nvidia-smi --query-gpu=utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv -l 1 -i 0`, I get the following output:

That is, GPU utilization is 100%.
However, when I profile the same code with ncu and parse the .ncu-rep file like this: `ncu --metrics sm__throughput.avg.pct_of_peak_sustained_elapsed,gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed --page raw --csv --import model.ncu-rep`, I get the SM occupancy for each kernel as follows:

The SM occupancies from the .ncu-rep file are not consistent with the nvidia-smi output and are lower than 100%. I am confused: how can I get real-time SM occupancy? Thanks in advance.

nvidia-smi will report 100% GPU utilization if there is any work at all in the compute engine during the sampling interval, even a kernel with a single thread.
Nsight Compute (ncu) has per-Streaming Multiprocessor (SM) counters that can count the cycles during which an SM has any work (`sm__cycles_active.avg.pct_of_peak_sustained_elapsed`) or the warp occupancy (`sm__warps_active.avg.pct_of_peak_sustained_elapsed`).
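For example, following the same profile-then-import pattern as in the question (a sketch; `python infer.py` stands in for your own inference command), you can collect these two metrics per kernel like this:

```shell
# 1) Profile the application into a report file, collecting the
#    SM-active and warp-occupancy metrics for every kernel launch.
ncu --metrics sm__cycles_active.avg.pct_of_peak_sustained_elapsed,sm__warps_active.avg.pct_of_peak_sustained_elapsed \
    -o model python infer.py

# 2) Dump the per-kernel values as CSV from the report, as in the question.
ncu --page raw --csv --import model.ncu-rep
```

Note that ncu serializes kernels and replays them, so these are per-kernel averages rather than a real-time view of the GPU.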

Nsight Systems can capture both of these metrics at a very high sampling rate.
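A sketch of GPU metrics sampling with nsys (the device index, frequency, and application command are assumptions; adjust them for your setup):

```shell
# Sample GPU hardware metrics (including SM active and warp occupancy)
# on GPU 0 at 10 kHz while profiling the application; open the resulting
# report in the Nsight Systems GUI to see the metrics on the timeline.
nsys profile --gpu-metrics-device=0 --gpu-metrics-frequency=10000 \
    -o report python infer.py
```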

On data center GPUs, these stats can also be collected via DCGM (Data Center GPU Manager). The default sampling rate is 1 Hz.
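As a hedged sketch using the dcgmi CLI (field IDs 1002/1003 are the profiling fields for SM activity and SM occupancy in recent DCGM versions; verify the IDs on your system with `dcgmi dmon -l`):

```shell
# Stream SM activity (field 1002) and SM occupancy (field 1003)
# for GPU 0, once per second (-d is the update interval in ms).
dcgmi dmon -i 0 -e 1002,1003 -d 1000
```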