nvidia-smi reports a "Volatile GPU-Util" figure, which is useful for checking whether the GPU is being used at all. It is the percentage of time during a sampling interval in which a kernel was running on the GPU.
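To make that definition concrete, here is a minimal sketch of how such a figure could be computed from kernel start and end times. The function name `gpu_util_percent` and the interval representation are my own assumptions for illustration, not how the driver actually implements it.

```python
def gpu_util_percent(kernel_intervals, window_start, window_end):
    """Percent of the sampling window in which at least one kernel was running.

    kernel_intervals: iterable of (start, end) times (hypothetical trace data).
    """
    # Clip each interval to the window and drop those that fall outside it.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in kernel_intervals
        if min(e, window_end) > max(s, window_start)
    )
    busy = 0.0
    cur_start, cur_end = None, None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this interval: close out the merged busy span so far.
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping kernels count once, so just extend the current span.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        busy += cur_end - cur_start
    return 100.0 * busy / (window_end - window_start)


# Two kernels, each busy for a quarter of a 1-second window.
print(gpu_util_percent([(0.0, 0.25), (0.5, 0.75)], 0.0, 1.0))  # 50.0
```

Note that under this definition a single small kernel that keeps one SM busy for the whole window would still give 100%, which is why this number alone says nothing about how many SMs are in use.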
nvidia-smi dmon reports an sm% column, but I do not understand exactly what it means.
# gpu   pwr  temp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     %     %     %     %   MHz   MHz
    0    43    48     0     1     0     0  3505   936
    0    43    48     0     1     0     0  3505   936
I would like to know how this sm% number is calculated.
NVML, on the other hand, reports the current and maximum SM clock frequencies. Is there a way to get an SM utilization percentage, analogous to CPU utilization, that measures how well a program uses the SM cores?
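For reference, this is roughly what I can read from NVML today, sketched with the pynvml Python bindings (assuming they are installed). The helper name `read_sm_stats` and the dictionary keys are my own; the NVML calls are the standard ones. As far as I can tell, the utilization value returned here is the same coarse "a kernel was running" percentage, not a per-SM or occupancy figure.

```python
def read_sm_stats(nvml, index=0):
    """Return utilization and SM clock readings for one GPU.

    nvml: the pynvml module (passed in so the import stays optional).
    index: GPU index, as shown by nvidia-smi.
    """
    nvml.nvmlInit()
    try:
        handle = nvml.nvmlDeviceGetHandleByIndex(index)
        # .gpu is the percent of time a kernel was executing; .memory is the
        # percent of time device memory was being read or written.
        util = nvml.nvmlDeviceGetUtilizationRates(handle)
        return {
            "gpu_util_pct": util.gpu,
            "mem_util_pct": util.memory,
            "sm_clock_mhz": nvml.nvmlDeviceGetClockInfo(handle, nvml.NVML_CLOCK_SM),
            "sm_clock_max_mhz": nvml.nvmlDeviceGetMaxClockInfo(handle, nvml.NVML_CLOCK_SM),
        }
    finally:
        nvml.nvmlShutdown()


if __name__ == "__main__":
    try:
        import pynvml
    except ImportError:
        print("pynvml is not installed")
    else:
        try:
            print(read_sm_stats(pynvml))
        except pynvml.NVMLError as err:
            print("NVML unavailable:", err)
```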
Also, how does one go about monitoring the SM occupancy?
From my understanding of NVIDIA GPUs, fully maximizing utilization requires occupying all the SMs and running as many instructions as possible on them.
Can the NVML API be used to get this information?