How to monitor SM utilization and SM occupancy?

nvidia-smi gives Volatile GPU-Util, which is useful if you want to know whether the GPU is being used at all. It reports the percentage of time during the sampling interval in which one or more kernels were running on the GPU.
nvidia-smi dmon reports an sm%, but I do not understand exactly what it means:

# gpu   pwr  temp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     %     %     %     %   MHz   MHz
    0    43    48     0     1     0     0  3505   936
    0    43    48     0     1     0     0  3505   936

I was hoping to know how this sm% number was calculated.

NVML, on the other hand, gives the current and max SM clock frequency. Is there a way to get an SM utilization percentage, similar to CPU utilization %, to measure how well a program uses the SM cores?
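
For reference, here is roughly how I read the device-level numbers today with the nvidia-ml-py (pynvml) bindings; as far as I can tell, nvmlDeviceGetUtilizationRates is the same coarse counter that nvidia-smi reports, not a per-SM measure:

```python
# Minimal sketch using the nvidia-ml-py (pynvml) bindings.
# nvmlDeviceGetUtilizationRates returns the coarse, device-level utilization
# that nvidia-smi shows: the percentage of time in the sample period during
# which at least one kernel was running, not how many SMs it kept busy.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

for _ in range(5):
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    sm_clock = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)
    max_sm_clock = pynvml.nvmlDeviceGetMaxClockInfo(handle, pynvml.NVML_CLOCK_SM)
    print(f"gpu util {util.gpu}%  mem util {util.memory}%  "
          f"sm clock {sm_clock}/{max_sm_clock} MHz")
    time.sleep(1)

pynvml.nvmlShutdown()
```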

Also, how does one go about monitoring the SM occupancy?

From my understanding of NVIDIA GPUs, to fully maximize utilization one needs to keep all the SMs occupied and issue as many instructions as possible.
Can the NVML API be used to get this information?


Hi, have you solved this problem? I also want to understand what sm (%) means, but there seems to be no documentation on it.


You can do this with the Data Center GPU Manager (DCGM).

Download from here:
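
For example, something like the sketch below samples the two SM profiling fields from the command line (a rough sketch: it assumes the DCGM host engine is running, and that field 1002 is DCGM_FI_PROF_SM_ACTIVE and 1003 is DCGM_FI_PROF_SM_OCCUPANCY, i.e. SM activity and SM occupancy):

```python
# Rough sketch: shell out to dcgmi (assumes the DCGM host engine is running).
# Field 1002 = DCGM_FI_PROF_SM_ACTIVE, 1003 = DCGM_FI_PROF_SM_OCCUPANCY.
import subprocess

# Sample both SM profiling fields once per second, five times, then print
# the raw dcgmi dmon table.
cmd = ["dcgmi", "dmon", "-e", "1002,1003", "-d", "1000", "-c", "5"]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout)
```

Shelling out to dcgmi keeps the example independent of the Python bindings discussed later in the thread.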

Kindly answer what sm means, as others have asked. Thanks.

Hi,

The Python bindings for DCGM appear to be broken. Is there anywhere I can report bugs?

Hi, I am facing the same issue. I am not able to get utilization for my individual process. Any ideas on how to tackle this?

Hi,
The NVML API already supports SM activity and SM occupancy on the Hopper architecture (through the GPM interface linked below). I have the same problem as you on the Ampere architecture. Any help would be greatly appreciated!

nvml GPM API: NVML API Reference Guide :: GPU Deployment and Management Documentation
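
A rough sketch of how those two metrics are read through the GPM interface on Hopper (assuming the GPM bindings in recent nvidia-ml-py releases; wrapper names such as c_nvmlGpmMetricsGet_t may differ by version):

```python
# Rough sketch of the NVML GPM (GPU Performance Monitoring) flow, Hopper only.
# Uses the GPM bindings shipped with recent nvidia-ml-py; names may vary.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# GPM metrics are computed from the delta between two samples.
sample1 = pynvml.nvmlGpmSampleAlloc()
sample2 = pynvml.nvmlGpmSampleAlloc()
pynvml.nvmlGpmSampleGet(handle, sample1)
time.sleep(1)
pynvml.nvmlGpmSampleGet(handle, sample2)

metrics = pynvml.c_nvmlGpmMetricsGet_t()
metrics.version = pynvml.NVML_GPM_METRICS_GET_VERSION
metrics.numMetrics = 2
metrics.sample1 = sample1
metrics.sample2 = sample2
metrics.metrics[0].metricId = pynvml.NVML_GPM_METRIC_SM_UTIL       # SM activity
metrics.metrics[1].metricId = pynvml.NVML_GPM_METRIC_SM_OCCUPANCY  # SM occupancy
pynvml.nvmlGpmMetricsGet(metrics)

print(f"SM util {metrics.metrics[0].value:.1f}%  "
      f"SM occupancy {metrics.metrics[1].value:.1f}%")

pynvml.nvmlGpmSampleFree(sample1)
pynvml.nvmlGpmSampleFree(sample2)
pynvml.nvmlShutdown()
```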

thanks.

