How to monitor SM utilization and SM occupancy?

nvidia-smi gives Volatile GPU-Util, which is useful if you want to know whether the GPU is being used at all. It reports the percentage of time during the sampling interval in which one or more kernels were running on the GPU.
nvidia-smi dmon reports an sm%, but I do not understand exactly what it means:

# gpu   pwr  temp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     %     %     %     %   MHz   MHz
    0    43    48     0     1     0     0  3505   936
    0    43    48     0     1     0     0  3505   936

I was hoping to know how this sm% number was calculated.

NVML, on the other hand, gives the current and max SM clock frequency. Is there a way to get an SM utilization percentage, similar to CPU utilization %, to measure how well a program uses the SM cores?
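
For reference, here is roughly how I read the device-level numbers today with the nvidia-ml-py (pynvml) bindings; as far as I can tell, nvmlDeviceGetUtilizationRates is the same coarse counter that nvidia-smi reports, not a per-SM measure:

```python
# Minimal sketch using the nvidia-ml-py (pynvml) bindings.
# nvmlDeviceGetUtilizationRates returns the coarse, device-level utilization
# that nvidia-smi shows: the percentage of time in the sample period during
# which at least one kernel was running, not how many SMs it kept busy.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

for _ in range(5):
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    sm_clock = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)
    max_sm_clock = pynvml.nvmlDeviceGetMaxClockInfo(handle, pynvml.NVML_CLOCK_SM)
    print(f"gpu util {util.gpu}%  mem util {util.memory}%  "
          f"sm clock {sm_clock}/{max_sm_clock} MHz")
    time.sleep(1)

pynvml.nvmlShutdown()
```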

Also, how does one go about monitoring the SM occupancy?

From my understanding of NVIDIA GPUs, to fully maximize utilization one needs to keep all the SMs occupied and issue as many instructions as possible.
Can the NVML API be used to get this information?


Hi, have you solved this problem? I also want to understand what sm (%) means, but there seems to be no documentation on it.


You can do this with the Data Center GPU Manager (DCGM).

Download from here:
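
For example, something like the sketch below samples the two SM profiling fields from the command line (a rough sketch: it assumes the DCGM host engine is running, and that field 1002 is DCGM_FI_PROF_SM_ACTIVE and 1003 is DCGM_FI_PROF_SM_OCCUPANCY, i.e. SM activity and SM occupancy):

```python
# Rough sketch: shell out to dcgmi (assumes the DCGM host engine is running).
# Field 1002 = DCGM_FI_PROF_SM_ACTIVE, 1003 = DCGM_FI_PROF_SM_OCCUPANCY.
import subprocess

# Sample both SM profiling fields once per second, five times, then print
# the raw dcgmi dmon table.
cmd = ["dcgmi", "dmon", "-e", "1002,1003", "-d", "1000", "-c", "5"]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout)
```

Shelling out to dcgmi keeps the example independent of the Python bindings discussed later in the thread.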

Kindly answer what sm means, as others have asked. Thanks.

Hi,

The Python bindings for DCGM appear to be broken. Is there anywhere I can report bugs?

Hi, I am facing the same issue. I am not able to get utilization for my individual process. Any ideas on how to tackle this?

Hi,
The NVML API already supports SM activity and SM occupancy on the Hopper architecture (through the GPM interface linked below). I have the same problem as you on the Ampere architecture. Any help would be greatly appreciated!

nvml GPM API: NVML API Reference Guide :: GPU Deployment and Management Documentation
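
A rough sketch of how those two metrics are read through the GPM interface on Hopper (assuming the GPM bindings in recent nvidia-ml-py releases; wrapper names such as c_nvmlGpmMetricsGet_t may differ by version):

```python
# Rough sketch of the NVML GPM (GPU Performance Monitoring) flow, Hopper only.
# Uses the GPM bindings shipped with recent nvidia-ml-py; names may vary.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# GPM metrics are computed from the delta between two samples.
sample1 = pynvml.nvmlGpmSampleAlloc()
sample2 = pynvml.nvmlGpmSampleAlloc()
pynvml.nvmlGpmSampleGet(handle, sample1)
time.sleep(1)
pynvml.nvmlGpmSampleGet(handle, sample2)

metrics = pynvml.c_nvmlGpmMetricsGet_t()
metrics.version = pynvml.NVML_GPM_METRICS_GET_VERSION
metrics.numMetrics = 2
metrics.sample1 = sample1
metrics.sample2 = sample2
metrics.metrics[0].metricId = pynvml.NVML_GPM_METRIC_SM_UTIL       # SM activity
metrics.metrics[1].metricId = pynvml.NVML_GPM_METRIC_SM_OCCUPANCY  # SM occupancy
pynvml.nvmlGpmMetricsGet(metrics)

print(f"SM util {metrics.metrics[0].value:.1f}%  "
      f"SM occupancy {metrics.metrics[1].value:.1f}%")

pynvml.nvmlGpmSampleFree(sample1)
pynvml.nvmlGpmSampleFree(sample2)
pynvml.nvmlShutdown()
```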

thanks.

