Questions on per-process GPU utilization

King_Crimson · September 5, 2023, 1:38pm

Question 1: According to the documentation, nvmlDeviceGetProcessUtilization reports per-process utilization (GPU, memory, encoder and decoder utilization, in percent). However, our unit test shows some confusing results.
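For context, the NVML reference describes this call as filling a caller-supplied array of nvmlProcessUtilizationSample_t records (fields pid, timeStamp, smUtil, memUtil, encUtil, decUtil), one per process with non-zero utilization in the last sample period. Passing a previously returned timeStamp as lastSeenTimeStamp restricts the result to newer samples. The Python sketch below shows how a monitor might reduce such a buffer to one value per PID and advance its last-seen timestamp. ProcessSample and latest_sm_util are hypothetical names. The dataclass only mirrors the C struct and is not an NVML binding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessSample:
    """Hypothetical mirror of nvmlProcessUtilizationSample_t (not an NVML binding)."""
    pid: int
    time_stamp: int  # CPU timestamp in microseconds
    sm_util: int     # percent
    mem_util: int    # percent
    enc_util: int    # percent
    dec_util: int    # percent


def latest_sm_util(samples, last_seen_ts):
    """Return ({pid: sm_util}, new_last_seen_ts) for samples newer than last_seen_ts.

    If a PID appears more than once, the newest sample wins (defensive only;
    the documentation describes one record per process).
    """
    newest = {}
    new_last_seen = last_seen_ts
    for s in samples:
        if s.time_stamp <= last_seen_ts:
            continue  # already covered by a previous query
        prev = newest.get(s.pid)
        if prev is None or s.time_stamp > prev.time_stamp:
            newest[s.pid] = s
        new_last_seen = max(new_last_seen, s.time_stamp)
    return {pid: s.sm_util for pid, s in newest.items()}, new_last_seen
```

A polling loop would convert the records returned by the real API into this shape and pass the returned timestamp back as lastSeenTimeStamp on the next query.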

Suppose that within one sampling period of 1 second (the period that applies to the 2080 SUPER GPU in our test), process 0 runs a kernel that lasts 0.2 s and process 1 runs a kernel that lasts 0.3 s. When either process runs alone as the only resident process on the GPU, we observe 20% GPU utilization for process 0 and 30% for process 1. Both the per-process API nvmlDeviceGetProcessUtilization and the per-device API nvmlDeviceGetUtilizationRates report these values correctly.
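The numbers above follow from the documented definition of device utilization: the share of the sample period during which at least one kernel was executing. The Python sketch below reproduces that arithmetic using hypothetical kernel intervals in milliseconds within a 1000 ms window. The device-level figure is the length of the union of the busy intervals divided by the window length. busy_fraction is an illustrative helper, not a description of how the driver actually measures activity.

```python
def busy_fraction(intervals, window_start, window_end):
    """Fraction of [window_start, window_end) covered by at least one (start, end) interval."""
    # Clip every interval to the window and drop those entirely outside it.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in intervals
        if e > window_start and s < window_end
    )
    covered = 0.0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (window_end - window_start)
```

With the two kernels from the post placed back to back, for example busy_fraction([(0, 200), (500, 800)], 0, 1000), the union covers 500 ms, or 50% at the device level. On their own the processes account for 20% and 30%, so those are the per-process values one would expect in the concurrent case.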

However, when processes 0 and 1 run concurrently on the GPU within a sampling period, something odd happens: nvmlDeviceGetProcessUtilization reports 50% GPU utilization for one of the two processes and 0% for the other. This happens whether or not the two kernels overlap within the sampling period. (If they do overlap, GPU context switches occur, which stretches the execution time of both kernels.)

We also used nvidia-smi pmon as a reference and observed a similar phenomenon (below): one process is assigned 50%, and the other is reported as not available (-).

# gpu        pid  type    sm   mem   enc   dec   command
# Idx          #   C/G     %     %     %     %   name
    0          -     -     -     -     -     -   -
    0      76715     C    17     0     -     -   dummy-proc-0
    0      76716     C     -     -     -     -   dummy-proc-1
    0      76715     C    33     0     -     -   dummy-proc-0
    0      76716     C     -     -     -     -   dummy-proc-1
    0      76715     C    50     0     -     -   dummy-proc-0
    0      76716     C     -     -     -     -   dummy-proc-1
    0      76715     C    50     0     -     -   dummy-proc-0
    0      76716     C     -     -     -     -   dummy-proc-1
    0      76715     C    50     0     -     -   dummy-proc-0
    0      76716     C     -     -     -     -   dummy-proc-1
    0      76715     C    50     0     -     -   dummy-proc-0
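To check this pattern over a longer capture, the pmon text can be parsed mechanically. The Python sketch below assumes the column layout shown above (gpu, pid, type, sm, mem, enc, dec, command), where '-' means no value. parse_pmon and sm_by_pid are hypothetical helpers, and other nvidia-smi versions or options may print different columns.

```python
def _num(field):
    """Convert a pmon numeric field; '-' means no value was reported."""
    return None if field == "-" else int(field)


def parse_pmon(text):
    """Parse `nvidia-smi pmon` output into one dict per process row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, '#' header lines and malformed rows.
        if len(fields) < 7 or fields[0].startswith("#"):
            continue
        gpu, pid, ptype, sm, mem, enc, dec = fields[:7]
        if pid == "-":
            continue  # idle-GPU placeholder row, no process attached
        rows.append({
            "gpu": int(gpu), "pid": int(pid), "type": ptype,
            "sm": _num(sm), "mem": _num(mem), "enc": _num(enc), "dec": _num(dec),
            "command": " ".join(fields[7:]),
        })
    return rows


def sm_by_pid(rows):
    """Group the sm column by pid, preserving sample order."""
    series = {}
    for row in rows:
        series.setdefault(row["pid"], []).append(row["sm"])
    return series
```

On the capture above, this yields a series for PID 76715 that climbs to 50, while every sm value for PID 76716 is None.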

We have reproduced this on 2080 SUPER and 3080 Ti GPUs using CUDA runtime/driver API versions 11.8 and 12.2.

So the question is: can nvmlDeviceGetProcessUtilization be used to query per-process utilization at all? What is the correct way to monitor per-process utilization in percent?

Question 2: In the documentation, nvmlDeviceGetProcessUtilization is listed in section 2.22, vGPU APIs. Does that mean this function only applies to virtual GPUs (which we are not using at all)?

Question 3: Both nvidia-smi pmon and nvmlProcessUtilizationSample_t use the term “smUtil”. Does this simply refer to GPU utilization as in nvmlUtilization_t?

As far as I know, the term SM utilization only applies to Hopper GPUs with GPM support. There, according to this, it means “percentage of SMs that were busy”, as opposed to GPU utilization, which means “percent of time over the past sample period during which one or more kernels was executing on the GPU”. So we would like some clarification from the NVML developers.
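The gap between the two definitions quoted above can be made concrete with a toy model. Each SM gets a list of hypothetical busy intervals in milliseconds. "Time-based" utilization is the share of the window in which at least one SM is busy. "SM-based" utilization is the share of SMs busy, averaged over the window. This Python sketch models the definitions only as the post states them; it makes no claim about how NVML or GPM compute either value internally. All function names are illustrative.

```python
def covered_length(intervals, t0, t1):
    """Length of [t0, t1) covered by the union of (start, end) intervals."""
    clipped = sorted((max(a, t0), min(b, t1)) for a, b in intervals if b > t0 and a < t1)
    total, cur_s, cur_e = 0, None, None
    for s, e in clipped:
        if cur_e is None or s > cur_e:
            if cur_e is not None:
                total += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)
    if cur_e is not None:
        total += cur_e - cur_s
    return total


def time_based_util(per_sm_busy, t0, t1):
    """Percent of the window during which at least one SM was busy."""
    everything = [iv for busy in per_sm_busy for iv in busy]
    return 100 * covered_length(everything, t0, t1) / (t1 - t0)


def sm_based_util(per_sm_busy, t0, t1):
    """Percent of SMs busy, averaged over the window."""
    busy_total = sum(covered_length(busy, t0, t1) for busy in per_sm_busy)
    return 100 * busy_total / (len(per_sm_busy) * (t1 - t0))
```

In this model, a kernel that keeps one SM out of four busy for the whole window scores 100% time-based but only 25% SM-based. So the two terms cannot be used interchangeably without knowing which definition a tool applies.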

Thanks!

BlueGoliath · September 26, 2023, 6:37am

Per-process GPU utilization is not accurate.
That documentation is wrong.
They mean graphics utilization here. See https://github.com/NVIDIA/nvidia-settings/blob/1e202d03434e698e70a6fd2264d07fcf6c68ffa1/src/nvml.h#L1379

King_Crimson · September 26, 2023, 6:47am

They mean graphics utilization here.

Could you elaborate a bit on this part? The comment says “SM (3D/Compute) Util Value”. Doesn’t it indicate the contributions from both the compute and the graphics?

BlueGoliath · September 26, 2023, 9:50pm

Either or. Nvidia’s documentation is all over the place. On the nvmlUtilization_t struct (or wherever), it says something about running a CUDA kernel, but the utilization value is also affected by 3D applications (e.g. games).

I have a JavaFX-based app using NVML where you can observe this more easily: GitHub - BlueGoliath/Envious-FX: A JavaFX based Nvidia GPU monitoring utility using the native NVML library. Graphics and compute both count toward “graphics” utilization, as shown by the “Graphics Utilization” tiles and monitors.

msakthivel · October 10, 2023, 5:27pm

Process utilization is calculated only when a single process is running on the GPU. It is currently not supported for concurrently running processes.
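The reply above suggests a defensive rule for monitoring tools: only trust per-process values when exactly one process is on the GPU. In this Python sketch, per_pid_sm_util would come from nvmlDeviceGetProcessUtilization, and active_pids would come from a process list such as nvmlDeviceGetComputeRunningProcesses. attributable_util is a hypothetical helper. A PID missing from the utilization samples is treated as 0%, because NVML only returns samples for processes with non-zero utilization in the sample period.

```python
def attributable_util(per_pid_sm_util, active_pids):
    """Return {pid: sm_util} if exactly one process is active, otherwise None.

    None means "cannot be attributed per process", not "0%".
    """
    active = set(active_pids)
    if len(active) != 1:
        return None  # zero or several processes: per-process split unsupported
    (pid,) = active
    # No sample for the lone process is treated as 0% (hypothetical policy).
    return {pid: per_pid_sm_util.get(pid, 0)}
```

When this returns None, the device-level value from nvmlDeviceGetUtilizationRates is still available; it just cannot be split across processes.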

BlueGoliath · October 16, 2023, 8:31pm

So what are the other process values? Garbage data? Does it have any meaning at all? Is there any chance of this getting fixed?

system · October 30, 2023, 8:31pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
