GPU utilization

The only tools I’m aware of that approach those topics are the profilers, and they are not readily adaptable to cluster-scale monitoring. Perhaps Scott will have some other suggestions. At a higher level, some of these tools may be of interest, e.g. ganglia:

[url]https://developer.nvidia.com/cluster-management[/url]

From my perspective, asking questions about warp behavior is something like asking whether the AVX512 intrinsics I am using are actually keeping every AVX lane busy.

That seems (to me) like rather more detail than is necessary to answer these questions:

“how busy are these (GPU) servers? do we need to get more capacity?”

From my perspective, the first level of monitoring is simply process monitoring:

  • Is the GPU currently claimed by a process, or is it idle?
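As a sketch of that first level: `nvidia-smi --query-compute-apps` lists the processes currently holding each GPU, and an empty list means the GPU is unclaimed. The query flags below are standard nvidia-smi options, but the parsing helper and function names are my own, and the sketch assumes the common case where process paths contain no commas:

```python
import subprocess

def parse_compute_apps(csv_text):
    """Parse 'nvidia-smi --query-compute-apps' CSV output (noheader,nounits)
    into a list of (pid, process_name, used_memory_MiB) tuples."""
    procs = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        pid, name, mem = [field.strip() for field in line.split(",")]
        procs.append((int(pid), name, int(mem)))
    return procs

def gpu_claimed():
    """Return True if any compute process currently holds a GPU."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        text=True)
    return len(parse_compute_apps(out)) > 0
```

Polling something like `gpu_claimed()` once a minute per node is already enough to answer the claimed/idle question fleet-wide.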

The next level of monitoring would be GPU utilization within the process:

  • Is the process allocating memory on the GPU? What percentage of the total?
  • While the process holds the GPU, how often are CUDA kernels actually running?
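Both of those second-level questions map onto standard `--query-gpu` fields: `memory.used`/`memory.total` for the allocation percentage, and `utilization.gpu`, which nvidia-smi defines as the percent of recent time during which one or more kernels was executing. A sketch (the field names are real nvidia-smi query fields; the helper functions are my own invention):

```python
import subprocess

def parse_gpu_sample(csv_line):
    """Parse one 'utilization.gpu, memory.used, memory.total' line
    (csv,noheader,nounits format) into (util_pct, used_MiB, total_MiB)."""
    util, used, total = [int(f.strip()) for f in csv_line.split(",")]
    return util, used, total

def sample_gpu(index=0):
    """Query one GPU via nvidia-smi. utilization.gpu is the percent of
    the recent sample period in which a kernel was running; memory
    figures are in MiB."""
    out = subprocess.check_output(
        ["nvidia-smi", "-i", str(index),
         "--query-gpu=utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True)
    util, used, total = parse_gpu_sample(out.strip())
    return {"util_pct": util, "mem_pct": 100.0 * used / total}
```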

All of these levels of monitoring are supported by nvidia-smi.

From my perspective, there are two different kinds of monitoring:

  1. How much activity is there on the GPUs?
  2. What is the quality (nature) of the activity on the GPUs?

To meet capacity demand on a near-term basis, only item 1 is important (I think). If someone is using a GPU, then for most use cases I am aware of, no one else can or should be using that GPU. It doesn’t matter much what sort of activity is going on.
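In that view, the capacity question reduces to aggregating claimed/idle samples over time. A minimal sketch of that aggregation (the polling cadence and the 1.0 threshold are illustrative assumptions, not recommendations from any tool):

```python
def busy_fraction(samples):
    """Given a list of booleans recording whether a GPU was claimed at
    each polling interval, return the fraction of time it was occupied."""
    if not samples:
        return 0.0
    return sum(samples) / len(samples)

# e.g. poll each GPU once a minute for a day; if the fleet-wide average
# busy fraction sits near 1.0, that argues for adding capacity.
```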

Item 2 comes into play when datacenter management wants to encourage its users to make more effective use of the GPU cycles they are already consuming. It does not fundamentally address the capacity question, except on a long-term basis as users are encouraged to run more efficient codes.