GPU utilization

Hi,

I have DGX-1 systems (Linux) with multiple Tesla GPUs installed. The tool “nvidia-smi” only seems to give a very basic view of GPU utilization, and other tools such as profilers only give detail about how a particular program uses the GPU. Is there a general tool that reports GPU usage the way the various sysstat tools do for the CPU?

Thank you!

Tom

Hi Tom,

Do you mean something like what nvidia-smi dmon yields?

~$ nvidia-smi dmon
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    31    34     -     0     0     0     0   715   405
    1    32    34     -     0     0     0     0   715   405
    2    31    34     -     0     0     0     0   715   405
    3    31    35     -     0     0     0     0   715   405
    4    32    33     -     0     0     0     0   715   405
    5    31    35     -     0     0     0     0   715   405
    6    31    37     -     0     0     0     0   715   405
    7    31    33     -     0     0     0     0   715   405
    0    31    34     -     0     0     0     0   715   405
    1    32    33     -     0     0     0     0   715   405
    2    31    34     -     0     0     0     0   715   405
    3    31    35     -     0     0     0     0   715   405
    4    32    33     -     0     0     0     0   715   405
    5    31    35     -     0     0     0     0   715   405
    6    31    37     -     0     0     0     0   715   405
    7    31    33     -     0     0     0     0   715   405
    0    31    34     -     0     0     0     0   715   405
    1    32    34     -     0     0     0     0   715   405
    2    31    34     -     0     0     0     0   715   405
    3    31    35     -     0     0     0     0   715   405
    4    32    33     -     0     0     0     0   715   405
    5    31    35     -     0     0     0     0   715   405
    6    31    37     -     0     0     0     0   715   405
    7    31    33     -     0     0     0     0   715   405
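
If that live view is close to what you want, dmon can also log it over time. Something like this should write one timestamped utilization sample per GPU every 5 seconds (the flags can vary a bit by driver version, so check nvidia-smi dmon -h; the log path is just an example):

# utilization metrics only (-s u), 5-second interval, date/time columns, written to a file
nvidia-smi dmon -s u -d 5 -o DT -f /var/log/gpu_dmon.log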

Hi Scott,

Here’s the relevant excerpt from the nvidia-smi man page:

Utilization
Utilization rates report how busy each GPU is over time, and can be used to determine how
much an application is using the GPUs in the system.

   GPU            Percent of time over the past second during which one or more  kernels  was
                  executing on the GPU.

   Memory         Percent  of  time  over the past second during which global (device) memory
                  was being read or written.

I tried it myself, running a simple x*y+z program. While it runs it occupies a single GPU, and that GPU shows > 90% utilization for the entire duration, so the reading is either 0% or close to 100%. There isn’t any meaningful usage pattern I can use to tell how busy the GPU cores are.

I’ve been capturing the number with nvidia-smi for a week. The graph is just a solid block ranging from 0% to 100%; there isn’t any weighted average I can get out of it, and as a result I can’t answer the question “how busy are these GPU servers? do we need to get more capacity?”
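
For reference, the capture is essentially just the utilization fields polled on a loop, roughly along these lines (the exact field names are listed by nvidia-smi --help-query-gpu):

# one CSV line per GPU per minute, appended for later graphing
nvidia-smi --query-gpu=timestamp,index,utilization.gpu,utilization.memory \
           --format=csv,noheader -l 60 >> gpu_util.csv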

Thank you!

Tom

The GPU utilization number reported by nvidia-smi is a (one-second) time-average of asking the question “is a kernel currently running on the GPU?” repeatedly.

https://stackoverflow.com/questions/40937894/nvidia-smi-volatile-gpu-utilization-explanation/40938696#40938696

Of course that question only has 2 possible answers, yes or no. So on an instantaneous sampling, nvidia-smi could only report 0% or 100%.

However suppose an application runs a kernel for 1ms, then has no kernel activity for 1ms, then runs a kernel for 1ms, then no kernel activity, etc. nvidia-smi would report 50% utilization for that case.
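
If you want to watch that number at finer granularity than dmon’s one-second rows, you can poll it in a loop while your test program runs, for example (the -lms option needs a reasonably recent nvidia-smi; otherwise -l 1 gives one-second sampling):

# re-read the per-GPU utilization counter every 200 ms
nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader -lms 200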

For cluster-scale monitoring, it’s sometimes sufficient just to know that a process is using the GPU. nvidia-smi can report that. The utilization number is useful if you want to ensure that a process that is using the GPU is actually making “good” use of the GPU, i.e. it is running kernels with some regularity.

nvidia-smi also has additional reporting capabilities which may be relevant for cluster-scale monitoring:

nvidia-smi stats -h
nvidia-smi pmon -h
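
For instance, pmon gives a per-process utilization view, roughly like this (see pmon -h for the exact options on your driver):

# per-process SM/memory utilization and framebuffer usage,
# one sample per second, 30 samples
nvidia-smi pmon -s um -d 1 -c 30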

Thanks Robert for confirming what nvidia-smi does. Are there any other tools, not based on nvidia-smi, that go into detail about how fully occupied a core is? For example, does every execution use the maximum number of warps, and is every warp fully threaded?

Maybe I should not expect a GPU to behave like a CPU, but the ultimate question I have to answer is: how busy are the GPU cores?

The tools that I’m aware of that approach those topics are the profilers. They are not readily adaptable for cluster-scale monitoring. Perhaps Scott will have some other suggestions. At a higher level, some of these tools may be of interest, e.g. ganglia:

https://developer.nvidia.com/cluster-management
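
Coming back to the profilers: for a single application run on a single node, something like nvprof can get at the occupancy-style questions. Purely as an illustration (these are nvprof metric names from the CUDA toolkits current for these GPUs; ./my_app is a placeholder):

# report achieved occupancy and SM efficiency per kernel for one run
nvprof --metrics achieved_occupancy,sm_efficiency ./my_app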

From my perspective, asking questions about warp behavior is something like asking about whether or not the AVX512 intrinsics I am using are actually utilizing every AVX lane.

That seems (to me) like rather more detail than is necessary to answer these questions:

“how busy are these (GPU) servers? do we need to get more capacity?”

From my perspective, the first level of monitoring is simply process monitoring:

  • Is a process using the GPU or not? Is the GPU currently claimed by a process?

The next level of monitoring would be GPU utilization within the process:

  • Is the process allocating memory on the GPU? What percentage of the total?
  • When the process is using a GPU, how often are CUDA kernels being run during that time?

All of these levels of monitoring (or question answering) are supported by nvidia-smi.
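
Something like these queries covers those levels (field names are listed by nvidia-smi --help-query-gpu and --help-query-compute-apps):

# level 1: is the GPU currently claimed by a process?
nvidia-smi --query-compute-apps=gpu_uuid,pid,process_name,used_memory --format=csv

# level 2a: how much of each GPU's memory is allocated?
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv

# level 2b: while the process holds the GPU, how often are kernels running?
nvidia-smi --query-gpu=index,utilization.gpu --format=csv -l 5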

From my perspective, there are 2 different kinds of monitoring:

  1. How much activity is there on the GPUs?
  2. What is the quality (nature) of the activity on the GPUs?

To meet capacity demand on a near term basis, only item 1 is important (I think). If someone is using a GPU, for most use cases I am aware of, no one else can or should be using that GPU. It doesn’t matter much what sort of activity is going on.

Item 2 comes into play when datacenter management wants to encourage their users to make more effective use of the GPU cycles they are consuming already. It does not fundamentally address the capacity question, except on a long term basis as users are encouraged to run more efficient codes.

“To meet capacity demand on a near term basis, only item 1 is important (I think). If someone is using a GPU, for most use cases I am aware of, no one else can or should be using that GPU. It doesn’t matter much what sort of activity is going on.”

I’m curious: so a GPU doesn’t have any sort of scheduling/run-queue policy like a CPU does?

I may not be understanding the question.

If you have a CPU core, and you assign that to a user e.g. using a job scheduler such as slurm, would you also assign that same CPU core to another job/user? You wouldn’t.

You wouldn’t do that with a GPU either.

Individual GPU cores don’t get scheduled the way an individual CPU core might. Most scheduling policies for GPUs involve assigning the entire GPU to a single user/job. The same is true for GPU memory. GPU memory for most compute deployments cannot be sliced up or individually assigned to separate users/jobs. The user of a GPU gets access to all of the memory associated with that GPU.
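
As a concrete sketch, assuming a slurm cluster where the GPUs are set up as generic resources (gres), a job claims a whole GPU something like this (my_gpu_app is just a placeholder):

#!/bin/bash
#SBATCH --gres=gpu:1     # request one entire GPU for this job
#SBATCH --ntasks=1
srun ./my_gpu_app        # this job gets the whole GPU and all of its memory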

Note that I’m referring to the way things are currently, for the vast majority of GPU deployments. GPU virtualization/sharing/slicing is available in a limited way today, and the capability and usage of this will grow in the future. However almost certainly you are not using your GPUs that way.

Thank you Robert/Scott!