GPU application clock and warming up in a CUDA application

For warming up a GPU, I wrote a simple function that multiplies two arrays on a GPU. I designed the function to use all available threads. The nvidia-smi dmon -s u command shows that SMs are indeed utilized to 99-100%.

When the GPU performs the calculations, its performance state changes to P0 and the application clock frequency rises. However, on a T4 GPU I cannot get the application clock above about 90% of the maximum by repeatedly executing the function.

What could be the reason for a GPU not achieving the max frequency on calculations that supposedly use all available processors?

In the context of NVIDIA GPUs, “application clock” has a very specific meaning. It looks like here we are actually talking about “GPU clock” as dynamically adjusted by the GPU’s power management and clock boosting heuristics.

It is not clear what “max” refers to. Every GPU has a nominal operating frequency. If environmental factors allow, and with the GPU in performance state P0, the clock may be boosted above this level, up to some maximum boost clock. What is the maximum GPU clock achieved by the T4 in this case? I have never used a T4, but some other Turing-based cards can boost to 1800+ MHz, although in normal operation they are more likely to run at 1500-1600 MHz.
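One way to put numbers on “max” is to ask nvidia-smi for the current and maximum SM clock together with the power draw and power limit. The following is a minimal sketch, not a definitive tool: the query fields `clocks.sm`, `clocks.max.sm`, `power.draw` and `power.limit` are standard nvidia-smi fields, but the function names and dictionary keys are my own, and the parser assumes every field returns a number (a GPU that reports `[N/A]` for a field would need extra handling).

```python
import csv
import io
import subprocess

# Standard nvidia-smi query fields; the dictionary keys used below are
# names invented for this sketch.
QUERY_FIELDS = "clocks.sm,clocks.max.sm,power.draw,power.limit"


def parse_clock_report(csv_text):
    """Parse `--format=csv,noheader,nounits` output, one row per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        sm_mhz, sm_max_mhz, power_w, limit_w = (float(v) for v in record)
        rows.append({
            "sm_mhz": sm_mhz,
            "sm_max_mhz": sm_max_mhz,
            "clock_fraction": sm_mhz / sm_max_mhz,
            "power_fraction": power_w / limit_w,
        })
    return rows


def query_gpus():
    """Run nvidia-smi (assumed to be on PATH) and parse its output."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_clock_report(result.stdout)
```

Calling `query_gpus()` repeatedly while the warm-up kernel runs shows whether the clock fraction stalls at the same time the power fraction approaches 1.0.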

On modern GPUs, it is rare that the maximum boost clock is sustained for longer than a brief moment. The most common limiters are power consumption and temperature.

NVIDIA's power management reacts quite quickly to reduce the clock boost if the nominal power limit is exceeded, although very brief excursions above the limit may occur (e.g. one might see 78W on a GPU with a power limit of 75W). Often the boost clock is reduced before the power limit is reached. For example, on one of my GPUs with a power limit of 75W, that appears to happen as soon as the power draw exceeds 65W.

The temperature limit differs between GPUs. Typically a significant boost clock reduction occurs when the temperature reaches 83 to 85 degrees Celsius, and generally any GPU temperature above 60 degrees Celsius or so negatively impacts the achievable boost clock. Some people install very elaborate third-party cooling solutions on their GPUs for this reason. In the winter, I sometimes accelerate my GPUs by dropping the ambient temperature to about 10 degrees Celsius by cracking open the window in my office.

A third reason, more rarely encountered, has to do with GPU voltage. In order to boost the GPU clock reliably, an increase in operating voltage needs to be applied. If the maximum voltage supported (usually around 1 volt) is reached, or if voltage levels become unstable before that, boost clock will be reduced.

Look at “clocks throttle reasons” in the nvidia-smi output while your application is running.
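If you prefer to poll this from a script, `nvidia-smi --query-gpu=clocks_throttle_reasons.active --format=csv,noheader` prints the active reasons as a hexadecimal bitmask. Below is a small sketch of decoding that mask. The bit values follow the `nvmlClocksThrottleReason*` constants in NVML as I understand them; the human-readable labels and the function name are my own, and newer drivers may expose the same information under a different field name.

```python
# Bit values of NVML's nvmlClocksThrottleReason* constants.
# The labels are descriptive names chosen for this sketch.
THROTTLE_REASON_BITS = {
    0x001: "GPU idle",
    0x002: "applications clocks setting",
    0x004: "SW power cap",
    0x008: "HW slowdown",
    0x010: "sync boost",
    0x020: "SW thermal slowdown",
    0x040: "HW thermal slowdown",
    0x080: "HW power brake slowdown",
    0x100: "display clock setting",
}


def decode_throttle_reasons(mask_text):
    """Turn a hex bitmask string such as '0x0000000000000004' into labels."""
    mask = int(mask_text.strip(), 16)
    return [name for bit, name in THROTTLE_REASON_BITS.items() if mask & bit]
```

For example, a mask with only bit 0x4 set decodes to the software power cap, which is the case where the driver lowers clocks to stay within the board power limit.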

The T4 is a low-power GPU (70W board power limit, drawn entirely from the 75W PCIe slot), so you may be hitting the power limit.

If you are hitting the temperature limit you should take that issue up with your system vendor.

You can find the behaviour outlined by Norbert above, specifically pertaining to the T4 here, in section 4.5:

I don’t know about the T4 cooling solution, but in general I find that over time dust adheres to CPU and GPU heatsink fins, noticeably reducing the effectiveness of the heatsinks. My computers run pretty much 24/7 and I blow out the heatsinks on my equipment once a year or so. For GPUs this may require temporarily removing any shroud surrounding the fins, which may or may not void the warranty.

Thank you all for your valuable comments!

I checked the active throttle reasons with nvidia-smi -q -d PERFORMANCE. The active reason was Power Cap, which is in line with the observations in the paper.
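For anyone who wants to automate the same check, here is a rough sketch that scans the text report for entries marked “Active”. It assumes the report uses `Name : Active` / `Name : Not Active` lines, which is the layout I saw; the exact section and entry names may differ between driver versions, and the function name is just one I made up.

```python
def active_throttle_reasons(report_text):
    """Return the names of all report entries whose value is exactly 'Active'."""
    active = []
    for line in report_text.splitlines():
        # Split at the first colon: entry name on the left, state on the right.
        name, sep, state = line.partition(":")
        if sep and state.strip() == "Active":
            active.append(name.strip())
    return active
```

Feeding it the captured output of `nvidia-smi -q -d PERFORMANCE` while the warm-up function is running should list the same reason that is visible by eye in the report.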

I observed similar behavior on Amazon (AWS) and Google (GCP) cloud T4 instances.