GPU application clock and warming up in a CUDA application

pyotr777 · October 12, 2021, 11:48am

For warming up a GPU, I wrote a simple function that multiplies two arrays on a GPU. I designed the function to use all available threads. The nvidia-smi dmon -s u command shows that SMs are indeed utilized to 99-100%.

When the GPU performs the calculations, its performance state changes to P0 and the application clock frequency rises. However, on a T4 GPU I cannot get the application clock above about 90% of the max by repeatedly executing the function on the GPU.

What could be the reason for a GPU not achieving the max frequency on calculations that supposedly use all available processors?

njuffa · October 12, 2021, 12:23pm

In the context of NVIDIA GPUs, “application clock” has a very specific meaning. It looks like here we are actually talking about “GPU clock” as dynamically adjusted by the GPU’s power management and clock boosting heuristics.

It is not clear what “max” refers to. Every GPU has a nominal operating frequency, and if environmental factors allow, and with the GPU in power state P0, the GPU may be boosted above this level up to some maximum boost clock. What is the maximum GPU clock achieved by the T4 in this case? I have never used a T4, but some other Turing-based cards can boost to 1800+ MHz, although in normal operation they are more likely to run at 1500-1600 MHz.
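To pin down what "max" means in practice, one option is to read the current and maximum SM clocks side by side. The following is a minimal Python sketch, assuming the `pstate`, `clocks.sm` and `clocks.max.sm` query fields of nvidia-smi. The helper names and the sample line are made up for illustration and are not measurements from a T4.

```python
# Fields passed to nvidia-smi's --query-gpu option.
QUERY_FIELDS = "pstate,clocks.sm,clocks.max.sm"


def nvidia_smi_command(fields=QUERY_FIELDS):
    """Build the command line; pass the result to subprocess.run on a GPU machine."""
    return ["nvidia-smi", f"--query-gpu={fields}", "--format=csv,noheader,nounits"]


def parse_clock_line(line):
    """Parse one CSV output line such as 'P0, 1440, 1590' into a dict."""
    pstate, sm_mhz, max_mhz = [field.strip() for field in line.split(",")]
    sm_mhz, max_mhz = float(sm_mhz), float(max_mhz)
    return {
        "pstate": pstate,
        "sm_mhz": sm_mhz,
        "max_sm_mhz": max_mhz,
        "fraction_of_max": sm_mhz / max_mhz,
    }


# Hypothetical output line, NOT a real measurement.
sample = "P0, 1440, 1590"
info = parse_clock_line(sample)
print(f"{info['pstate']}: {info['sm_mhz']:.0f} of {info['max_sm_mhz']:.0f} MHz "
      f"({info['fraction_of_max']:.0%} of max)")
```

Running the command in a loop while the warm-up kernel executes would show whether the reported clock ever reaches the maximum or settles below it.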

On modern GPUs, it is rare that the maximum boost clock is sustained for longer than a brief moment. The most common limiters are power consumption and temperature.

NVIDIA’s power management reacts quite quickly to reduce clock boost if the nominal power limit is exceeded, although very brief excursions above the limit may occur (e.g. one might see 78W on a GPU with a power limit of 75W). Often the boost clock is reduced before the power limit is reached. For example, on one of my GPUs with a power limit of 75W, that appears to happen as soon as the power exceeds 65W.
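An observation like the 65W one can be estimated from logged (power, clock) pairs. The sketch below is a rough heuristic, not a description of how the driver decides: it assumes all samples were taken under sustained load in P0, and it reports the lowest power draw at which the clock was already below the highest clock seen. The function name and the sample numbers are invented for illustration.

```python
def boost_reduction_onset(samples, tolerance_mhz=0.0):
    """Estimate the power draw (W) at which clock reduction sets in.

    samples: iterable of (power_watts, clock_mhz) pairs taken under load.
    Returns the lowest power at which the clock was more than
    tolerance_mhz below the peak observed clock, or None if it never was.
    """
    samples = list(samples)
    if not samples:
        return None
    peak_clock = max(clock for _, clock in samples)
    reduced = [power for power, clock in samples
               if clock < peak_clock - tolerance_mhz]
    return min(reduced) if reduced else None


# Made-up samples for a hypothetical GPU with a 75W limit, NOT real data.
log = [(58.0, 1590), (62.0, 1590), (66.0, 1545), (70.0, 1500), (74.0, 1455)]
print(boost_reduction_onset(log))
```

With real logs, a small nonzero `tolerance_mhz` helps to ignore single-step clock jitter.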

The temperature limit differs by GPU. Typically a significant boost clock reduction occurs when the temperature reaches 83 to 85 degrees Celsius, and generally any GPU temperature above 60 degrees Celsius or so negatively impacts the achievable boost clock. Some people install very elaborate third-party cooling solutions on their GPUs for this reason. In the winter, I sometimes accelerate my GPUs by dropping the ambient temperature to about 10 degrees Celsius by cracking open the window in my office.

A third reason, more rarely encountered, has to do with GPU voltage. In order to boost the GPU clock reliably, an increase in operating voltage needs to be applied. If the maximum voltage supported (usually around 1 volt) is reached, or if voltage levels become unstable before that, boost clock will be reduced.

Robert_Crovella · October 12, 2021, 6:21pm

Look at “clocks throttle reasons” in nvidia-smi output when your application is running.

T4 is a GPU that is limited to 75W, so you may be hitting the power limit.

If you are hitting the temperature limit you should take that issue up with your system vendor.
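The throttle reasons section is printed by `nvidia-smi -q -d PERFORMANCE` as indented "name : value" lines. Here is a small Python sketch that picks out the entries marked Active. The sample text is an abbreviated, assumed layout (exact labels vary between driver versions), and `active_throttle_reasons` is a hypothetical helper name.

```python
def active_throttle_reasons(report):
    """Return the names of all 'name : value' entries whose value is 'Active'."""
    active = []
    for line in report.splitlines():
        name, sep, value = line.partition(":")
        # Skip section headers (no colon) and entries marked 'Not Active'.
        if sep and value.strip() == "Active":
            active.append(name.strip())
    return active


# Abbreviated, assumed sample of the report text, NOT captured output.
SAMPLE_REPORT = """\
    Performance State                     : P0
    Clocks Throttle Reasons
        Idle                              : Not Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Active
        HW Slowdown                       : Not Active
        SW Thermal Slowdown               : Not Active
"""

print(active_throttle_reasons(SAMPLE_REPORT))
```

On a machine with a GPU, the report text could be obtained with `subprocess.run(["nvidia-smi", "-q", "-d", "PERFORMANCE"], capture_output=True, text=True).stdout` while the workload is running.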

rs277 · October 12, 2021, 6:21pm

You can find the behaviour outlined by Norbert above, specifically pertaining to the T4 here, in section 4.5:

njuffa · October 12, 2021, 6:37pm

I don’t know about the T4 cooling solution, but in general I find that over time dust adheres to CPU and GPU heatsink fins, noticeably reducing the effectiveness of the heatsinks. My computers run pretty much 24/7 and I blow out the heatsinks on my equipment once a year or so. For GPUs this may require temporary removal of any shroud surrounding the fins, which may or may not void the warranty.

pyotr777 · October 13, 2021, 6:16am

Thank you all for your valuable comments!

I checked the active throttle reasons with nvidia-smi -q -d PERFORMANCE. The active one was Power Cap, which is in line with the observations in the paper.

I observed similar behavior on Amazon (AWS) and Google (GCP) cloud T4 instances.

