SM Clock on RTX A6000 never reaches max frequency

SM clock frequency on my RTX A6000 card cannot reach its maximum value.

I have a benchmarking program in C++ and CUDA. Before benchmarking I need to warm up the GPU by running CUDA calculations until the SM frequency reaches 100% of its max value. To read GPU frequencies I use the NVML library, and I also check the parameters manually with nvidia-smi.

The SM frequency on my RTX A6000 never reaches 100% of the Max SM clock frequency; it only goes up to about 92%. The cause seems to be throttling: NVML reports the throttle value 0x0000000000000004LL, which the NVML documentation describes as "SW Power Scaling algorithm is reducing the clocks below requested clocks".

nvidia-smi (and NVML) report that the max SM frequency is 2100MHz.

$ nvidia-smi -q -d CLOCK

==============NVSMI LOG==============

Timestamp                                 : Fri Jan 28 08:58:51 2022
Driver Version                            : 470.42.01
CUDA Version                              : 11.4

Attached GPUs                             : 1
GPU 00000000:4B:00.0
    Clocks
        Graphics                          : 0 MHz
        SM                                : 0 MHz
        Memory                            : 405 MHz
        Video                             : 555 MHz
    Applications Clocks
        Graphics                          : 1800 MHz
        Memory                            : 8001 MHz
    Default Applications Clocks
        Graphics                          : 1800 MHz
        Memory                            : 8001 MHz
    Max Clocks
        Graphics                          : 2100 MHz
        SM                                : 2100 MHz
        Memory                            : 8001 MHz
        Video                             : 1950 MHz
...

However, the highest frequency I can reach by warming up the GPU with computations is only about 1935 MHz (a single sample below shows 1950 MHz). Then the throttle kicks in and the frequency goes down.

Here is a fragment of my warm-up program's output:

I0128 09:07:20.779960    22 warmup.cu:161] Before: P2, smclock 85.7143%, 32˚C CLOCKS (graph,sm,mem,vid): 1800,1800,7600,1590
I0128 09:07:20.781229    22 warmup.cu:224] GPU NVIDIA RTX A6000, 84 SMs, 1536 Max threads per SM, 1024 max threads per block
I0128 09:07:20.781234    22 warmup.cu:233] Warmup parameters: N=258048 elements, 2 array elements per thread, 252 blocks x 1024 threads per block, elements/thread:2
1/100 clock 85.7143%, time 117.585ms CLOCKS (graph,sm,mem,vid): 1800,1800,7600,1590, temp: 35˚C, pwr: 73.68W, throttle: 0
2/100 clock 92.1429%, time 253.757ms CLOCKS (graph,sm,mem,vid): 1935,1935,7600,1695, temp: 36˚C, pwr: 99.392W, throttle: 5
3/100 clock 91.4286%, time 364.268ms CLOCKS (graph,sm,mem,vid): 1920,1920,7600,1680, temp: 38˚C, pwr: 118.835W, throttle: 5
4/100 clock 92.8571%, time 473.866ms CLOCKS (graph,sm,mem,vid): 1950,1950,7600,1710, temp: 38˚C, pwr: 138.128W, throttle: 0
5/100 clock 91.4286%, time 585.882ms CLOCKS (graph,sm,mem,vid): 1920,1920,7600,1680, temp: 39˚C, pwr: 157.558W, throttle: 5
6/100 clock 91.4286%, time 696.576ms CLOCKS (graph,sm,mem,vid): 1920,1920,7600,1680, temp: 39˚C, pwr: 176.998W, throttle: 5
7/100 clock 92.1429%, time 807.031ms CLOCKS (graph,sm,mem,vid): 1935,1935,7600,1695, temp: 39˚C, pwr: 196.352W, throttle: 5
8/100 clock 91.4286%, time 918.15ms CLOCKS (graph,sm,mem,vid): 1920,1920,7600,1680, temp: 39˚C, pwr: 215.377W, throttle: 5
9/100 clock 91.4286%, time 1028.6ms CLOCKS (graph,sm,mem,vid): 1920,1920,7600,1680, temp: 39˚C, pwr: 234.564W, throttle: 5
10/100 clock 91.4286%, time 1139.84ms CLOCKS (graph,sm,mem,vid): 1920,1920,7600,1680, temp: 40˚C, pwr: 249.057W, throttle: 5

The throttle value of 5 means 0x0000000000000004LL.
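For reference, NVML reports throttle reasons as a bitmask, so a logged value can be split into individual flags. The following is a minimal sketch, assuming the logged number is the raw value returned by NVML (the post does not confirm this). The bit values are copied from the nvmlClocksThrottleReason* constants in nvml.h, and the helper name decode_throttle_reasons is hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Local copies of the NVML throttle-reason bits (nvmlClocksThrottleReason*
// in nvml.h), repeated here so the sketch builds without the NVML headers.
struct ThrottleBit {
    std::uint64_t mask;
    const char* name;
};

static const ThrottleBit kThrottleBits[] = {
    {0x001, "GpuIdle"},
    {0x002, "ApplicationsClocksSetting"},
    {0x004, "SwPowerCap"},
    {0x008, "HwSlowdown"},
    {0x010, "SyncBoost"},
    {0x020, "SwThermalSlowdown"},
    {0x040, "HwThermalSlowdown"},
    {0x080, "HwPowerBrakeSlowdown"},
    {0x100, "DisplayClockSetting"},
};

// Hypothetical helper: returns the names of all bits set in `reasons`,
// in the order of the table above.
std::vector<std::string> decode_throttle_reasons(std::uint64_t reasons) {
    std::vector<std::string> names;
    for (const ThrottleBit& bit : kThrottleBits) {
        if (reasons & bit.mask) {
            names.emplace_back(bit.name);
        }
    }
    return names;
}
```

Under that assumption, decode_throttle_reasons(5) yields GpuIdle and SwPowerCap (0x1 | 0x4), while decode_throttle_reasons(4) yields SwPowerCap alone.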

nvidia-smi output

$ nvidia-smi
Fri Feb 18 12:31:01 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 470.42.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A6000    On   | 00000000:4B:00.0 Off |                  Off |
| 54%   79C    P2   274W / 300W |   1137MiB / 48685MiB |     98%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1869      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A    209693      C   ...onv/dnnmark_test_bwd_conv     1129MiB |
+-----------------------------------------------------------------------------+

I also tried to pin the SM clock to its max value with nvidia-smi:

$ sudo nvidia-smi -pm 1
$ sudo nvidia-smi -ac 8001,2100

Even though nvidia-smi now reports the application clocks as 2100 MHz:

$ nvidia-smi -q -d CLOCK
...
    Applications Clocks
        Graphics                          : 2100 MHz
        Memory                            : 8001 MHz
    Default Applications Clocks
        Graphics                          : 1800 MHz
        Memory                            : 8001 MHz
    Max Clocks
        Graphics                          : 2100 MHz
        SM                                : 2100 MHz
        Memory                            : 8001 MHz
        Video                             : 1950 MHz
...

I am still not actually getting 2100MHz:

$ nvidia-smi --query-gpu=name,clocks.current.sm,clocks.max.sm --format=csv -l 1
...
NVIDIA RTX A6000, 1920 MHz, 2100 MHz
NVIDIA RTX A6000, 1935 MHz, 2100 MHz
NVIDIA RTX A6000, 1935 MHz, 2100 MHz
NVIDIA RTX A6000, 1920 MHz, 2100 MHz
...

There is no guarantee that you can achieve the maximum frequency on any particular GPU.

What you are observing is perfectly normal and expected.

The GPU clocking is dynamic over a wide range, based on the physical properties of each individual card, current load, and environmental factors. My experience with modern GPUs is that one is unlikely to hit 100% of the maximum boost clock, and if so, only for a few seconds. Things that will help with maintaining clock rates as high as possible:

(1) Check the output of nvidia-smi -q. Is Enforced Power Limit equal to Max Power Limit? If not, use nvidia-smi to raise the enforced power limit to the maximum allowed.

(2) Cool the GPU aggressively. My observation is that sustaining high clocks requires a temperature < 60 deg C. Under full load, with stock coolers, most high-end GPUs heat up to 80+ deg C in a short amount of time.

Note that for many workloads, raising the GPU core clock helps performance less than one might think, because they are ultimately limited by memory throughput rather than computational throughput.
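Since 100% of the maximum clock may never be reached, a warm-up loop could stop once the SM clock has settled instead of waiting for a fixed percentage. The following is a minimal sketch of such a stopping test, not the approach used in the question. The function name, the window size, and the tolerance are all hypothetical choices, and the samples would come from whatever clock query the warm-up program already performs.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true once the last `window` SM-clock samples
// (in MHz) differ from each other by at most `tolerance_mhz`.
bool clock_has_plateaued(const std::vector<unsigned>& sm_mhz,
                         std::size_t window, unsigned tolerance_mhz) {
    if (window == 0 || sm_mhz.size() < window) {
        return false;  // not enough samples yet
    }
    auto first = sm_mhz.end() - static_cast<std::ptrdiff_t>(window);
    auto min_max = std::minmax_element(first, sm_mhz.end());
    return *min_max.second - *min_max.first <= tolerance_mhz;
}
```

Applied to the ten SM clock values logged in the question, with a window of 5 samples and a tolerance of 30 MHz, the test passes at iteration 10, because the last five samples only vary between 1920 and 1935 MHz.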

@njuffa
(1) I checked with nvidia-smi, and Enforced Power Limit = Max Power Limit.

$ nvidia-smi -q -d POWER

==============NVSMI LOG==============

Timestamp                                 : Fri Feb 18 21:53:35 2022
Driver Version                            : 470.42.01
CUDA Version                              : 11.4

Attached GPUs                             : 1
GPU 00000000:4B:00.0
    Power Readings
        Power Management                  : Supported
        Power Draw                        : 23.77 W
        Power Limit                       : 300.00 W
        Default Power Limit               : 300.00 W
        Enforced Power Limit              : 300.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 300.00 W
    Power Samples
        Duration                          : 11.00 sec
        Number of Samples                 : 119
        Max                               : 106.83 W
        Min                               : 23.34 W
        Avg                               : 70.16 W

(2) The temperature increases gradually as I warm up the GPU. However, as you can see in the logs from my first post, the throttle kicks in at about 36 deg C.

It seems you have reached the limits of what can be accomplished with current semiconductor technology. The last time I saw a GPU sustain 100% of the maximum boost clock for a sizeable stretch of time was with the Pascal generation. In recent years, for both CPUs and GPUs, dynamic clocking has been refined to maximally exploit the thermal and power envelopes of processors.

GPUs have multiple throttling mechanisms to ensure safe operation: (1) thermal throttle, (2) power throttle, and (3) voltage stability throttle. In my experience (3) is hit rarely, and (2) usually engages before (1).

Most of the power drawn by a GPU is due to “AC power”, largely from capacitive loads (moving electrical charge around at high frequencies). Some of the power is “DC power”, due to ohmic resistance in the electrical and electronic components. This latter component has a temperature dependency, in that a hot GPU draws more power than a cold one. This is why aggressively cooling a GPU can help with avoiding the power throttle even if the thermal throttle has not yet engaged.
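To put a rough number on the "AC power" part: the usual first-order model for switching power in CMOS logic is proportional to C * V^2 * f. The following is a minimal sketch of how that scales. The function name is hypothetical, and the voltage ratio in the example is an assumed figure for illustration, not a measured value for this card.

```cpp
// First-order CMOS switching-power model: P_dyn is proportional to
// C * V^2 * f. With the capacitance C unchanged, the relative change in
// dynamic power depends only on the voltage and frequency ratios.
constexpr double dynamic_power_ratio(double voltage_ratio,
                                     double frequency_ratio) {
    return voltage_ratio * voltage_ratio * frequency_ratio;
}
```

For example, going from 1920 MHz to 2100 MHz is a frequency ratio of about 1.094. If that step also needed 5% more voltage (an assumption), dynamic_power_ratio(1.05, 2100.0 / 1920.0) comes out at roughly 1.2, that is, about 20% more switching power for about 9% more clock. This is consistent with the power throttle engaging well before the maximum clock.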

As far as I am aware, GPU temperatures below 40 deg C under full load are generally only achievable with water cooling (although I haven’t had a chance to see how low air cooling could take a GPU in an unheated hut during the Alaskan, Siberian, or Icelandic winter).

As I said before, the amount of performance left on the table due to reaching only 90% of maximum boost clock is probably small (say, a couple of percent), since many workloads will bottleneck on memory bandwidth first. You may want to create a roofline model of performance for your workload.
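A minimal roofline sketch shows when the missing clock matters at all. The 84 SMs come from the log in the question; the 128 FP32 lanes per SM and the 768 GB/s memory bandwidth are nominal figures assumed here, and the function names and the arithmetic-intensity values are hypothetical. Real kernels will land below both roofs.

```cpp
#include <algorithm>

// Peak FP32 rate in GFLOP/s, counting one fused multiply-add as 2 FLOPs
// per lane per cycle.
constexpr double peak_fp32_gflops(int sm_count, int fp32_lanes_per_sm,
                                  double sm_clock_mhz) {
    return sm_count * fp32_lanes_per_sm * 2.0 * sm_clock_mhz / 1000.0;
}

// Roofline bound: the lower of the compute roof and the memory roof
// (bandwidth in GB/s times arithmetic intensity in FLOP/byte).
inline double attainable_gflops(double peak_gflops, double bandwidth_gb_s,
                                double flop_per_byte) {
    return std::min(peak_gflops, bandwidth_gb_s * flop_per_byte);
}
```

With these assumptions, a kernel at 4 FLOP/byte is bounded by 768 * 4 = 3072 GFLOP/s at both 1920 MHz and 2100 MHz, so the lower clock costs nothing. Only kernels above roughly 54 to 59 FLOP/byte (the ridge point at the two clocks) would see the compute roof, and there the bound drops in proportion to the clock, 1920 / 2100.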