SM clock frequency on my RTX A6000 card cannot reach its maximum value.
I have a benchmarking program written in C++ and CUDA. Before benchmarking I need to warm up the GPU by running CUDA computations until the SM frequency reaches 100% of its maximum value. To read the GPU frequencies I use the NVML library, and I also check the values manually with nvidia-smi.
The SM frequency on my RTX A6000 never reaches 100% of the max SM clock frequency; it only goes up to about 92%. The cause appears to be throttling: NVML reports a throttle value of 0x0000000000000004LL, which the NVML documentation describes as "SW Power Scaling algorithm is reducing the clocks below requested clocks".
nvidia-smi (and NVML) report that the max SM frequency is 2100MHz.
$ nvidia-smi -q -d CLOCK
==============NVSMI LOG==============
Timestamp : Fri Jan 28 08:58:51 2022
Driver Version : 470.42.01
CUDA Version : 11.4
Attached GPUs : 1
GPU 00000000:4B:00.0
    Clocks
        Graphics : 0 MHz
        SM : 0 MHz
        Memory : 405 MHz
        Video : 555 MHz
    Applications Clocks
        Graphics : 1800 MHz
        Memory : 8001 MHz
    Default Applications Clocks
        Graphics : 1800 MHz
        Memory : 8001 MHz
    Max Clocks
        Graphics : 2100 MHz
        SM : 2100 MHz
        Memory : 8001 MHz
        Video : 1950 MHz
...
However, the highest frequency I can reach by warming up the GPU with computations is only about 1935 MHz. Then throttling kicks in and the frequency drops.
Here is a fragment of my warming up program output:
I0128 09:07:20.779960 22 warmup.cu:161] Before: P2, smclock 85.7143%, 32˚C CLOCKS (graph,sm,mem,vid): 1800,1800,7600,1590
I0128 09:07:20.781229 22 warmup.cu:224] GPU NVIDIA RTX A6000, 84 SMs, 1536 Max threads per SM, 1024 max threads per block
I0128 09:07:20.781234 22 warmup.cu:233] Warmup parameters: N=258048 elements, 2 array elements per thread, 252 blocks x 1024 threads per block, elements/thread:2
1/100 clock 85.7143%, time 117.585ms CLOCKS (graph,sm,mem,vid): 1800,1800,7600,1590, temp: 35˚C, pwr: 73.68W, throttle: 0
2/100 clock 92.1429%, time 253.757ms CLOCKS (graph,sm,mem,vid): 1935,1935,7600,1695, temp: 36˚C, pwr: 99.392W, throttle: 5
3/100 clock 91.4286%, time 364.268ms CLOCKS (graph,sm,mem,vid): 1920,1920,7600,1680, temp: 38˚C, pwr: 118.835W, throttle: 5
4/100 clock 92.8571%, time 473.866ms CLOCKS (graph,sm,mem,vid): 1950,1950,7600,1710, temp: 38˚C, pwr: 138.128W, throttle: 0
5/100 clock 91.4286%, time 585.882ms CLOCKS (graph,sm,mem,vid): 1920,1920,7600,1680, temp: 39˚C, pwr: 157.558W, throttle: 5
6/100 clock 91.4286%, time 696.576ms CLOCKS (graph,sm,mem,vid): 1920,1920,7600,1680, temp: 39˚C, pwr: 176.998W, throttle: 5
7/100 clock 92.1429%, time 807.031ms CLOCKS (graph,sm,mem,vid): 1935,1935,7600,1695, temp: 39˚C, pwr: 196.352W, throttle: 5
8/100 clock 91.4286%, time 918.15ms CLOCKS (graph,sm,mem,vid): 1920,1920,7600,1680, temp: 39˚C, pwr: 215.377W, throttle: 5
9/100 clock 91.4286%, time 1028.6ms CLOCKS (graph,sm,mem,vid): 1920,1920,7600,1680, temp: 39˚C, pwr: 234.564W, throttle: 5
10/100 clock 91.4286%, time 1139.84ms CLOCKS (graph,sm,mem,vid): 1920,1920,7600,1680, temp: 40˚C, pwr: 249.057W, throttle: 5
The throttle value of 5 is a bitmask: 0x0000000000000004LL (SW power cap) combined with 0x0000000000000001LL (GPU idle).
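Since the throttle value is a bitmask, it can help to decode it into named reasons when logging. Below is a minimal sketch, assuming the bit values of the nvmlClocksThrottleReason* constants from nvml.h. They are hardcoded so the example builds without the NVML headers, and the function name decodeThrottleReasons is hypothetical.

```cpp
#include <string>
#include <utility>
#include <vector>

// Decodes an NVML clocks-throttle-reasons bitmask into readable names.
// The bit values mirror the nvmlClocksThrottleReason* constants in nvml.h;
// they are copied here (an assumption of this sketch) instead of including
// the header.
std::vector<std::string> decodeThrottleReasons(unsigned long long mask) {
    static const std::pair<unsigned long long, const char*> kReasons[] = {
        {0x001ULL, "GpuIdle"},
        {0x002ULL, "ApplicationsClocksSetting"},
        {0x004ULL, "SwPowerCap"},
        {0x008ULL, "HwSlowdown"},
        {0x010ULL, "SyncBoost"},
        {0x020ULL, "SwThermalSlowdown"},
        {0x040ULL, "HwThermalSlowdown"},
        {0x080ULL, "HwPowerBrakeSlowdown"},
        {0x100ULL, "DisplayClockSetting"},
    };
    std::vector<std::string> names;
    for (const auto& reason : kReasons) {
        if (mask & reason.first) {
            names.emplace_back(reason.second);  // this bit is set in the mask
        }
    }
    return names;
}
```

For the value 5 seen in the log, decodeThrottleReasons(5) yields GpuIdle and SwPowerCap; for 0 it yields an empty list.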
nvidia-smi output
$ nvidia-smi
Fri Feb 18 12:31:01 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 470.42.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A6000    On   | 00000000:4B:00.0 Off |                  Off |
| 54%   79C    P2   274W / 300W |   1137MiB / 48685MiB |     98%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1869      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A    209693      C   ...onv/dnnmark_test_bwd_conv     1129MiB |
+-----------------------------------------------------------------------------+
I also tried to lock the SM clock to its maximum value with nvidia-smi:
$ sudo nvidia-smi -pm 1
$ sudo nvidia-smi -ac 8001,2100
Even though nvidia-smi now reports application clocks of 2100 MHz:
$ nvidia-smi -q -d CLOCK
...
    Applications Clocks
        Graphics : 2100 MHz
        Memory : 8001 MHz
    Default Applications Clocks
        Graphics : 1800 MHz
        Memory : 8001 MHz
    Max Clocks
        Graphics : 2100 MHz
        SM : 2100 MHz
        Memory : 8001 MHz
        Video : 1950 MHz
...
I still do not actually get 2100 MHz:
$ nvidia-smi --query-gpu=name,clocks.current.sm,clocks.max.sm --format=csv -l 1
...
NVIDIA RTX A6000, 1920 MHz, 2100 MHz
NVIDIA RTX A6000, 1935 MHz, 2100 MHz
NVIDIA RTX A6000, 1935 MHz, 2100 MHz
NVIDIA RTX A6000, 1920 MHz, 2100 MHz
...