TITAN V max clock speed locked to 1,335 MHz and underperforms TITAN Xp (Ubuntu 16.04, nvidia 390 & 396)

Hi, I have a question regarding the clock speed behavior of the TITAN V on Ubuntu 16.04.
We recently built an 8x TITAN V GPU server from Supermicro for running machine learning frameworks (TensorFlow, PyTorch, Keras), and installed Ubuntu 16.04 and nvidia-390 (390.64) from the apt repository. CUDA 9.0 is installed (.deb installation method) with all 3 performance patches included.

The TITAN V oddly underperforms the TITAN Xp (desktop devbox, same OS & driver setup). The reason is that the TITAN V's clock speed is locked to 1,335 MHz and does not go any higher.
We expected the clock to max out at 1,912 MHz (the official max clock) and progressively drop as the temperature increases, but that was not the case.

Meanwhile, our TITAN Xp devbox hits the official max of 1,911 MHz and stays around 1,800 MHz at ~80°C, so the CUDA code we run (machine learning models) performs about 30% faster on the TITAN Xp system.

Is this behavior of the TITAN V by design? Thermal throttling is not the issue here, because the Supermicro chassis keeps all 8 GPUs under 70°C, and the clock is still 1,335 MHz in the ~40°C range. We've also tested with gpu_burn (http://wili.cc/blog/gpu-burn.html) and saw the same issue.

Switching the driver from nvidia-390 (390.64) to nvidia-396 (396.24.02) did not change the behavior. Manually setting the application clocks with

sudo nvidia-smi -ac 850,1912

is accepted, but the actual clock is still capped at 1,335 MHz.
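To double-check a gap like this between the requested application clock and the clock the card actually runs at, the CSV output of `nvidia-smi --query-gpu=clocks.applications.graphics,clocks.sm --format=csv,noheader,nounits` can be compared per GPU. A minimal sketch (the `find_capped_gpus` helper and the tolerance value are my own, not from the thread), run here against hardcoded sample text mimicking the numbers above:

```python
# Hypothetical helper, not from the original post: parse the CSV output of
#   nvidia-smi --query-gpu=clocks.applications.graphics,clocks.sm --format=csv,noheader,nounits
# and flag GPUs whose actual SM clock sits well below the requested application clock.

def find_capped_gpus(csv_text, tolerance_mhz=50):
    """Return [(gpu_index, requested_mhz, actual_mhz), ...] for capped GPUs."""
    capped = []
    for idx, line in enumerate(csv_text.strip().splitlines()):
        requested, actual = (int(field.strip()) for field in line.split(","))
        if actual < requested - tolerance_mhz:
            capped.append((idx, requested, actual))
    return capped

# Sample output mimicking the TITAN V behaviour described above:
sample = "1912, 1335\n1912, 1335"
print(find_capped_gpus(sample))  # [(0, 1912, 1335), (1, 1912, 1335)]
```

On a healthy card the list comes back empty; here every GPU reports 1,335 MHz despite the 1,912 MHz application clock.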
Power usage only reaches 60~70% of TDP (250 W), while the TITAN Xp frequently exceeds 250 W.

The difference between the two systems is that the TITAN V server's display is connected to the VGA port of the Supermicro motherboard, while the TITAN Xp devbox's display is connected to the HDMI port of the GPU itself (installed in an ASUS X99-E WS motherboard).

Is this a driver-related problem or the expected boost clock policy? Any guidance would be appreciated. I suspect a CUDA version issue is unlikely, because two independent tests (TensorFlow code using the system-installed CUDA 9.0, and PyTorch code in an Anaconda environment with the bundled CUDA 9.1) produced the same results.

Attached are the nvidia bug report log of the TITAN V system and 2 screenshots from the two systems, taken with the command

nvidia-smi -q
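For anyone comparing their own `nvidia-smi -q` output, the relevant line is "Performance State". A small illustrative parser (the `perf_state` function is my own sketch, and the sample text is only an approximation of the real output format):

```python
# Hypothetical parser, for illustration only: pull the performance state
# (e.g. 'P0', 'P2') out of the text emitted by `nvidia-smi -q`.

def perf_state(q_output):
    """Return the value of the first 'Performance State' line, or None."""
    for line in q_output.splitlines():
        if "Performance State" in line:
            return line.split(":", 1)[1].strip()
    return None

# Approximate shape of the relevant section of `nvidia-smi -q` output:
sample = """\
    Performance State                 : P2
    Clocks Throttle Reasons
        Idle                          : Not Active
        Applications Clocks Setting   : Active"""
print(perf_state(sample))  # P2
```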

I also found that this issue appears related to https://devtalk.nvidia.com/default/topic/1028063/?comment=5229233.

nvidia-bug-report.log.gz (375 KB)
titan_v_clock.png
titan_xp_clock.png

Please put the Titans under load and rerun nvidia-bug-report.sh

Attached is the new nvidia-bug-report taken while running gpu_burn (and also a screenshot).
nvidia-bug-report.log.gz (424 KB)

Ok, the Titan V stays in performance state P2.
Starting with Pascal, nvidia enforces a driver policy for consumer cards to reach only P2 on plain CUDA workloads; the maximum, P0, is reached only on graphics workloads.
The Titans being "prosumer" cards, it's a bit puzzling why this policy is now enforced on the Volta but not the Pascal. So technically everything is all right, and only nvidia staff could give answers about the policy (NDA required, possibly).

Yes, I'm also aware that GeForce & TITAN cards (from Pascal onward) only allow up to the P2 state (our P100 & V100 SXM2 (NVLink) machines can reach P0). What is more puzzling to me is that the 3x TITAN Xp devbox (with an X99 motherboard and an i7-6850K CPU) is capable of maxing out its boost clock on CUDA ops (~1,900 MHz, just like running graphics workloads such as 3D games on Windows), with the result that real-world ML models actually run faster on the TITAN Xp than on the TITAN V. This is plain FP32 CUDA work, though; for what it's worth, we haven't tested FP16 performance (using the TITAN V's Tensor Cores) yet.

I've additionally attached the nvidia-bug-report from the TITAN Xp devbox, taken while running real-world TF & PyTorch Python code (GPU 0 runs around 1,400 MHz due to a thermal limit; the others (GPU 1 & 2) run near max clock speed due to a lighter workload; all are in the P2 perf state).

If this is indeed the GPU Boost 3.0 clock policy for CUDA ops on Pascal vs. Volta, what would be the point of the TITAN V if its FP32 CUDA performance is this crippled?

Apparently a number of TITAN V gaming benchmarks (on Windows) show no problem running graphics workloads, achieving near-max boost clock speeds at lower temperatures. I might test FP32 CUDA performance on Windows myself, but I currently don't have spare parts for this.
nvidia-bug-report.log.gz (3.03 MB)

So I did some more testing by comparing the TITAN V setup with our V100 system. My conclusion is that this CUDA clock speed limit policy on the TITAN V is intentional.

The V100's max clock speeds (in the P0 state) are MEM: 877 MHz, Core: 1,530 MHz. If NVIDIA allowed the TITAN V's CUDA clock speed to reach 1,912 MHz, it would cannibalize V100 sales, since the TITAN V would run much faster.

The question remains why the clock limit policy was then removed for the TITAN Xp. Maybe it is related to something like [url]https://www.techpowerup.com/235701/nvidia-unlocks-certain-professional-features-for-titan-xp-through-driver-update[/url], rolled out when the competitor released its new card. So for now it's best to stick with the TITAN Xp for FP32 CUDA performance.

I completely agree; especially the Xp reaching P0 clocks while in the P2 state is a crazy clocking policy.

Just ran into this issue myself. I swapped a couple of GTX 1080 Ti cards for Titan V cards and was surprised to see them clocked so low.

The GTX 1080 Ti runs at full clocks, slowing only when thermal or power limits are reached, as expected.

The Titan V caps at 1,335 MHz and often runs at only 1,200 MHz. As soon as TensorFlow or a similar compute application connects to the card, the clocks throttle down on Linux.

I booted the machine into Windows 10, and most of the graphics benchmarks worked as expected, with boost clocks and all enabled.

As a result, there is essentially no benefit at all to the Titan V over the GTX 1080 Ti, other than as a development platform for FP16 code that you intend to move to a production machine with Tesla V100 cards.

An official response from an NVIDIA moderator, for future reference:

[url]https://devtalk.nvidia.com/default/topic/1042047/container-tensorflow/titan-v-slower-than-1080ti-tensorflow-18-08-py3-and-396-54-drivers/post/5288200/#5288200[/url]

Hoping to see NVIDIA unlock the clock speed for the TITAN V like it did for the Xp.

Another side note for anyone interested: we did see the 2x speed boost when training standard architectures like ResNet by using mixed precision training with apex ([url]https://github.com/NVIDIA/apex[/url]) for PyTorch. But if a model contains techniques like dilated convolutions, mixed precision training turned out to be the same speed or somewhat slower. So it's a hit-or-miss experience as of now, and we're hoping for more optimized codepaths in future updates.
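For anyone unfamiliar with why apex-style mixed precision needs loss scaling at all, the core issue can be illustrated with nothing but the Python standard library, since `struct` supports IEEE half precision via the "e" format. This is my own stdlib sketch of the underflow problem, not apex's actual implementation:

```python
import struct

# Stdlib illustration (not the actual apex code) of why loss scaling helps:
# small gradients underflow to zero in fp16 unless scaled up first.

def to_fp16(x):
    """Round a Python float to IEEE half precision and back (Python >= 3.6)."""
    return struct.unpack("e", struct.pack("e", x))[0]

grad = 1e-8                     # a tiny gradient, common late in training
print(to_fp16(grad))            # 0.0 -- underflows to zero in fp16

scale = 1024.0                  # loss scaling keeps it representable
scaled = to_fp16(grad * scale)
print(scaled / scale > 0)       # True -- recoverable after unscaling
```

Apex's dynamic loss scaling automates picking that scale factor; the master weights are kept in FP32 so the unscaled update survives.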

Update: Starting from 415.25 (Linux), the TITAN V can reach P0 and an 1,800~1,900 MHz CUDA clock speed. [url]https://devtalk.nvidia.com/default/topic/1042047/container-tensorflow/titan-v-slower-than-1080ti-tensorflow-18-08-py3-and-396-54-drivers/post/5305096/#5305096[/url]