Cooling and fan speed on two Titan X (Pascal) GPUs

I have two Titan X (Pascal) cards used for machine learning, probabilistic fiber tracking, and other processes that run for days or weeks at a time. Every time I check nvidia-smi on that system, I notice that GPU1 is much hotter than GPU0.

Tue Nov  8 11:01:15 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 0000:03:00.0     Off |                  N/A |
| 39%   66C    P2    62W / 250W |   9112MiB / 12189MiB |     22%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 0000:04:00.0      On |                  N/A |
| 59%   85C    P2   115W / 250W |   9249MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Given the equal utilization, I don’t really understand why this would be. We have another system running GTX 980s that, when running the same program, behaves as follows:

Tue Nov  8 10:55:31 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     Off  | 0000:02:00.0     Off |                  N/A |
|  0%   31C    P0    39W / 180W |      0MiB /  4037MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 980     Off  | 0000:03:00.0     Off |                  N/A |
| 33%   56C    P2    66W / 180W |   1861MiB /  4037MiB |     42%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 980     Off  | 0000:83:00.0     Off |                  N/A |
| 33%   56C    P2    76W / 180W |   1866MiB /  4037MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 980     Off  | 0000:84:00.0     Off |                  N/A |
| 32%   54C    P2    72W / 180W |   1881MiB /  4037MiB |     43%      Default |
+-------------------------------+----------------------+----------------------+

Note the similar temps across the currently used GPUs. Is there something I should be concerned about with respect to the Titan X above running that much hotter than the other card?

Thank you,

Keith

I don’t think this is necessarily something to worry about. Quick experiment: Physically swap the two GPUs involved. Does the “hot spot” follow the GPU, or does it stay with the slot position?

[follows the GPU]
hypothesis (1): different VBIOS versions with different fan profiles
hypothesis (2): different power dissipation due to
-- (2a) different vendors, or different vendor SKUs
-- (2b) normal manufacturing tolerances in the components, including the GPU itself

[stays with the slot]
hypothesis (1): unequally distributed workload (note “GPU-Util” in the log you showed; see the query sketch after this list) <-----
hypothesis (2): insufficient or turbulent airflow around the hotter card
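
One quick way to probe the workload-distribution hypothesis (a sketch only; field names taken from nvidia-smi --help-query-compute-apps, assuming your driver version supports that query) is to list which processes are actually resident on each card:

nvidia-smi --query-compute-apps=gpu_bus_id,pid,process_name --format=csv

If one bus ID consistently shows more (or heavier) processes than the other, the temperature difference simply follows the load.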

[Later:] Oops. I notice belatedly that the GPU with the lower utilization (0%) shows very high power usage (115W). That makes no sense. The sensor output shown by nvidia-smi is not instantaneous, so maybe this is an artifact of the sensor query process, where the 115W power reading refers to a slightly earlier time frame than the GPU utilization, which has since fallen to 0% because that part of the workload finished? Try monitoring continuously to see whether the numbers make more sense.
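
For the continuous monitoring, a simple loop query along these lines (one sample per second; the field names come from nvidia-smi --help-query-gpu, so adjust if your driver version differs) makes it easy to correlate fan speed, temperature, power, and utilization over time for both cards:

nvidia-smi --query-gpu=timestamp,index,fan.speed,temperature.gpu,power.draw,utilization.gpu --format=csv -l 1

Logging that to a file while the long-running job executes should show whether the 115W reading coincides with a burst of activity on GPU 1 or is just a stale sample.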