Nvidia-smi -gtt option since 460.27 causes major performance issues on laptops

shoppy21 · April 29, 2021, 6:22am

The -gtt option of nvidia-smi defaults to 75C, and the value cannot be changed on most laptop GPUs. Most laptop GPUs are designed to run above 75C under their preset fan profile, so affected laptops end up in their lowest power state most of the time.

I’ve tried every stable driver release since the first 460 release, and all of them have this issue, up to the latest 465 release. The last usable driver is 455.45.01, and it only works with kernels up to 5.4 LTS.

generix · April 29, 2021, 8:28am

I don’t really understand the exact issue; could you explain it in a bit more depth?

shoppy21 · April 29, 2021, 9:54am

This issue is easy to reproduce: run a CUDA load in hybrid mode on an Optimus laptop and watch the GPU temperature. When it reaches 75C, the GPU throttles to the lowest power state.

The new -gtt (--gpu-target-temp) option, added to nvidia-smi in the 460 drivers, sets the temperature in degrees Celsius at which GPU thermal throttling occurs. However, the value cannot be changed on most laptop GPUs, and even then it appears to default to 75C rather than being ignored, so the GPU drops its core and memory clocks to the lowest power state as soon as it reaches 75C. Laptop GPUs are therefore throttled far too early, at 75C, which is well within their healthy operating range. Because laptop fan profiles barely spin up the fans at 75C, affected GPUs can only sustain high performance for a short time, and only if the fans are manually set to maximum speed. Otherwise they spend most of the time in a loop: scaling up to a higher power state, then being throttled until they cool down to 60C.

I’ve tested two laptops: an Acer Triton 300 with a 2070 Max-Q and an Asus Zephyrus S with a 2080 Max-Q. Both exhibit this “feature”. According to GreenWithEnvy readings, the Triton 300’s 2070 has a critical temperature of 87C, a slowdown temperature of 93C, and a shutdown temperature of 98C defined in the VBIOS, yet it throttles to the lowest power state at 75C with 460 or later drivers. The Zephyrus S’s 2080 behaves similarly.

I’ve bisected the driver versions: the last good version, which throttles correctly according to the VBIOS definitions, is 455.45.01, while every 460 and later driver throttles far too early at 75C. According to the changelogs, the first 460 release introduced the -gtt (--gpu-target-temp) option in nvidia-smi, and the issue begins there, so I’m quite sure this new feature is related to the incorrect throttling behavior.
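
For anyone who wants to capture the throttle loop in a log, here is a minimal sketch. It assumes samples arrive as "temperature.gpu, pstate, clocks.sm" lines in nvidia-smi's csv,noheader,nounits format. The function name flag_throttle, the default limit of 75 (taken from the behavior described in this post, not from anything the driver reports), and the "clock fell to less than half" heuristic are all illustrative choices, not an official diagnostic.

```shell
#!/bin/sh
# Sketch: flag samples where the SM clock collapses right after the GPU
# reaches a given temperature. The default limit of 75 comes from the
# behaviour described in this thread; it is an assumption, not a value
# read from the driver.

# Reads lines of "temperature.gpu, pstate, clocks.sm" on stdin
# (csv,noheader,nounits) and prints one annotated line per sample.
flag_throttle() {
    awk -F', *' -v limit="${1:-75}" '
        {
            temp = $1 + 0; pstate = $2; clock = $3 + 0
            note = ""
            # Previous sample was at or above the limit and the clock has
            # since dropped to less than half: likely a throttle event.
            if (NR > 1 && prev_temp >= limit && clock * 2 < prev_clock)
                note = " <- clock drop after reaching limit"
            printf "%dC %s %dMHz%s\n", temp, pstate, clock, note
            prev_temp = temp; prev_clock = clock
        }'
}

# Live use (needs an NVIDIA GPU; samples once per second):
#   nvidia-smi --query-gpu=temperature.gpu,pstate,clocks.sm \
#              --format=csv,noheader,nounits -l 1 | flag_throttle 75
```

Run it while the CUDA load is active. On an affected driver, the annotated lines should cluster right after the temperature first touches the limit.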

generix · April 29, 2021, 2:18pm

Ok, got it. So the nvidia driver sets the temperature target on (notebook) gpus which don’t support setting a different target through nvidia-smi.
Just to make sure, you ran nvidia-smi -gtt as root?
Does nvidia-smi -q at least report the target or just N/A?

shoppy21 · April 29, 2021, 5:33pm

I just ran those commands on 465.24.02, output as below:

$ sudo nvidia-smi -gtt 90
GPU Target Temperature Threshold not supported for GPU 00000000:01:00.0.
Treating as warning and moving on.
All done.

$ nvidia-smi -q | grep Temp
Temperature
GPU Current Temp : 55 C
GPU Shutdown Temp : 98 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : 87 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
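
To check the same thing from a script, the Temperature section can be parsed directly. This is only a sketch: temp_field and target_reported are hypothetical helper names, and the field labels are copied from the 465.24.02 output above, so they may differ on other driver versions.

```shell
#!/bin/sh
# Sketch: pull one value out of `nvidia-smi -q` output read from stdin.
# Field labels are assumed to match the output shown in this thread.

# temp_field "GPU Slowdown Temp"  ->  prints e.g. "93 C"
temp_field() {
    awk -F' *: *' -v key="$1" '
        {
            sub(/^[ \t]+/, "", $1)     # drop leading indentation
            sub(/[ \t]+$/, "", $2)     # drop trailing whitespace
        }
        $1 == key { print $2; exit }'
}

# Succeeds only if a target temperature is reported as something
# other than N/A (i.e. the field is present and has a value).
target_reported() {
    value=$(temp_field "GPU Target Temperature")
    [ -n "$value" ] && [ "$value" != "N/A" ]
}

# Live use (needs an NVIDIA GPU):
#   nvidia-smi -q | target_reported || echo "target temperature not reported"
```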

generix · April 30, 2021, 7:30am

So the effect is only really noticeable indirectly: the current temp never exceeds 75°C, and the clocks drop to minimum once it gets there?
Please create an nvidia-bug-report.log.gz in that situation (gpu at 100%) and send it to linux-bugs[at]nvidia.com; maybe it will draw some attention to this.
