nvidia-smi -gtt doesn't work on 535.104.05

Arch Linux Desktop PC
6.4.12-zen1-1-zen
Happens on both open and proprietary versions of 535.104.05
GPU 0: NVIDIA GeForce RTX 3060 Ti (UUID: GPU-ba73bc75-4c91-6012-1365-c8e673737f6b)

Steps to reproduce:

  1. nvidia-smi -gtt 65
  2. run any heavy graphical app
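To make the reproduction easier to compare across machines, here is a minimal sketch of a monitor that polls the GPU while the heavy app runs and reports the first reading above the target. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu=temperature.gpu,clocks.gr` CSV output; the function names and the 2-degree margin are my own assumptions, not anything from the driver.

```python
import subprocess
import time


def parse_sample(line):
    """Parse one CSV line such as '72, 1905' into (temp_c, clock_mhz)."""
    temp, clock = (field.strip() for field in line.split(","))
    return int(temp), int(clock)


def read_sample(gpu_index=0):
    """Query the current GPU temperature and graphics clock via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index),
         "--query-gpu=temperature.gpu,clocks.gr",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.strip().splitlines()[0])


def first_overshoot(samples, target_c, margin_c=2):
    """Return the first (temp, clock) hotter than target + margin, else None.

    margin_c is an assumed tolerance, not a documented driver value.
    """
    for temp, clock in samples:
        if temp > target_c + margin_c:
            return temp, clock
    return None


def watch(target_c=65, seconds=300, interval=1.0):
    """Poll while the workload runs; return the first overshoot, if any."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        hit = first_overshoot([read_sample()], target_c)
        if hit:
            return hit
        time.sleep(interval)
    return None
```

After running `nvidia-smi -gtt 65` and starting the workload, calling `watch(target_c=65)` returns a `(temperature, clock)` pair as soon as the target is exceeded by more than the margin, or `None` if it holds for the whole window.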

Expected behavior:
The GPU starts to throttle at the set temperature, and the temperature doesn’t rise above the set value.

Previously this setting worked as expected.

Uploading on this forum never works for me, so here is the nvidia-bug-report:
https://github.com/NVIDIA/open-gpu-kernel-modules/files/12440800/nvidia-bug-report.log.gz

Original thread on GitHub:

@ewbteewbte
Thanks for writing to us. I have filed bug 4260165 internally for tracking purposes.
I will try to replicate the issue on my test system first and update on further proceedings.

Setup - Dell Precision T7610 + Genuine Intel(R) CPU @ 2.30GHz + Ubuntu 22.04.1 LTS + kernel 5.19.0-46-generic + NVIDIA GeForce GTX 1650 SUPER + Driver 535.104.05 + Display DELL G3223D
I tried the steps below and see the temperature throttle at 66-67 at most. Can you please confirm whether you see a similar range, or whether it increases further on your setup?

  1. Ran the command “nvidia-smi -gtt 65”.
  2. Launched 5 instances of the Unigine Heaven benchmark; GPU temperature throttles at 66-67 at most.
  3. Repeated the two steps above a couple of times and observed the same behavior.
  4. Later I rebooted the system and then ran 5 instances of the benchmark; the temperature quickly throttles at 74-75C.

It does increase, up to 87C, after which I quit the benchmark or game because I don’t want to risk damaging my GPU.
Normally I use “-gtt 80”, and my GPU has never surpassed 80C since I bought it a year ago.
Anyway, I provided the log file; shouldn’t that be enough?

@ewbteewbte
Please confirm which benchmark you tried.

Unigine Superposition, Elden Ring, Dark Souls 3

@ewbteewbte
Do you know the last passing driver, where the issue doesn’t occur?

Hard to tell, the last time I used demanding apps was in April or May.
I run all games with a 60 fps limit. Lately I was only playing games like Project Zomboid, which do not use the GPU much, so I couldn’t notice the change in behavior until I tried GPU-heavy games again.

@ewbteewbte
Is it possible for you to test with a 530 branch driver, or even a bit older, to see if the problem exists in earlier branches as well?
I am still not able to repro the issue on a couple of my test systems.

It is not possible. Could it be 30-series specific?

@amrits is it tied to the Coolbits setting? I always used “12”.

@ewbteewbte
I am seeing similar behavior with Arch Linux + kernel 6.4.12-arch1-1 + NVIDIA GeForce RTX 3080 + Driver 535.104.05, where GPU temperature peaks around 74 after running the Unigine Superposition benchmark.
I will check for the cause and update.

Good! Just in case, with 535.113.01 the issue is still present.

545.29.02 issue is still present.

545.29.06 issue is still present.

With 550.54.14 it does attempt to lower the GPU clocks a little, but the temperature is still able to surpass the value set by -gtt
(for instance, from 2100 MHz it drops to about 1900-something).
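For a sense of scale, a quick sketch of how big that clock reduction is. The 1900 figure is the approximate value quoted above, so the result is only a rough estimate; `relative_drop` is a helper name made up for this example.

```python
def relative_drop(before_mhz, after_mhz):
    """Fraction by which the clock was reduced, relative to the starting clock."""
    return (before_mhz - after_mhz) / before_mhz


# Approximate clocks from the report above: 2100 MHz before, about 1900 MHz after.
drop = relative_drop(2100, 1900)
print(f"{drop:.1%}")  # prints 9.5%
```

So the driver is cutting the graphics clock by roughly a tenth, which on this card is not enough to hold the temperature at the target.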

Hi All,
We have analyzed the issue using our local repro and observed that the thermal policy is functioning as expected. However, the workload is too intense for it to reduce the temperature further. Accordingly, we need to revise the thermal settings.