Arch Linux, driver 367.27: GT 650M overheats (stays at 950 MHz) on battery

Thanks for this info.

For now we are interested in P-states and temperature, so please run the command below in both cases:

nvidia-smi -i 0 --query-gpu=timestamp,pci.bus_id,temperature.gpu,pstate --format=csv -l 1 -f <file_name>

We need these logs for both cases, and the game should be running while the log is being captured, so that the issue is reproduced in the data:

  1. When the issue is not seen on the R364_00 driver, and
  2. When the issue is seen on the R367_00 driver.
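To keep the two capture runs consistent, the logger and the game can be started and stopped together. A minimal sketch, not something from this thread: `capture_gpu_log` is a hypothetical helper name, and it assumes `nvidia-smi` is on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: run the nvidia-smi sampler in the background for the duration of the
# game, so the CSV covers exactly the window where the issue reproduces.
# capture_gpu_log is a hypothetical helper, not something from the thread.

capture_gpu_log() {   # usage: capture_gpu_log <logfile> <game command...>
  local logfile=$1; shift
  nvidia-smi -i 0 \
    --query-gpu=timestamp,pci.bus_id,temperature.gpu,pstate \
    --format=csv -l 1 -f "$logfile" &
  local logpid=$!
  "$@"                        # run the game in the foreground
  kill "$logpid" 2>/dev/null  # stop sampling once the game exits
}

# e.g.: capture_gpu_log pstate_364.csv optirun <game binary>
```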

Uploaded results for the 364.19 driver, which demonstrate the expected behaviour in PAYDAY 2 [went to the menu, then loaded a game and exited].

Will test with 375.10 ASAP, when it comes out of the Arch testing repos.

375.10 is here : ftp://download.nvidia.com/XFree86/Linux-x86_64/375.10/

Can I get a log file generated with the command below?

nvidia-smi -i 0 --query-gpu=timestamp,pci.bus_id,temperature.gpu,pstate --format=csv -l 1 -f <file_name>

Sorry for the delay, I had quite a busy week.

NVIDIA 375.10 BETA still exhibits the same issues; I updated the post with a log file from the command you requested.

I don't know how to debug nvidia-smi any further to make it report the remaining data; however, I know that nvidia-settings CAN capture that data. I also updated the post with a file containing the output of "optirun -b none nvidia-settings -c :8 -q all".

In addition, I discovered this way that the PowerMizer modes reported in the GUI are wrong: I can see 3 instead of 2 in the nvidia-settings log file. (Correct me if I'm wrong here.)

Should I proceed with writing a bash script to record everything with nvidia-settings instead?
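For what it's worth, such a script could be as small as a one-line sampler. A sketch under my assumptions only: bumblebee's `optirun` and display `:8` are taken from the earlier post, while `GPUCoreTemp` and `GPUCurrentClockFreqs` are attribute names I am guessing would be of interest (`nvidia-settings -q all` lists what the driver actually exposes).

```shell
#!/usr/bin/env bash
# Sketch: emit one CSV line per call — timestamp plus the queried attributes.
# optirun / display :8 come from the earlier post; the attribute names are
# an assumption, check `nvidia-settings -q all` for the real list.

log_sample() {
  printf '%s,' "$(date -Iseconds)"
  optirun -b none nvidia-settings -c :8 -t \
    -q GPUCoreTemp -q GPUCurrentClockFreqs | paste -sd, -
}

# e.g.: while :; do log_sample; sleep 1; done >> nvidia_settings_log.csv
```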

Hi oanonymos0, can you test with 375.20 to check whether this issue is resolved?

Hello,

unfortunately 375.20 still reaches 950 MHz on battery, so the issue is not solved yet.

However, I noticed that "Total Dedicated Memory" in nvidia-settings changed across drivers: 2047 MB on 364.19, 1997 MB on 367.27+, and 2028 MB on 375.20.

Does your GTX 660M have identical behaviour between 364.19 and 375.20?

Please tell me if there is anything more I can do to help you solve this.

Hi oanonymos0,
We have investigated this issue and found that the higher clocks are expected behavior with the latest driver. This is not a bug; the clock values you are seeing in battery mode are indeed expected (R367_00 intentionally changed this behavior).

We can instead focus on the "overheating" issue. Could you please share what difference you found in GPU temperature between the earlier and the latest driver? Please provide reproduction steps for the overheating issue. Does it only happen when you play games? How long do you need to play? What are the min and max temperature readings?

Hello sandipt,

so this means that the behaviour is intended? I am really negatively affected by this change in behaviour. I propose two possible solutions:

  1. Add an option in nvidia-settings to disable turbo boost globally (and, in my case, cap the maximum clock to 835 MHz). I believe this is the best (and AFAIK the easiest) solution and would solve this issue completely. Many laptop users would benefit from it.

  2. The issue with turbo boost is that the GPU keeps its 950 MHz even at high temperatures of 80-90°C, which makes the CPU throttle and severely degrades performance, if it doesn't cause the laptop to shut down outright [I've experienced that]. The solution would be to disable turbo boost above some temperature threshold (75-80°C?). The performance drop would be minimal, and AFAIK this is the behaviour seen on Windows too. It would also greatly benefit users with not-so-good cooling systems.

Unfortunately, I don't think you can do anything to reduce the overheating other than reducing clock values.
My laptop heats up a lot in general, and I can only reduce the heat by lowering clock values on both the CPU and GPU, especially in the summer. I also tried cleaning out the dust and applying new thermal paste, but it didn't help much. Anyway, I will provide what you requested ASAP.

Please set Coolbits with the command (as root) nvidia-xconfig --cool-bits=8, or directly edit the /etc/X11/xorg.conf file and add the option below in the "Screen" section:

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth   24
    Option         "Coolbits" "8"
    SubSection     "Display"
        Depth      24
    EndSubSection
EndSection

I think this will help you work around the higher-clocks issue. You need to restart the X server and then launch the nvidia-settings application. In nvidia-settings, you can set the GPU clock offset to a negative value so that the GPU won't reach the max clock that causes the heating issue.

Also see this for reference : How To Overclock New NVIDIA GPUs On Linux - Phoronix
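Once Coolbits is active and X restarted, the offset can also be applied from the command line instead of the GUI. A sketch only: the performance-level index `[3]` and `gpu:0` are assumptions for this particular machine, so check `nvidia-settings -q GPUPerfModes` for the actual levels before using it.

```shell
#!/usr/bin/env bash
# Sketch: apply a negative graphics clock offset to the top performance level.
# gpu:0 and level index [3] are assumptions — verify with:
#   nvidia-settings -q GPUPerfModes
# A -115 MHz offset would pull 950 MHz back to roughly 835 MHz.

apply_offset() {   # usage: apply_offset <offset in MHz, e.g. -115>
  nvidia-settings -a "[gpu:0]/GPUGraphicsClockOffset[3]=$1"
}

# e.g.: apply_offset -115
```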

Thank you for the instructions, sandipt.

This will most probably solve my problem; however, I have one concern.

According to:
http://us.download.nvidia.com/XFree86/Linux-x86/375.20/README/xconfigoptions.html

WARNING: this may cause system damage and void warranties. This utility can run your computer system out of the manufacturer's design specifications, including, but not limited to: higher system voltages, above normal temperatures, excessive frequencies, and changes to BIOS that may corrupt the BIOS. Your computer's operating system may hang and result in data loss or corrupted images. Depending on the manufacturer of your computer system, the computer system, hardware and software warranties may be voided, and you may not receive any further manufacturer support. NVIDIA does not provide customer service support for the Coolbits option. It is for these reasons that absolutely no warranty or guarantee is either express or implied. Before enabling and using, you should determine the suitability of the utility for your intended use, and you shall assume all responsibility in connection therewith.

I want to clarify the nature of the risks. Does the warning refer to incorrect usage of Coolbits? Does it only apply to old laptops/desktops?

Or is there still a risk that my computer will get bricked while following your instructions? If so, how likely is it?

If it is the latter, I believe adding an option to disable turbo boost in nvidia-settings and/or xorg.conf without Coolbits would be the ultimate solution, and it would also help others who want to keep their computers cool without the risk of Coolbits.