Overclock GPU at lower voltages

I use Ubuntu 18.04.
I would like to overclock my NVIDIA GTX 1050 Ti Max-Q GPU, but I want to do that only at lower voltages as shown in this video:

and as explained in this reddit post:


Specifically I would like to know:

  1. How can I modify the frequency-voltage curve on Linux?
  2. In PowerMizer (nvidia-settings) what does the GPU offset do? Does it move the entire curve by that offset?
  3. What is a good offset for the memory clock? What are the risks, and can the memory clock offset be stress-tested?

I have asked this question also on Unix & Linux StackExchange:

I have tried for a few days to find solutions online, unsuccessfully.

You don’t.
Let Nvidia handle the voltage.
You could change the power limit, which in turn will adjust the voltage curve.
The problem with setting the voltage manually is that if it is even 0.01 V too low for a split second, your GPU driver could crash.
Setting the power limit and overclocking allows the Nvidia driver to regulate the voltage by itself.
You first have to run the command:
sudo nvidia-xconfig --cool-bits=24
then reboot.
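For context, Coolbits is a bitmask, so a value like 24 enables two separate features at once. A sketch of the commonly used bits (as documented in the NVIDIA driver README):

```shell
# Coolbits is a bitmask; 24 = 8 + 16:
#   4  -> manual fan control in nvidia-settings
#   8  -> clock offsets (overclocking) on supported GPUs
#   16 -> overvoltage control via the nvidia-settings command line
sudo nvidia-xconfig --cool-bits=28   # 4 + 8 + 16: fans + offsets + overvoltage
```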

The command for power-capping (and thus also reducing voltage) on Linux is:
sudo nvidia-smi -pl 130
Where ‘130’ is the wattage. Most RTX GPUs run anywhere from 125 W to 300 W.
If you set this value too high or too low, the driver will tell you in the terminal what the limits of that GPU are.
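Before picking a wattage, you can also ask the driver what range the card accepts. A small sketch (field names per `nvidia-smi --help-query-gpu`):

```shell
# Full power report: current draw, default limit, and min/max enforceable limits
nvidia-smi -q -d POWER

# Just the limits, in CSV form
nvidia-smi --query-gpu=power.limit,power.min_limit,power.max_limit --format=csv
```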

You then need access to a GUI desktop to open NVIDIA X Server Settings.

If you have more than one GPU, you’d have to do the command:
sudo nvidia-xconfig --enable-all-gpus
before you do:
sudo nvidia-xconfig --cool-bits=X

This ‘–enable-all-gpus’ command may break your desktop experience, as many Linux versions need special installation commands (which only work on 18.04 or earlier) to run multiple GPUs.

For a multi-GPU system, you’ll need to uninstall the .deb installation files and purge them, but first download the new .run driver file.
Save it in a directory (e.g. /home/MYUSERNAME/Downloads/).
Make it executable by going to the Downloads folder and running:
chmod +x NVidiaDriverVersion.run
(Type chmod +x NV and then press Tab to autofill the correct version name.)
Then purge the drivers: sudo apt purge 'nvidia*'
Also make sure you have gcc and make installed: sudo apt install gcc make

Then restart and get into GRUB; it may be wise to enable networking from the menu if you have the option, otherwise go straight into a root shell.
Once there, go to the Downloads folder and run the driver installer: sudo ./NVidiaDriverVersion.run
Then apply the ‘enable-all-gpus’ and ‘coolbits’ options, and reboot again.
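Condensing the steps above into one rough sketch (the path and driver file name are placeholders; the installer itself must run from a text console, not under X):

```shell
# 1. Download the .run driver first, then make it executable
cd /home/MYUSERNAME/Downloads/        # placeholder path
chmod +x NVidiaDriverVersion.run      # placeholder file name

# 2. Remove the packaged (.deb) drivers and make sure build tools are present
sudo apt purge 'nvidia*'
sudo apt install gcc make

# 3. After rebooting into a root shell (e.g. via GRUB recovery)
sudo ./NVidiaDriverVersion.run
sudo nvidia-xconfig --enable-all-gpus
sudo nvidia-xconfig --cool-bits=24
```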

You can power-cap in a terminal window, and overclock/adjust the fan curve in the GUI in these menus:
[screenshot of the nvidia-settings overclocking and fan-control menus]
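If you prefer the terminal over the GUI, the same offsets can (as far as I know) also be set through nvidia-settings attributes once Coolbits is active. The performance-level index [3] below is an assumption; verify the attribute names and levels for your card with nvidia-settings -q all:

```shell
# +100 MHz graphics clock offset on performance level 3 (verify the level index)
nvidia-settings -a "[gpu:0]/GPUGraphicsClockOffset[3]=100"

# +800 MHz memory transfer rate offset (double the effective memory clock bump)
nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[3]=800"
```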


Thank you, @prodigit80!

Let Nvidia handle the voltage.

So, would you say that it is an oversight on the part of NVIDIA that Windows users can modify the frequency-voltage curve with MSI Afterburner (as shown in the first link above)?

An example of a frequency-voltage curve would be:

My understanding is that the GPU works according to such a curve.

When the GPU performs a minor task it can run at a lower frequency with a lower voltage; when it has to do something more computationally intensive, it will run at a higher frequency requiring a higher voltage.

If it is possible to modify this curve on Windows, why shouldn’t it be possible to modify it on Linux?


The problem with setting the voltage manually is that if it is even 0.01 V too low for a split second, your GPU driver could crash.

Those making changes to the NVIDIA settings, be it overclocking or changing the power limit, are advised to stress-test. On Linux, this can be done with a range of tools such as Unigine Valley/Heaven/Superposition, or games. Here I have a few questions:

  • What amount of stress-testing is necessary to make sure that the GPU is operating without errors? Can all crashes be detected with the human eye, or, to put it differently: are there anomalies that may go undetected?
  • I use the NVIDIA card mostly for CUDA computations. Should I be concerned that there might be some faulty results if I do not stress-test thoroughly, i.e. might it happen that I will not get any errors/warnings but get wrong results? Or should I expect that if something is wrong with my changes the GPU will simply stop working and the operations will not be carried through?
  • Are power/GPU clock changes hardware safe? can they damage the GPU?
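On the CUDA-correctness worry specifically, one commonly used approach is a compute stress test that checks its results against a reference, e.g. gpu-burn. The repository URL and usage below are from memory, so treat them as assumptions to verify:

```shell
# gpu-burn runs CUDA matrix multiplies and compares the results for errors,
# which can catch silent miscomputation that a purely visual test would miss.
git clone https://github.com/wilicc/gpu-burn
cd gpu-burn
make
./gpu_burn 300   # stress for 300 seconds; reports errors if results mismatch
```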

Setting the power limit and overclocking allows the Nvidia driver to regulate the voltage by itself.

  • If it self-regulates, why does the GPU still crash when I overclock (e.g. by 400 MHz)? Are the crashes the result of power/overclock changes per se or because of the new voltage levels chosen during self-regulation?
  • As it regulates itself, does it mean that a positive GPU clock offset will result in more voltage being drawn?
  • Similarly, does a decrease in power limit result in less voltage being drawn?

As far as I know, power is computed as follows: P (W) = I (A) × V (V)
So, by reducing the max power limit, does it mean that at each frequency the GPU runs on less power? And if so, is it the voltage, V, that is being reduced? Would this be a good way to undervolt?

Or does this mean that the GPU will run only up to the frequency/voltage level corresponding to that max power?
Taking the picture above as our example again: if the GPU consumes 120 W to run at 1900 MHz (1030 mV), and we reduce the limit to 115 W, the max voltage (corresponding to the max power setting) at which the GPU will run will be V = 1.03 × 115/120 ≈ 987 mV, corresponding to ~1870 MHz. So, if we reduce the power limit to 115 W, the max attainable frequency would be 1870 MHz, right?
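The arithmetic in that example can be checked quickly (a sketch assuming, as above, that current stays fixed so voltage scales linearly with the power cap):

```shell
# V_new = V_old * P_new / P_old, with V_old = 1.03 V, P_old = 120 W, P_new = 115 W
awk 'BEGIN { printf "new max voltage: %.0f mV\n", 1.03 * 115 / 120 * 1000 }'
# -> new max voltage: 987 mV
```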

  • What does the GPU clock offset actually do? Does it move the entire curve by that offset?

Assume there is a stock max frequency (i.e. the hard-coded max frequency at which the GPU can run) of 1900 MHz and the min frequency on the curve is 1350 MHz (just as in the first picture). If I successfully set an overclock offset of 550 MHz, am I right to think that my GPU will always operate at 1900 MHz consuming 700 mV?

Of course, the benefit of overclocking gradually towards lower voltages is that it allows for better stress-testing.


sudo nvidia-xconfig --cool-bits=24

  • Did you mean to write 28?

If you set this value too high or too low, the driver will tell you in the terminal what the limits of that GPU are.

  • Are there any risks associated with setting the power at the min/max capacity?

If you have more than one GPU, you’d have to do the command:
sudo nvidia-xconfig --enable-all-gpus
before you do:
sudo nvidia-xconfig --cool-bits=X

This ‘–enable-all-gpus’ command may break your desktop experience, as many Linux versions need special installation commands (which only work on 18.04 or earlier) to run multiple GPUs.

I have an Intel iGPU and a Nvidia GTX 1050 Ti (Max-Q) dGPU on my laptop.

I do not experience desktop troubles. What I experience is that I cannot reboot after I run sudo nvidia-xconfig. I have posted about this here: nvidia-xconfig breaks boot.

Among the identified/suggested solutions are:

Solution 1:
Run sudo nvidia-xconfig to create the xorg.conf file.
Open the file, navigate to the Section “Device”, and:

  1. Comment out Driver "nvidia"
  2. Add Option "Coolbits" "28" (for some reason running sudo nvidia-xconfig --cool-bits=28 adds Option "Coolbits" "28" to Section “Screen” and not to “Device”)

Solution 2:
Suggested here. Create a myxorg.conf file in xorg.conf.d:

Section "Device"
      Identifier  "Nvidia Prime"
      Option      "Coolbits" "28"
EndSection

From the author:

I believe setting the option that way applies it on top of the autogenerated Xorg configuration, so there should be no conflict with what Ubuntu does behind the scenes.

This solution has been criticized in the comments:

The problem with that myxorg.conf file you tried is that you use a “Device” section. That’s forcing a certain setup. If there’s several Device sections, they clash with each other and things break.

Solution 3:
Use a xorg.conf file:

Section "ServerLayout"
	Identifier "layout"
	Screen 0   "nvidia"
	Inactive   "intel"
EndSection

Section "Device"
	Identifier "nvidia"
	Driver     "nvidia"
	BusID      "PCI:1:0:0"
	Option     "Coolbits" "12"
EndSection

Section "Screen"
	Identifier "nvidia"
	Device     "nvidia"
	Option     "AllowEmptyInitialConfiguration"
EndSection

Section "Device"
	Identifier "intel"
	Driver     "modesetting"
EndSection

Section "Screen"
	Identifier "intel"
	Device     "intel"
EndSection

Solution 4:
Use a myxorg.conf file in /etc/X11/xorg.conf.d/ with the following content:

Section "OutputClass"
    Identifier "my nvidia settings"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "Coolbits" "12"
EndSection
  • Are the steps you suggest (download .run file drivers, uninstall .deb files, reinstall) aimed at preventing the issue I have mentioned above from happening?

  • Would you recommend any solution from those enunciated above? If not, what is wrong with them?

Since on Ubuntu 18.04, I cannot use both GPUs at the same time, i.e. I have to prime-select one, why do I bump into this issue?

download the new .run driver file

I’m just here to say that 400 MHz is a huge overclock to even think about putting on a GPU core. Memory? Yes. Core? Try +150.