If you have GPU clock boost problems, please try __GL_ExperimentalPerfStrategy=1
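In case it helps anyone trying this: the variable can be set system-wide via /etc/environment (which is what the test below checks with grep) or just for a single application as a quick test. This is only the generic way of setting an environment variable, nothing driver-specific:

$ echo '__GL_ExperimentalPerfStrategy=1' | sudo tee -a /etc/environment   # system-wide, takes effect after re-login
$ __GL_ExperimentalPerfStrategy=1 glxgears                                # or per-process, for a quick test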

Fedora 30 stock kernel, stock everything: ~14 seconds (which is still too long).

Fedora 30 custom kernel (PREEMPT enabled) + Option "UseNvKmsCompositionPipeline" "Off": ~36 seconds:

$ grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done 
__GL_ExperimentalPerfStrategy=1
1
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    44     -     0     0     0     0  4006   936
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    45     -     2     1     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    45     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    28    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    28    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    43     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    43     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    43     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    43     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    43     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    43     -     0     0     0     0  4006   936
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    43     -     0     0     0     0  4006   936
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    43     -     0     0     0     0  4006   936
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    43     -     0     0     0     0  3802   936
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    43     -     0     0     0     0  3802   936
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    11    43     -     0     1     0     0   810   746
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     9    42     -     0     2     0     0   810   746
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     9    42     -     0     2     0     0   810   746
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     9    42     -     1     3     0     0   405   240
$ cat nvidia.conf

Section "Device"
        Identifier      "Videocard0"
        BusID           "PCI:1:0:0"
        Driver          "nvidia"
        VendorName      "NVIDIA"
        BoardName       "NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1)"
        Option          "Coolbits" "28"
        Option          "metamodes" "nvidia-auto-select +0+0 {ForceCompositionPipeline=On, ForceFullCompositionPipeline=On}"
        Option          "UseNvKmsCompositionPipeline" "Off"
        Option          "TripleBuffer" "On"
EndSection
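
For reference, the ForceCompositionPipeline part of that metamode can also be toggled at runtime through nvidia-settings instead of editing the config and restarting X; the mode string below is simply the one from this config, so adjust it to your own setup:

$ nvidia-settings --assign CurrentMetaMode="nvidia-auto-select +0+0 {ForceCompositionPipeline=On, ForceFullCompositionPipeline=On}"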

GTX 1060 6GB here.
config-5.1.7z (21.6 KB)

Hi Birdie,

Thanks for the experiments. With the updated driver we are now expecting the clocks to ramp down in around 13-15 seconds, compared to ~40 seconds earlier.
We would appreciate it if you could confirm that you tested with the custom kernel immediately after booting the system (without any applications running in the background).

It looks like the changelog for driver 430.14 doesn't contain all the info, and that exact driver version already contains the fix. I can confirm that it takes approximately 14 seconds to ramp down the clocks with driver 430.26. Hooray!

By any chance, is it possible to further speed up the clock ramp-down under Linux? Say, make the transition take five seconds or less? Ramp-up takes less than a second, while ramp-down is way too slow.

Hi Birdie,

Thanks again for your valuable experiments.
Currently we have been able to reduce the GPU clock ramp-down time from ~38 seconds to ~14 seconds, which is a good sign, and we will continue to investigate further improvements.

Thank you! Looking forward to clock transitions as fast as they are on Windows.

Hello

After installing this driver on CentOS 7.6 I can no longer reach P0, whether persistence mode is enabled or not. Any way to fix this?
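
For anyone reproducing this: persistence mode can be toggled with the standard nvidia-smi switch (needs root). These are just the stock commands, not a fix for the P-state issue:

$ sudo nvidia-smi -pm 1   # enable persistence mode
$ sudo nvidia-smi -pm 0   # disable persistence mode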

OK on 430.26 but still at P2

Mon Jul  1 11:34:51 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    On   | 00000000:02:00.0 Off |                  N/A |
|  0%   53C    P2   127W / 180W |   7000MiB /  8119MiB |     57%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    On   | 00000000:03:00.0 Off |                  N/A |
| 25%   50C    P2    70W / 180W |   7002MiB /  8119MiB |     61%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 1080    On   | 00000000:81:00.0 Off |                  N/A |
| 25%   56C    P2    83W / 180W |   7002MiB /  8119MiB |     51%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 1080    On   | 00000000:82:00.0 Off |                  N/A |
| 24%   47C    P2    69W / 180W |   7002MiB /  8119MiB |     54%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     44293      C   …ph4dozbmictudhml5/bin/relion_refine_mpi    6989MiB |
|    1     44294      C   …ph4dozbmictudhml5/bin/relion_refine_mpi    6991MiB |
|    2     44295      C   …ph4dozbmictudhml5/bin/relion_refine_mpi    6991MiB |
|    3     44296      C   …ph4dozbmictudhml5/bin/relion_refine_mpi    6991MiB |
+-----------------------------------------------------------------------------+

Not directly related to the slow ramp-down of the clocks, but if you have the issue where your GPU stays at the max power state forever, I found a workaround that fixes it until you either reboot, restart the X server, or touch your monitor settings.

My Specs:

  • Ubuntu 19.04
  • GTX 1080Ti
  • two 4K monitors

To temporarily fix the power issues:

  • Go to Settings (Power + Cog icon on the top right)
  • Settings / Devices / Displays
  • Change the resolution of the primary/left monitor to something like 1024x768
  • Apply + Keep Changes
  • Change resolution back to 4K
  • Apply + Keep Changes

After doing this, the power states work as they should until I restart the X server or touch the display settings again. It keeps working even after suspending the computer.

However, this does not work if I change the resolution using xrandr, if I do not choose to keep the changes after applying the lower resolution, or if I lower the resolution on the secondary monitor instead of the primary one. Turning the second monitor off also fixes the power issues, but as soon as I turn that monitor back on, the issues return (until I repeat the steps above). This is true even if power saving was already working.
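
For clarity, "changing the resolution using xrandr" means something along these lines (the output name DP-0 and the modes are only placeholders; use whatever xrandr lists for your primary monitor). As noted above, this variant did not trigger the workaround for me:

$ xrandr --output DP-0 --mode 1024x768
$ xrandr --output DP-0 --mode 3840x2160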

It almost looks like the graphics driver has some hardcoded resolution limit that forces the GPU to use max clocks forever, but because of some glitch it forgets to set that flag when you change the screen resolution using the above steps. The flag seems to survive suspend, so it apparently gets stored to disk during suspend.

I hope this helps others find a temporary workaround for this issue, and helps the NVIDIA staff track down its cause.

It goes down faster, but that doesn't fix the fact that it still spikes for no reason… The frequency goes up and the fans start while the machine is completely idle on the desktop, for no apparent reason. The CPU and RAM aren't moving an inch when this happens, so it doesn't seem to be a system issue…
It doesn't happen on Windows either, so it doesn't seem to be hardware related.

Update: I tried a few things, and with the composition pipeline disabled completely, the card stays at its lowest state when idling; it even lost about 9 °C. Is this normal behavior?
Is there any downside to disabling it?
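
For anyone wanting to try the same thing: if "disabling the composition pipeline completely" means dropping the Force*CompositionPipeline flags, then relative to the xorg.conf posted earlier in this thread the metamodes line would simply become something like this (a sketch only, based on that config):

        Option          "metamodes" "nvidia-auto-select +0+0"

i.e. without the {ForceCompositionPipeline=On, ForceFullCompositionPipeline=On} block.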

Is there any progress on this issue?

On an EVGA 3080 Ti FTW3 Ultra Hybrid, PowerMizer is stuck at the highest level and never drops to the lower levels when more than two monitors are connected. __GL_ExperimentalPerfStrategy=1 does not make any difference. With two 1440p monitors running at 60 Hz, it does drop to the lowest power level. When I either bump one of the two monitors up to 144 Hz or add a third monitor at 60 Hz, it gets stuck at the highest level. When my monitors go into standby, it does clock down to the lowest level, which I can see when I SSH into the machine and run nvidia-smi. My previous card, a Zotac 3080 AMP Holo, was able to clock down to the lowest power level with four 1440p monitors running (3 @ 75 Hz, 1 @ 144 Hz). Changing the power limit and/or clock offsets doesn't make any difference.

The nvidia-smi output below shows it stuck at P0 and drawing 87 W even when usage is 0-1%. When I drop to one or two monitors, PowerMizer starts working and it idles at 28 W and 32 °C. The nvidia-bug-report file is attached.

$ echo $__GL_ExperimentalPerfStrategy 
1

$ nvidia-smi dmon
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210

nvidia-bug-report.log.gz (516.8 KB)
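
For anyone reproducing this, the reported performance state and power draw can also be sampled directly instead of being inferred from the clocks; this is just standard nvidia-smi query functionality, not a workaround:

$ nvidia-smi --query-gpu=pstate,power.draw,clocks.sm,clocks.mem --format=csv -l 1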

I can confirm the monitor-related observation:

My setup has two monitors connected (actually one monitor on DP, and a TV on HDMI, both 4k, clone mode).

While both monitors are connected, the GPU statically stays at the highest power level. It also causes latency spikes when the system has been running for a long time, ultimately showing perf messages in the kernel log:

[56896.930341] perf: interrupt took too long (3142 > 3131), lowering kernel.perf_event_max_sample_rate to 63600
[57913.607571] perf: interrupt took too long (3940 > 3927), lowering kernel.perf_event_max_sample_rate to 50700
[59785.486534] perf: interrupt took too long (4933 > 4925), lowering kernel.perf_event_max_sample_rate to 40500
[62631.956353] perf: interrupt took too long (6217 > 6166), lowering kernel.perf_event_max_sample_rate to 32100

The result is micro-freezes of the system. Every once in a while the mouse cursor stutters, keyboard inputs are delayed or skipped, video playback drops frames (though audio is not affected), scrolling is no longer smooth, and games become unpredictable due to jumpy mouse movement or gamepad input. If I leave the system running for long enough, the effects seem to recover, only to come back suddenly.

While the system micro-freezes (short latency spikes or freezes, usually just milliseconds but enough to make mouse movement unpredictable on the desktop), I can go to nvidia-settings and disable the HDMI output; the system immediately recovers from the micro-freezes and the GPU enters low power states. When I turn HDMI back on, the GPU goes back to maximum power levels even when idling at 1-2% usage. The micro-stutters do not return at that point, but they eventually come back later.

A reboot usually also fixes the micro-stutters for some hours but GPU power levels stay at maximum.

This is extremely annoying, especially while using the mouse. It took me a very long time to finally find this thread and a workaround (disabling the second monitor), so I'm pretty sure it's driver-related. This wasn't an issue a few months ago, but I cannot pinpoint when it started.

It probably started around the same time I discovered that the TV is no longer properly detected: it usually works after a reboot, but when I turn the TV off and back on, it only shows a black screen with "no signal detected" while the NVIDIA driver thinks it's working perfectly fine and reports resolution/refresh/model etc. To fix this, I need to lower the resolution and put it back to 2160p, or set it to 30 Hz instead of 60 Hz (which is quite useless for games, which then run at 20-30 fps instead of 50-60).

Update:

Using __GL_ExperimentalPerfStrategy=1 makes no difference.

Maybe related: NVIDIA 455.50.14 nvidia-modeset kernel crash on monitor re-plug

nvidia-bug-report.log.gz (1,1 MB)

Using __GL_ExperimentalPerfStrategy=1 makes no difference.

It can't; the feature was enabled by default a long time ago.

I've also noticed high power draw with two 4K monitors. I have two 4K monitors, each connected over DP. When I drive them both at 60 Hz, my RTX 3090 stays in P0 and draws 118 W. If I set one to 59.94 Hz and the other to 60 Hz, it will drop to P5 at idle and draw 50 W. Setting one to 50 Hz and the other to 60 Hz allows it to drop to P8 and draw 40 W. Unfortunately, one of my monitors only lists 60 Hz and 30 Hz in the EDID for the 4K modes, so I didn't test both at 59.94 Hz. I've seen a similar result with an RTX 2070 Super and the same monitors.
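
For reference, the per-output refresh-rate changes described above can be made with xrandr along these lines (output names and rates are examples; use whatever xrandr -q lists for your monitors):

$ xrandr --output DP-0 --mode 3840x2160 --rate 59.94
$ xrandr --output DP-2 --mode 3840x2160 --rate 60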

Windows 10 will drop down to P8 with the GPU at 210 MHz and the RAM at 810 MHz with both monitors at 60 Hz.

I've seen @aplattner mention in the past that sometimes the card won't downclock the RAM at high resolutions because the time needed to change the RAM clock is longer than the frame time and the display could underflow. Is there a bug in the driver's calculation of that time that's locking the perf level? Windows 10 is capable of dropping the card to P8.

nvidia-bug-report.log.gz (905.4 KB)

There’s no way I could set 60 Hz for my TV:

# xrandr -q
Screen 0: minimum 8 x 8, current 5760 x 2160, maximum 32767 x 32767
DP-0 disconnected (normal left inverted right x axis y axis)
DP-1 connected 1920x1080+3840+540 (normal left inverted right x axis y axis) 160mm x 90mm
   1920x1080     60.00*+  59.94    50.00
   1680x1050     59.95
   1600x1200     60.00
   1440x900      59.89
   1280x1024     60.02
   1280x960      60.00
   1280x800      59.81
   1280x720      60.00    59.94    50.00
   1024x768      60.00
   800x600       60.32    56.25
   720x576       50.00
   720x480       59.94
   640x480       59.94
DP-2 disconnected (normal left inverted right x axis y axis)
DP-3 disconnected (normal left inverted right x axis y axis)
HDMI-0 connected primary 3840x2160+0+0 (normal left inverted right x axis y axis) 1650mm x 930mm
   3840x2160     30.00 +  59.94*   50.00    29.97    25.00    23.98
   4096x2160     59.94    50.00    29.97    24.00    23.98
   2560x1440     59.95
   1920x1080    119.88   100.00    60.00    59.94    50.00    29.97    25.00    23.98
   1680x1050     59.95
   1600x900      60.00
   1440x900      59.89
   1280x1024     75.02    60.02
   1280x800      59.81
   1280x720      60.00    59.94    50.00
   1152x864      75.00
   1024x768      75.03    70.07    60.00
   800x600       75.00    72.19    60.32
   720x576       50.00
   720x480       59.94
   640x480       75.00    72.81    59.94
DP-4 connected 3840x2160+0+0 (normal left inverted right x axis y axis) 607mm x 345mm
   3840x2160     60.00*+
   2560x1440     59.95
   1920x1080     60.00    59.94
   1680x1050     59.95
   1600x900      60.00
   1440x900      59.89
   1280x1024     60.02
   1280x800      59.81
   1280x720      60.00
   1024x768      60.00
   800x600       60.32    56.25
   640x480       59.94
DP-5 disconnected (normal left inverted right x axis y axis)

And there's no way to have the same refresh rate on all monitors either, because while I could use 59.94 Hz on two of the three, or 60 Hz on two of the three, my main monitor supports only 60 Hz.

But all of them support unvalidated G-SYNC, so the driver could just drive them all at the same refresh rate in sync.

Hello again.
I own an MSI GeForce GTX 1080 Gaming X 8G (https://www.msi.com/graphics-card/geforce-gtx-1080-gaming-x-8g.html) and a 28" Samsung U28R55 monitor.
I have a question: why, on Linux, does the video card so often switch to the P0 state? It happens at completely random moments.
Moving terminal windows, opening the application menu, watching video on YouTube (in different browsers). On Windows I can watch 4K video in the P8 state! So this is not a hardware problem.
On Windows 10 I do not observe this. The difference in power consumption is huge: P8 is 14-20 W, P0 is at least 46-50 W.
Temperature in the P0 state is up to 61 °C, but in the P8 state on Windows it's about 37-40 °C. The lowest temperature I've had on Linux is 46 °C. That's not normal.
I tested/measured with official NVIDIA drivers from 385 to 510.
Linux operating systems: Fedora 34, openSUSE Tumbleweed, Ubuntu 16.04/18.04/20.04/21.04/21.10, MX Linux 19.4.1, antiX 19.4, Linux Mint 20.2, Debian 11 Testing, Devuan Chimaera 4.0 Alpha, Manjaro 21.0.7, Artix, ArchLab, RebordOS, Void Linux, with different kernels from 4.9 up to 5.16. All OSes with different DEs (XFCE, GNOME 3, Cinnamon, MATE, KDE, JWM, Fluxbox, IceWM).
This generation of video cards has existed for so many years, but the problem still hasn't been solved. Moving a window in the DE pushes power usage up to 46-50 W for about 30-43 seconds in the P0 state.
Why? What can be done? Does NVIDIA not think about the owners of its video cards?

Changing DEs and distros won't do much except waste time, if anything.
Reverting to earlier kernels and NVIDIA drivers is literally going backwards.

Comparing non-cost “FREE” OpenSource Linux to Windows is like comparing a television to the color yellow.

My understanding of the situation is this…

Windows is an “at cost” Closed Source Operating System ecosystem.
When you install Windows onto a hardware platform, it installs an executive microkernel with a desktop and generalised hardware support for the hardware platform lineage detected at install.
At first update it sends all of the hardware specs and hardware identifiers to the Microsoft servers, then downloads and installs a proprietary "containerised" operating system for that EXACT hardware make and model.
This includes firmware, drivers, and the COMPLETE optimal stable configuration template for every single component of the Windows operating system "containerised" VM image, as developed by the actual manufacturer and Microsoft under the Microsoft Certified Hardware Partner program.
There is a generalised and reasonable market expectation that, at purchase of the "at cost" hardware platform running Windows, at first boot or as soon as possible thereafter the system is stable, fully supported and fully functioning out of the box.

There are 2 versions of Linux.
The At-cost “FREE” Opensource Enterprise Linux Operating System.
and
The Non-cost “FREE” Opensource Community Linux Operating system.

With the at-cost "FREE" open-source Enterprise Linux operating system, there is a generalised and reasonable market expectation that, at purchase of the at-cost hardware platform running Enterprise Linux, at first boot or as soon as possible thereafter
the Linux-driven hardware platform is competitively comparable to a similar product already on the market, using metrics such as out-of-box capability, functionality, performance and reliability.

Regarding the non-cost "FREE" open-source community Linux operating system, things get a bit more complicated.

While after install it offers a stable operating system experience, with great generalised hardware support, functionality and capability out of the box, it is COMPLETELY unconfigured for optimal functionality, capability and performance on the specific hardware platform it has been correctly installed onto.

Why? Because open source and "FREE" are about the open-source economy and open-source commercial enterprise.

This is where the “FREE” part comes in…

With the open-source, non-cost "FREE" Linux operating system:
The open-source adopter is "FREE" to contract an open-source enterprise to provide a product, service or some other solution that delivers an open-source-Linux-driven hardware platform, as already stated earlier…
OR
The open-source adopter is "FREE" to gain the knowledge and acquire the skill set to completely and correctly configure and integrate all of the components of the non-cost "FREE" open-source Linux operating system with all of the components of their specific hardware platform.

With the non-cost "FREE" open-source Linux operating system, when a new piece of hardware and the relevant driver are completely and correctly installed, they then need to be completely and correctly configured and integrated with all of the components of the operating system and of the specific hardware platform. This also involves resolving conflicts and system chokepoints.

With that said, moving on to the various distros and configuring and integrating:
as I understand it, the default locations and package managers may and do vary from distro to distro,
but the configuration itself is, in almost all circumstances (relative to the release), EXACTLY the same.

From what I've read, the older MSI BIOSes were a bit off track compared to other hardware vendors regarding their ACPI implementation relative to the Linux kernel.

My advice is: absolutely do not use any 3rd-party power-management apps.
They need a huge amount of configuration and understanding, which defeats the point.
I hate them all.
The on-chip instruction set of modern hardware is more than sufficient; the trick is removing the conflicts that stop it from successfully executing. Keep it simple.

The same goes for nvidia-prime (developed by the Ubuntu devs, if I'm not mistaken), Manjaro's Optimus Manager, Bumblebee
and the rest. I hate them as well.
They have functions that aren't safely or logically applicable to most NVIDIA system platforms but can be forced by the ill-advised.
Like driving a truck with your feet: possible, yes, but not a great idea.

There was a great NVIDIA Optimus whitepaper, but it has been removed.
I suggest reading the Max-Q Advanced Optimus whitepaper.

  • When posting for community support, it is invaluable to include inxi -Fz output.

I take it that your system is a desktop.
AMD or Intel? I'll use Intel as an example, but AMD has its own issues.
My understanding is that we want to remove as much control as we can from the BIOS and hand it over to the kernel.
If it's Intel and Enhanced Intel SpeedStep Technology is active in the BIOS, deactivate it.
Disable CPU C-states. This prevents the CPU from going into power-saving mode, which can cause latency when the CPU needs to power back up (kernel parameter intel_idle.max_cstate=0).
Never use Hyper-Threading and "idle=poll".
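
If anyone does want to experiment with kernel parameters like intel_idle.max_cstate=0, the usual route on most distros is via GRUB. This is only a generic sketch, and keep in mind that disabling C-states increases idle power draw rather than reducing it:

# edit /etc/default/grub and append the parameter, e.g.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_idle.max_cstate=0"
# then regenerate the GRUB config and reboot:
$ sudo update-grub                               # Debian/Ubuntu
$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg    # Fedora/RHEL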

Are the correct BIOS/motherboard parameters set?
Disable any power saving in the BIOS: Turbo Boost and the like, C-states, P-states, T-states if applicable.

How do you know the temp values are correct?
Have you completely installed, configured and run lm-sensors or something else?

Are acpid and its supporting packages installed and running?
Is thermald configured and running? GitHub - intel/thermal_daemon: Thermal daemon for IA

In Linux, your CPU frequency scaling needs to be in alignment with the NVIDIA card.
Which one is it? What type/mode?

Then there's NUMA configuration, or NUMA balancing off, or NUMA off.

HugePages can improve system performance by reducing the amount of system resources required to access page table entries.

Swappiness may play a role.
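
Both of those are plain sysctl knobs if you want to experiment; the values below are only examples, not recommendations:

$ sudo sysctl vm.nr_hugepages=128   # reserve 128 huge pages
$ sudo sysctl vm.swappiness=10      # make the kernel less eager to swap
(add the settings to a file under /etc/sysctl.d/ to persist them across reboots)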

Do you need to add your user to the video group for acceleration?
What kernel parameters / environment variables are you using?

That's just the start of it.
And that's assuming you've fully and correctly installed and configured your NVIDIA driver.

Below are some links you may find useful:

Anyway, that's my take on it.

The problem is with the NVIDIA drivers for Linux. Right now I'm using Windows 11 on my laptop until the MacBook Pro M2 gets released.

They just don’t care about laptop users.

This is why I just can't use Linux on my gaming laptop; there are so many problems with the NVIDIA drivers. When I use hybrid mode, Xorg's CPU usage is very high, around 30-50%, when an external monitor is connected. After switching to NVIDIA performance mode, the Xorg usage went down, but then the NVIDIA GPU boosts its clocks all the way up, which just heats up my machine and makes the fans run loudly.
It's really not usable. I'm a software developer and I need to use Linux on a regular gaming laptop, but it just can't happen.