Ubuntu 20.04 - NVIDIA GPU consuming power even when using only integrated graphics card (Intel iGPU)

The issue is pretty simple: using the NVIDIA X Server Settings GUI to switch from the NVIDIA GPU to the Intel iGPU doesn’t work. The NVIDIA GPU still consumes power after rebooting and, as a consequence, generates unnecessary heat. The same thing (obviously) happens if I use prime-select from the terminal. I can see from powertop that with everything idle I am still consuming around 18-22W, which is at least 10W more than I would expect.
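As a cross-check on the powertop figure, the discharge rate can also be read straight from the battery's sysfs directory. The following is a minimal sketch, not something used in this thread: the function name `battery_watts` is hypothetical, the default battery name `BAT0` is an assumption (it may differ per machine), and it assumes the battery exposes either `power_now` (microwatts) or `current_now` and `voltage_now` (microamps and microvolts).

```shell
# Print the battery discharge rate in watts, given a power_supply sysfs directory.
# Assumption: BAT0 is the battery name; check /sys/class/power_supply/ on your machine.
battery_watts() {
    dir=${1:-/sys/class/power_supply/BAT0}
    if [ -r "$dir/power_now" ]; then
        # power_now is reported in microwatts
        LC_ALL=C awk '{ printf "%.1f\n", $1 / 1000000 }' "$dir/power_now"
    elif [ -r "$dir/current_now" ] && [ -r "$dir/voltage_now" ]; then
        # microamps * microvolts = 1e-12 watts
        i=$(cat "$dir/current_now")
        v=$(cat "$dir/voltage_now")
        LC_ALL=C awk -v i="$i" -v v="$v" 'BEGIN { printf "%.1f\n", i * v / 1e12 }'
    else
        echo "no power_now or current_now/voltage_now in $dir" >&2
        return 1
    fi
}
```

Calling `battery_watts` on battery power, with the machine idle, should give a number comparable to the discharge rate powertop reports.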

This seems to be a rather old bug that supposedly got “fixed”. I can find at least two Launchpad bug reports (e.g. https://bugs.launchpad.net/ubuntu/+source/nvidia-prime/+bug/1765363) and there are some workarounds there. The problem is that at this point I am not even sure what works and what doesn’t, since most posts are over 3 years old. For example, I tried installing Ubuntu without selecting “install third party drivers”, since someone suggested it would fix the problem, but it didn’t work in my case.
Someone else recently posted about this issue again on the Ubuntu Community Hub: “Nvidia-prime not powering off the dGPU” (Desktop category).
I used to have another Optimus laptop years ago and I remember it worked fine in older Ubuntu versions but now something seems wrong.

Config:

  • Zephyrus M16, 11800H, 3070
  • Ubuntu 20.04 (but same problem with 21.10)
  • Nvidia Driver 470 and 460 tested.

Did you try using “on-demand” mode?
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.

So I just tried the on-demand mode and it doesn’t seem to make a big difference. It consumes the same amount of power as the other two modes (with Intel being just slightly lower).

Having said that, if I install bbswitch and run “cat /proc/acpi/bbswitch” it returns ON. I also ran powertop to make sure I don’t have any runaway process consuming power (which would be pretty unlikely since it is a clean install). Not sure what’s going on here.

intel-mode-report.log.gz (179.1 KB)
on-demand-report.log.gz (391.1 KB)

I attached two logs (one in on-demand mode and the other when using Intel mode).

Runtime PM is supported and enabled. Here are two test cases to check where this power consumption comes from.
In Intel mode:
run powertop, note the power draw
load bbswitch
run

sudo tee /proc/acpi/bbswitch <<<OFF

to turn the GPU off, then check the powertop output.

Switch to on-demand mode. Don’t run nvidia-settings or nvidia-smi, as this will wake up the GPU.
Run powertop, then run

cat /sys/bus/pci/devices/0000:01:00.0/power/control

repeatedly until the output is ‘suspended’, then check the powertop output.
Power consumption should be the same in both cases.

Sorry, just noticed that’s the wrong sysfs node. It should be

cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
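Rather than re-running cat by hand, the polling can be scripted. This is only a sketch under stated assumptions: the function name `wait_for_suspend` and the default 60-second timeout are made up for illustration, and the PCI address 0000:01:00.0 in the usage note is the one from this thread.

```shell
# Poll a runtime PM status file once per second until it reports "suspended".
# Usage: wait_for_suspend STATUS_FILE [TIMEOUT_SECONDS]
# Returns 0 if the device suspended within the timeout, 1 otherwise.
wait_for_suspend() {
    status_file=$1
    timeout=${2:-60}
    elapsed=0
    while [ "$elapsed" -le "$timeout" ]; do
        status=$(cat "$status_file" 2>/dev/null)
        echo "${elapsed}s: ${status:-unreadable}"
        if [ "$status" = "suspended" ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

For example, `wait_for_suspend /sys/bus/pci/devices/0000:01:00.0/power/runtime_status 300` would watch the dGPU for five minutes. Reading this sysfs file should not itself wake the GPU, unlike nvidia-smi.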

Ok, so here is what I did. I start in Intel mode and out of the box I get around 20W with nothing running in the background.

After that I run

sudo modprobe bbswitch
sudo tee /proc/acpi/bbswitch <<<OFF

I checked the discharge rate once again in powertop and it hadn’t changed at all, still around 20W. One weird thing is that if I run cat /proc/acpi/bbswitch I get ON instead of OFF. It seems the previous OFF command had no effect.
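The "write OFF, then read the state back" check can be wrapped in a small helper so the result is unambiguous. This is a sketch with hypothetical function names, and it assumes the status line in /proc/acpi/bbswitch ends with ON or OFF (typically preceded by the PCI address).

```shell
# Print the last field of the bbswitch status line (expected: ON or OFF).
bbswitch_state() {
    file=${1:-/proc/acpi/bbswitch}
    awk '{ print $NF; exit }' "$file"
}

# Request OFF, wait a moment, then report whether the card actually stayed off.
bbswitch_off_and_verify() {
    file=${1:-/proc/acpi/bbswitch}
    echo OFF > "$file" || return 1   # needs root on a real system
    sleep 1
    state=$(bbswitch_state "$file")
    echo "bbswitch reports: $state"
    [ "$state" = "OFF" ]
}
```

On a real system both functions have to run in a root shell. A non-zero exit status from `bbswitch_off_and_verify` corresponds to the behaviour described above, where the card comes straight back ON.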

Then I switched to on-demand mode and rebooted. After that, power usage seemed slightly higher, but not by much (24-26W). I ran the cat command from your second comment repeatedly for over 5 minutes but kept getting

status
active

So I am quite lost here. This weekend I might try a few other workarounds I found online, but I am not sure what is going on. Thanks a lot for the help by the way, appreciate it :)

Please attach the dmesg output from Intel mode after trying to use bbswitch to turn off the GPU.

out.txt (139.0 KB)

I’m not really an expert, but it seems weird that around line 1100 it first switches off the GPU and then turns it back on right after. Not sure if there is other useful stuff in that dump to explain why this is happening. Thanks again!

This might be bbswitch colliding with runtime PM. Please post the output of
grep 10de /lib/udev/rules.d/*

Output before turning off the dGPU with bbswitch in Intel mode:

/lib/udev/rules.d/71-nvidia.rules:SUBSYSTEM=="pci", ATTRS{vendor}=="0x10de", DRIVERS=="nvidia", TAG+="seat", TAG+="master-of-seat"
/lib/udev/rules.d/71-nvidia.rules:ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x03[0-9]*", TEST=="power/control", ATTR{power/control}="auto"
/lib/udev/rules.d/71-nvidia.rules:ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x040300", TEST=="power/control", ATTR{power/control}="auto"
/lib/udev/rules.d/71-nvidia.rules:ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c0330", TEST=="power/control", ATTR{power/control}="auto"
/lib/udev/rules.d/71-nvidia.rules:ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c8000", TEST=="power/control", ATTR{power/control}="auto"
/lib/udev/rules.d/90-asusd-nvidia-pm.rules:ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="auto"
/lib/udev/rules.d/90-asusd-nvidia-pm.rules:ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="auto"
/lib/udev/rules.d/90-asusd-nvidia-pm.rules:ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="on"
/lib/udev/rules.d/90-asusd-nvidia-pm.rules:ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="on"

After turning off dGPU:

/lib/udev/rules.d/71-nvidia.rules:SUBSYSTEM=="pci", ATTRS{vendor}=="0x10de", DRIVERS=="nvidia", TAG+="seat", TAG+="master-of-seat"
/lib/udev/rules.d/71-nvidia.rules:ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x03[0-9]*", TEST=="power/control", ATTR{power/control}="auto"
/lib/udev/rules.d/71-nvidia.rules:ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x040300", TEST=="power/control", ATTR{power/control}="auto"
/lib/udev/rules.d/71-nvidia.rules:ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c0330", TEST=="power/control", ATTR{power/control}="auto"
/lib/udev/rules.d/71-nvidia.rules:ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c8000", TEST=="power/control", ATTR{power/control}="auto"
/lib/udev/rules.d/90-asusd-nvidia-pm.rules:ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="auto"
/lib/udev/rules.d/90-asusd-nvidia-pm.rules:ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="auto"
/lib/udev/rules.d/90-asusd-nvidia-pm.rules:ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="on"
/lib/udev/rules.d/90-asusd-nvidia-pm.rules:ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="on"
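The rules above only show what udev is configured to write. To see what each NVIDIA PCI function actually ended up with, the values can be read back from sysfs. A minimal sketch, assuming the standard sysfs layout; the function name `list_nvidia_pm` is hypothetical.

```shell
# List every NVIDIA PCI function (vendor 0x10de) with its class and runtime PM state.
# Usage: list_nvidia_pm [PCI_DEVICES_DIR]
list_nvidia_pm() {
    base=${1:-/sys/bus/pci/devices}
    for dev in "$base"/*; do
        [ -r "$dev/vendor" ] || continue
        [ "$(cat "$dev/vendor")" = "0x10de" ] || continue
        printf '%s class=%s control=%s status=%s\n' \
            "$(basename "$dev")" \
            "$(cat "$dev/class" 2>/dev/null)" \
            "$(cat "$dev/power/control" 2>/dev/null)" \
            "$(cat "$dev/power/runtime_status" 2>/dev/null)"
    done
}
```

A function that shows `control=on` would be held awake regardless of what the driver wants, which is the kind of conflict the two rule files could produce between them.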

It seems you also have asusd installed, which interferes with power management. Please disable/uninstall it. Then move both files

/lib/udev/rules.d/71-nvidia.rules
/lib/udev/rules.d/90-asusd-nvidia-pm.rules

to your home directory for later use and run
sudo update-initramfs -u
to remove them from the initrd. After reboot, please try using bbswitch again.
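The move step could be sketched as a small helper that keeps the files for later restoring and reports anything that was not found. The function name `stash_udev_rules` and the destination directory in the usage note are hypothetical; the two rule file paths are the ones named above.

```shell
# Move the given rule files into a stash directory, skipping any that do not exist.
# Usage: stash_udev_rules DEST_DIR FILE...
stash_udev_rules() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        if [ -e "$f" ]; then
            mv -- "$f" "$dest"/ && echo "moved $f"
        else
            echo "skipped $f (not found)"
        fi
    done
}
```

Run from a root shell, this would look like `stash_udev_rules /root/udev-stash /lib/udev/rules.d/71-nvidia.rules /lib/udev/rules.d/90-asusd-nvidia-pm.rules`, followed by `update-initramfs -u` as described above.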

Tried that; unfortunately the result is the same: it switches back to ON right after I set it to OFF. I also remember trying it once on a fresh install without asusd, and it was the same thing.

Not that it really matters, but I have noticed that powertop seems to detect the GPU as a network interface, for example:

23.7 W 0,0 pkts/s Device Network interface: wlo1 (iwlwifi)

Judging by the behaviour, it seems the ACPI implementation is broken/incompatible with Linux. Please attach an acpidump.

acpidump.out (2.4 MB)

Here it is. I will add that Optimus works fine when I use Windows (dual boot)

For testing, please try setting kernel parameter

acpi_osi=! acpi_osi="Windows 2009"

and check whether bbswitch then works properly.
bbswitch already loads in the disabled state, so no echo “off” should be necessary.

This will likely break touchpad support or something else; it is just for finding the point where it breaks.

So I tried that and it seems to have no effect at all. Even the touchpad and everything else keep working. If I look at dmesg again I still get

[    1.799306] bbswitch: loading out-of-tree module taints kernel.
[    1.799322] bbswitch: module verification failed: signature and/or required key missing - tainting kernel
[    1.799488] bbswitch: version 0.8
[    1.799492] bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PC00.GFX0
[    1.799500] bbswitch: Found discrete VGA device 0000:01:00.0: \_SB_.PC00.PEG1.PEGP
[    1.799510] ACPI Warning: \_SB.PC00.PEG1.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20201113/nsarguments-61)
[    1.799561] bbswitch: detected an Optimus _DSM function
[    1.799569] pci 0000:01:00.0: enabling device (0006 -> 0007)
[    1.799677] bbswitch: disabling discrete graphics
[    1.800119] systemd-journald[378]: Received client request to flush runtime journal.
[    1.809272] Adding 2097148k swap on /swapfile.  Priority:-2 extents:6 across:2260988k SSFS
[    1.817233] bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is on

I also made sure the kernel parameters were set using cat /proc/cmdline and I get
BOOT_IMAGE=/boot/vmlinuz-5.11.0-41-generic root=UUID=7cbdc114-6741-41de-95b9-7b1cbd2f04ce ro quiet splash acpi_osi=! "acpi_osi=Windows 2009" vt.handoff=7
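Checking for a parameter can be done without reading the whole line by eye. A minimal sketch with a hypothetical function name, `cmdline_has`; it does a plain fixed-string match, so the quoting around parameters containing spaces has to be allowed for in the search string.

```shell
# Return 0 if the kernel command line contains the given fixed string.
# Usage: cmdline_has STRING [CMDLINE_FILE]
cmdline_has() {
    needle=$1
    file=${2:-/proc/cmdline}
    grep -qF -- "$needle" "$file"
}
```

With the command line shown above, `cmdline_has 'acpi_osi=Windows 2009'` and `cmdline_has 'acpi_osi=!'` should both succeed, which confirms that the parameters reached the kernel.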

Not sure what’s going on :( Maybe I did something wrong but seems unlikely. Do you have other ideas about what I could try? If I have some time this weekend I might try to install Fedora just to check if it works there but I hope I don’t need to do that.

I also noticed that I do get some ACPI errors during boot but I think they are “normal”.

OK, nothing came of this. Please remove the parameters and check whether the latest kernel has some fixes for this:
https://launchpad.net/~damentz/+archive/ubuntu/liquorix

Tried 5.15 and once again got the same result. At this point I also think I have made too many changes, so this weekend I will go ahead and do a clean install and try a few different distros too.

Having said that, I don’t think it could be a hardware issue considering it works perfectly on Windows, right? I also don’t think it could be related to Secure Boot.

Small update: I tried installing Pop!_OS (which, as far as I understand, uses a different mechanism to switch between GPUs). I first tried NVIDIA-only mode and it was discharging at around 30W, which is expected. After that I tried Intel mode and once again the NVIDIA GPU was still on, and the power draw was the same as (or even higher than!) in NVIDIA mode.
Finally, when enabling Hybrid mode, I saw some progress. For the first time I got a discharge rate below 20W, more specifically 16-17W, which I think makes sense in Hybrid mode.

While that’s not bad, it still sucks that Intel mode doesn’t work. My hypothesis is that the dGPU doesn’t get shut down properly and then runs without the proper performance profile, messing everything up and consuming even more power than in NVIDIA mode. Considering my laptop is pretty new (it came out in July), it is possible that it isn’t properly “supported” in the Linux kernel yet, even if that is weird. Hardware failure is also unlikely considering it works just fine on Windows.

EDIT: Finally found something that worked!!! I installed Pop!_OS once again, but without selecting the NVIDIA version that comes with preinstalled drivers. To my surprise, this time I was finally able to see a constant discharge rate of just 5-6W. Obviously this is not ideal, since now I can’t use the dGPU at all, and I am even more confused because on Ubuntu it never worked even when I chose not to install additional drivers. But at least I know it can work, even if I really don’t understand what the underlying issue is.