Powering Nvidia 1050Ti *off* on Asus GL753VE with bbswitch causes the fan to become permanently stuck at 100% speed

Hello.

I have an Asus GL753VE laptop with an Nvidia 1050Ti, running Linux 4.14 with driver 384.98 and BIOS 306 on Debian Unstable.

When I use bumblebee and bbswitch to power the nvidia GPU on and run software via primusrun, everything works perfectly, glxinfo reports correct values, and graphics are accelerated as they should be.

However, when the primus process has finished running and bbswitch powers the nvidia GPU off, the system fan spins up to 100% speed after 5-10 seconds and stays stuck there permanently, creating a huge amount of noise. There is no way to spin it down other than completely powering off the system (a reboot is not enough).

This issue is also present on some other laptop models. It has been reported to every relevant project and discussed in depth, but to no avail on this particular laptop model:

https://bugzilla.kernel.org/show_bug.cgi?id=156341

This issue has also been reported to AsusTek Computer Inc. by multiple customers, but the company has not even acknowledged that the bug exists:

https://rog.asus.com/forum/showthread.php?98775-LINUX-Fan-speed-issue-on-GL753VE-nvidia-bumblebee-bbswitch-ACPI

Currently, as a workaround, I am running xorg and everything else directly through the mechanism described here: http://us.download.nvidia.com/XFree86/Linux-x86_64/370.23/README/randr14.html
However, a solution to the bbswitch issue would be preferred.

Reporting this to Nvidia is basically a last resort.

Thank you in advance for any information you can provide to help fix this.

nvidia-bug-report.log.gz - https://my.mixtape.moe/rvwsvc.gz

You are hitting this bug:
https://bugzilla.kernel.org/show_bug.cgi?id=156341
The easy workaround would be to use the kernel parameters
acpi_osi=! acpi_osi="Windows 2009"
Unfortunately, on your model the touchpad probably won’t work then. To work around that, you have to modify an ACPI table and load it at boot. Luckily, one of the previous users wrote a howto after figuring this out in the bumblebee issue you mentioned.
https://github.com/Bumblebee-Project/Bumblebee/issues/764#issuecomment-306543064

Thanks for the quick response.

I have tried various combinations of the acpi_osi, acpi_rev_override and pcie_port_pm kernel parameters, including the one you posted, but none of them fixed the stuck fan (though I can confirm that you are correct about the touchpad being disabled with acpi_osi="Windows 2009").

I will give the described ACPI table patching method a try.

Run acpidump and attach the output, I’ll take a look at it.

acpidump.txt.zip - https://my.mixtape.moe/omluno.zip

Looking at the acpidump, the kernel parameter should reliably work around the issue. Since it doesn’t, this seems to be something new. Can you attach the dmesg output from when the issue hits, with the kernel parameter set?

Full dmesg after booting with acpi_osi=! acpi_osi='Windows 2009' into xorg run by the integrated GPU alone, with the bumblebee service disabled:

https://my.mixtape.moe/vvfhtl.zip

The lines up to the last one with a 7.15… timestamp are what’s there immediately after boot.
The lines with 106.88… timestamps are what’s added when I manually start the bumblebee service ( service bumblebeed start ).

About 20-30 seconds after running that command, the fan started to spin at maximum speed and is still doing so.

That’s definitely something new.

[  107.052084] pci_raw_set_power_state: 13 callbacks suppressed
[  107.052090] pci 0000:01:00.0: Refused to change power state, currently in D0
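For anyone comparing their own logs, here is a minimal sketch that picks out "Refused to change power state" lines like the one quoted above from dmesg text. The function name and the regular expression are my own illustration and assume the message format matches the excerpt; other kernel versions may word it differently.

```python
import re
from typing import List, Tuple

# Assumed line format, taken from the excerpt quoted in this thread:
# [  107.052090] pci 0000:01:00.0: Refused to change power state, currently in D0
REFUSED_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\S+\s+"
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"Refused to change power state, currently in (?P<state>\S+)"
)


def find_refused_transitions(dmesg_text: str) -> List[Tuple[str, str, str]]:
    """Return (timestamp, pci_address, current_state) for each matching line."""
    hits = []
    for line in dmesg_text.splitlines():
        match = REFUSED_RE.search(line)
        if match:
            hits.append((match.group("ts"), match.group("dev"), match.group("state")))
    return hits
```

It can be fed the saved dmesg file contents, or the captured output of running dmesg, as a single string.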

Try the additional kernel parameter
pcie_port_pm=off
and leave the acpi_osi settings in.
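To rule out a bootloader quoting problem, it may be worth confirming that the running kernel actually received these parameters. The sketch below parses a kernel command line such as the contents of /proc/cmdline; the helper names are hypothetical, and it assumes shell-style quoting is close enough to how the value with a space appears there.

```python
import shlex
from typing import Dict, List


def kernel_params(cmdline: str) -> Dict[str, List[str]]:
    """Map each parameter name to the list of values it was given, in order."""
    params: Dict[str, List[str]] = {}
    # shlex handles both acpi_osi="Windows 2009" and "acpi_osi=Windows 2009".
    for token in shlex.split(cmdline):
        key, _, value = token.partition("=")
        params.setdefault(key, []).append(value)
    return params


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the currently running kernel."""
    with open(path) as handle:
        return handle.read().strip()

# Example use on the affected machine:
#   params = kernel_params(read_cmdline())
#   print(params.get("acpi_osi"), params.get("pcie_port_pm"))
```

With the settings from this thread, acpi_osi should show two values (the "!" reset and the Windows 2009 string) and pcie_port_pm should show "off".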

Full dmesg for the same actions as above but with pcie_port_pm=off kernel parameter:

https://my.mixtape.moe/yienba.zip

The system behavior is the same (fan at 100%, disabled touchpad).

No luck, then.
Though it’s not the same issue, maybe add yourself to the mentioned kernel bug report and upload your acpidump there. Perhaps someone there has an idea.
If you want to use bumblebee, you would have to disable bbswitch to avoid running into this issue. The dGPU will then not be powered down, leaving you with higher power consumption.

My concern is that when the dGPU fails to power off, it seems to end up in a full-throttle state and heats up, so the fans start to run. You should be able to check that using powertop. The question is what happens on a system suspend: if the dGPU is still on but the fans are off, this might get ugly.
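Besides powertop, it can help to see what bbswitch itself reports for the card. A minimal sketch follows, assuming /proc/acpi/bbswitch contains a single line of the form "0000:01:00.0 ON" or "0000:01:00.0 OFF"; the function names are hypothetical. Note that this is only bbswitch's view and may not reflect the real hardware state when the power transition was refused.

```python
from typing import Optional, Tuple


def parse_bbswitch(text: str) -> Optional[Tuple[str, str]]:
    """Split bbswitch output into (pci_address, state), or None if unrecognized."""
    parts = text.split()
    if len(parts) == 2 and parts[1] in ("ON", "OFF"):
        return parts[0], parts[1]
    return None


def read_bbswitch(path: str = "/proc/acpi/bbswitch") -> Optional[Tuple[str, str]]:
    """Read the bbswitch state file; returns None if bbswitch is not loaded."""
    try:
        with open(path) as handle:
            return parse_bbswitch(handle.read())
    except OSError:
        return None
```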

The solution to the Asus GL753VE fan problem has been found; it is documented here:

https://github.com/Bumblebee-Project/Bumblebee/issues/764#issuecomment-559980823