Failed to unload nvidia driver (MSI 6QE GS60 | Nvidia 970M | Debian 10)

Hi,

I have been struggling to get bbswitch to work: the discrete graphics card doesn’t seem to power down when I issue the command

echo OFF > /proc/acpi/bbswitch

I’ve tried to narrow down the issue; here is what I’ve found so far.

First of all, my configuration:
MSI GS60 6QE-025XFR
Debian 10 (Kernel 4.19.0-6-amd64)

Graphics cards:
with lspci -vvnn command:

00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 530 [8086:191b] (rev 06) (prog-if 00 [VGA controller])
	Subsystem: Micro-Star International Co., Ltd. [MSI] HD Graphics 530 [1462:1158]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 123
	Region 0: Memory at db000000 (64-bit, non-prefetchable) 
	Region 2: Memory at 70000000 (64-bit, prefetchable) 
	Region 4: I/O ports at f000 
	[virtual] Expansion ROM at 000c0000 [disabled] 
	Capabilities: [40] Vendor Specific Information: Len=0c <?>
	Capabilities: [70] Express (v2) Root Complex Integrated Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
	Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee02004  Data: 4022
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Process Address Space ID (PASID)
		PASIDCap: Exec+ Priv-, Max PASID Width: 14
		PASIDCtl: Enable- Exec- Priv-
	Capabilities: [200 v1] Address Translation Service (ATS)
		ATSCap:	Invalidate Queue Depth: 00
		ATSCtl:	Enable-, Smallest Translation Unit: 00
	Capabilities: [300 v1] Page Request Interface (PRI)
		PRICtl: Enable- Reset-
		PRISta: RF- UPRGI- Stopped-
		Page Request Capacity: 00008000, Page Request Allocation: 00000000
	Kernel driver in use: i915
	Kernel modules: i915
01:00.0 3D controller [0302]: NVIDIA Corporation GM204M [GeForce GTX 970M] [10de:13d8] (rev a1)
	Subsystem: Micro-Star International Co., Ltd. [MSI] GM204M [GeForce GTX 970M] [1462:1158]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 127
	Region 0: Memory at dc000000 (32-bit, non-prefetchable) 
	Region 1: Memory at b0000000 (64-bit, prefetchable) 
	Region 3: Memory at c0000000 (64-bit, prefetchable) 
	Region 5: I/O ports at e000 
	[virtual] Expansion ROM at dd000000 [disabled] 
	Capabilities: [60] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee20004  Data: 4022
	Capabilities: [78] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 unlimited
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Via message
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [100 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [258 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=255us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=0ns
		L1SubCtl2: T_PwrOn=10us
	Capabilities: [128 v1] Power Budgeting <?>
	Capabilities: [420 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900 v1] #19
	Kernel driver in use: nvidia
	Kernel modules: nvidia
with lshw command:
*-display                 
       description: 3D controller
       product: GM204M [GeForce GTX 970M] [10DE:13D8]
       vendor: NVIDIA Corporation [10DE]
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:127 memory:dc000000-dcffffff memory:b0000000-bfffffff memory:c0000000-c1ffffff ioport:e000(size=128) memory:dd000000-dd07ffff
  *-display
       description: VGA compatible controller
       product: HD Graphics 530 [8086:191B]
       vendor: Intel Corporation [8086]
       physical id: 2
       bus info: pci@0000:00:02.0
       version: 06
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
       configuration: driver=i915 latency=0
       resources: irq:123 memory:db000000-dbffffff memory:70000000-7fffffff ioport:f000(size=64) memory:c0000-dffff

All packages were installed from the Debian repository. This laptop has a color LED on the power button that shows which graphics card is in use:

  • blue: integrated
  • orange: discrete

When running command:

root@msi-gs60-6qe:~# echo OFF > /proc/acpi/bbswitch

I get this message logged:

[ 2001.428820] bbswitch: device 0000:01:00.0 is in use by driver 'nvidia', refusing OFF
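
For anyone following along, the driver that bbswitch complains about can also be checked directly in sysfs. This is only a minimal sketch: the function name bound_driver and its optional second argument (an alternative sysfs root, there to make the function easy to try out) are made up for illustration.

```shell
#!/bin/sh
# Print the kernel driver currently bound to a PCI device, or "none".
# $1: PCI address, e.g. 0000:01:00.0 (the address from the bbswitch message)
# $2: sysfs root, defaults to /sys (hypothetical parameter, for testing only)
bound_driver() {
    dev=$1
    sysfs=${2:-/sys}
    link="$sysfs/bus/pci/devices/$dev/driver"
    if [ -L "$link" ]; then
        # "driver" is a symlink pointing at the bound driver's directory.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# Usage: bound_driver 0000:01:00.0
```

As long as this prints a driver name rather than "none", the log message above suggests bbswitch will keep refusing to switch the card off.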

For now, when Linux boots, the integrated graphics card seems to be used (blue power LED). When the login screen appears, the power LED switches to orange.
I tried creating a file /etc/modprobe.d/bbswitch.conf with:

options bbswitch load_state=0 unload_state=1

but just when the login screen should appear, the computer freezes, stuck on the text boot screen.

The file /etc/modprobe.d/bumblebee.conf contains blacklist nvidia. Judging from the initramfs, I think it really is blacklisted:

root@msi-gs60-6qe:~# lsinitramfs /boot/initramfs-4.19.0-6-amd64.img | egrep -i "kernel/drivers/gpu"
usr/lib/modules/4.19.0-6-amd64/kernel/drivers/gpu
usr/lib/modules/4.19.0-6-amd64/kernel/drivers/gpu/drm
usr/lib/modules/4.19.0-6-amd64/kernel/drivers/gpu/drm/drm.ko
usr/lib/modules/4.19.0-6-amd64/kernel/drivers/gpu/drm/drm_kms_helper.ko
usr/lib/modules/4.19.0-6-amd64/kernel/drivers/gpu/drm/i915
usr/lib/modules/4.19.0-6-amd64/kernel/drivers/gpu/drm/i915/i915.ko
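
A module being absent from the initramfs and a module being blacklisted are two separate things, so it may be worth checking the modprobe configuration itself. Below is a rough sketch; the helper name is_blacklisted is invented here. Keep in mind that a blacklist entry only stops automatic loading through module aliases, so a request for the module by name can still load it.

```shell
#!/bin/sh
# Succeed if module $1 has a "blacklist" line in a *.conf file of directory $2.
# $2 defaults to /etc/modprobe.d; other configuration directories are ignored.
is_blacklisted() {
    mod=$1
    dir=${2:-/etc/modprobe.d}
    grep -qsE "^[[:space:]]*blacklist[[:space:]]+${mod}[[:space:]]*\$" "$dir"/*.conf
}

# Usage: is_blacklisted nvidia && echo "nvidia is blacklisted"
```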

According to nvidia-smi, the discrete graphics card seems to be powered up:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.74       Driver Version: 418.74       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 970M    On   | 00000000:01:00.0 Off |                  N/A |
| N/A   52C    P8     5W /  N/A |      0MiB /  3024MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

However, glxinfo returns this:

OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 530 (Skylake GT2) 
OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.3.6

and optirun glxinfo returns this:

OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: GeForce GTX 970M/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 418.74

Which means the discrete graphics card is not used when not needed. At least, it’s a good start…

I’ve tried to power down/up the discrete graphic card using this guide:
https://wiki.archlinux.org/index.php/Hybrid_graphics#Fully_Power_Down_Discrete_GPU

Running the examples showed me that the \_SB.PCI0.PEG0.PEGP._OFF call worked. Indeed, the laptop’s power LED turned blue!
Then, when I issue

echo '\_SB.PCI0.PEG0.PEGP._ON' > /proc/acpi/call

to turn the discrete graphics card back on, the laptop’s power LED turns orange.
However, if the discrete graphics card is turned off this way, the nvidia driver is still loaded and the computer hangs if it tries to access the graphics card.

I think bbswitch would be able to power down the discrete graphics card correctly, if only the nvidia driver could be unloaded!

Any idea what might prevent the nvidia driver from being unloaded?

I don’t see anything in your post indicating that you ever tried unloading the driver:
either
sudo modprobe -r nvidia
or
sudo rmmod nvidia-drm nvidia-modeset nvidia-uvm nvidia

Hi,

Yes, I forgot to mention that. I tried to unload the nvidia module myself.

root@msi-gs60-6qe:~# modprobe -r nvidia
modprobe: FATAL: Module nvidia is in use.
modprobe: FATAL: Error running remove command for nvidia

The same happens even if I power off the graphics card:

root@msi-gs60-6qe:~# echo '\_SB.PCI0.PEG0.PEGP._OFF' > /proc/acpi/call
root@msi-gs60-6qe:~# modprobe -r nvidia
modprobe: FATAL: Module nvidia is in use.
modprobe: FATAL: Error running remove command for nvidia

As suggested, I’ve also run (after a reboot):

root@msi-gs60-6qe:/usr/src/nvidia-current-418.74# rmmod nvidia-drm nvidia-modeset nvidia-uvm nvidia
rmmod: ERROR: Module nvidia_drm is not currently loaded
rmmod: ERROR: Module nvidia_uvm is not currently loaded
rmmod: ERROR: Module nvidia is in use
root@msi-gs60-6qe:/usr/src/nvidia-current-418.74# ps -aux | egrep -i nvidia
root       745  0.0  0.0      0     0 ?        S    17:17   0:00 [irq/127-nvidia]
root       746  0.0  0.0      0     0 ?        S    17:17   0:00 [nvidia]
nvpd      1767  0.0  0.0   4672  1484 ?        Ss   17:20   0:00 /usr/bin/nvidia-persistenced --user nvpd
root      2262  0.0  0.0   6208   880 pts/0    S+   17:38   0:00 grep -E -i nvidia
root@msi-gs60-6qe:/usr/src/nvidia-current-418.74# service nvidia-persistenced stop
root@msi-gs60-6qe:/usr/src/nvidia-current-418.74# service nvidia-persistenced status
● nvidia-persistenced.service - NVIDIA Persistence Daemon
   Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Thu 2019-09-19 17:38:54 CEST; 2s ago
  Process: 1766 ExecStart=/usr/bin/nvidia-persistenced --user nvpd (code=exited, status=0/SUCCESS)
  Process: 2288 ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced (code=exited, status=0/SUCCESS)
 Main PID: 1767 (code=exited, status=0/SUCCESS)

Sep 19 17:20:51 msi-gs60-6qe systemd[1]: Starting NVIDIA Persistence Daemon...
Sep 19 17:20:51 msi-gs60-6qe nvidia-persistenced[1767]: Started (1767)
Sep 19 17:20:51 msi-gs60-6qe systemd[1]: Started NVIDIA Persistence Daemon.
Sep 19 17:38:54 msi-gs60-6qe systemd[1]: Stopping NVIDIA Persistence Daemon...
Sep 19 17:38:54 msi-gs60-6qe systemd[1]: nvidia-persistenced.service: Succeeded.
Sep 19 17:38:54 msi-gs60-6qe systemd[1]: Stopped NVIDIA Persistence Daemon.
root@msi-gs60-6qe:/usr/src/nvidia-current-418.74# modprobe -r nvidia
modprobe: FATAL: Module nvidia is in use.
modprobe: FATAL: Error running remove command for nvidia

It might not be relevant, but here is some other information:

Nvidia-related open files:

root@msi-gs60-6qe:~# lsof | egrep -i nvidia
COMMAND    PID  TID TASKCMD               USER   FD      TYPE             DEVICE  SIZE/OFF       NODE NAME
nvidia-pe  641                            nvpd  cwd       DIR                8,6      4096          2 /
nvidia-pe  641                            nvpd  rtd       DIR                8,6      4096          2 /
nvidia-pe  641                            nvpd  txt       REG                8,6     55384    1472885 /usr/bin/nvidia-persistenced
nvidia-pe  641                            nvpd  mem       REG                8,6    191272    2121662 /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.418.74
nvidia-pe  641                            nvpd  mem       REG                8,6     55792    1439625 /usr/lib/x86_64-linux-gnu/libnss_files-2.28.so
nvidia-pe  641                            nvpd  mem       REG                8,6   1824496    1439613 /usr/lib/x86_64-linux-gnu/libc-2.28.so
nvidia-pe  641                            nvpd  mem       REG                8,6     14592    1439615 /usr/lib/x86_64-linux-gnu/libdl-2.28.so
nvidia-pe  641                            nvpd  mem       REG                8,6    165632    1438999 /usr/lib/x86_64-linux-gnu/ld-2.28.so
nvidia-pe  641                            nvpd    0uW     REG               0,21         4      22917 /run/nvidia-persistenced/nvidia-persistenced.pid
nvidia-pe  641                            nvpd    1u     unix 0x000000007a99084f       0t0      22918 type=DGRAM
nvidia-pe  641                            nvpd    2u      CHR            195,255       0t0      20230 /dev/nvidiactl
nvidia-pe  641                            nvpd    3u      CHR              195,0       0t0      20231 /dev/nvidia0
nvidia-pe  641                            nvpd    5u      CHR              195,0       0t0      20231 /dev/nvidia0
nvidia-pe  641                            nvpd    6u      CHR              195,0       0t0      20231 /dev/nvidia0
nvidia-pe  641                            nvpd    7u      CHR              195,0       0t0      20231 /dev/nvidia0
nvidia-pe  641                            nvpd    8u     unix 0x00000000fdb1d0f7       0t0      25806 /var/run/nvidia-persistenced/socket type=STREAM
nvidia     725                            root  cwd       DIR                8,6      4096          2 /
nvidia     725                            root  rtd       DIR                8,6      4096          2 /
nvidia     725                            root  txt   unknown                                         /proc/725/exe
Xorg       767                            root  mem       REG                8,6    667928    1472752 /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.418.74
Xorg       767                            root  mem       REG                8,6   1210304    2121647 /usr/lib/x86_64-linux-gnu/nvidia/current/libEGL_nvidia.so.418.74
Xorg       767                            root   15u      CHR            195,255       0t0      20230 /dev/nvidiactl
Xorg       767                            root   16u      CHR              195,0       0t0      20231 /dev/nvidia0
Xorg       767                            root   17u      CHR              195,0       0t0      20231 /dev/nvidia0
Xorg       767  774 Xorg:disk             root  mem       REG                8,6    667928    1472752 /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.418.74
Xorg       767  774 Xorg:disk             root  mem       REG                8,6   1210304    2121647 /usr/lib/x86_64-linux-gnu/nvidia/current/libEGL_nvidia.so.418.74
Xorg       767  774 Xorg:disk             root   15u      CHR            195,255       0t0      20230 /dev/nvidiactl
Xorg       767  774 Xorg:disk             root   16u      CHR              195,0       0t0      20231 /dev/nvidia0
Xorg       767  774 Xorg:disk             root   17u      CHR              195,0       0t0      20231 /dev/nvidia0
Xorg       767  775 Xorg:disk             root  mem       REG                8,6    667928    1472752 /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.418.74
Xorg       767  775 Xorg:disk             root  mem       REG                8,6   1210304    2121647 /usr/lib/x86_64-linux-gnu/nvidia/current/libEGL_nvidia.so.418.74
Xorg       767  775 Xorg:disk             root   15u      CHR            195,255       0t0      20230 /dev/nvidiactl
Xorg       767  775 Xorg:disk             root   16u      CHR              195,0       0t0      20231 /dev/nvidia0
Xorg       767  775 Xorg:disk             root   17u      CHR              195,0       0t0      20231 /dev/nvidia0
Xorg       767  780 InputThre             root  mem       REG                8,6    667928    1472752 /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.418.74
Xorg       767  780 InputThre             root  mem       REG                8,6   1210304    2121647 /usr/lib/x86_64-linux-gnu/nvidia/current/libEGL_nvidia.so.418.74
Xorg       767  780 InputThre             root   15u      CHR            195,255       0t0      20230 /dev/nvidiactl
Xorg       767  780 InputThre             root   16u      CHR              195,0       0t0      20231 /dev/nvidia0
Xorg       767  780 InputThre             root   17u      CHR              195,0       0t0      20231 /dev/nvidia0
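
A listing this long is easier to read once it is reduced to the processes that actually hold the NVIDIA device nodes open. Here is a small sketch that filters lsof output read from standard input; nvidia_holders is a made-up name. It relies only on the first two columns (COMMAND and PID) and the last one (NAME), since the TID and TASKCMD columns are sometimes empty.

```shell
#!/bin/sh
# Read lsof output on stdin and print "COMMAND PID" once for every process
# that has a /dev/nvidia* device node open.
nvidia_holders() {
    awk '$NF ~ /^\/dev\/nvidia/ { print $1, $2 }' | LC_ALL=C sort -u
}

# Usage: lsof 2>/dev/null | nvidia_holders
```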

Nvidia-related modules:

root@msi-gs60-6qe:~# lsmod | egrep -i nvidia
Module                  Size  Used by
nvidia              17940480  24
ipmi_msghandler        65536  2 ipmi_devintf,nvidia
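
The third column of lsmod is the module's use count, and a module with a non-zero count cannot be removed, which matches the "Module nvidia is in use" error. A minimal sketch to pull that number out of lsmod output follows; module_refcount is a hypothetical helper name. Note that lsmod prints module names with underscores (nvidia_drm, not nvidia-drm).

```shell
#!/bin/sh
# Read lsmod output on stdin and print the use count of module $1,
# or "unloaded" if the module is not listed.
module_refcount() {
    awk -v m="$1" '
        $1 == m { print $3; found = 1 }
        END     { if (!found) print "unloaded" }
    '
}

# Usage: lsmod | module_refcount nvidia
```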

Nvidia-related processes:

root@msi-gs60-6qe:~# ps -aux | egrep -i nvidia
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       745  0.0  0.0      0     0 ?        S    17:17   0:00 [irq/127-nvidia]
root       746  0.0  0.0      0     0 ?        S    17:17   0:00 [nvidia]
nvpd      1767  0.0  0.0   4672  1484 ?        Ss   17:20   0:00 /usr/bin/nvidia-persistenced --user nvpd
root      1773  0.0  0.0      0     0 ?        S    17:20   0:00 [nvidia-modeset]
root      1801  0.0  0.0   6208   884 pts/0    S+   17:21   0:00 grep -E -i nvidia

  1. nvidia-persistenced of course has to be stopped, since it’s keeping the driver initialized.
  2. The lsof output shows that, while it’s using the iGPU for output, Xorg still has the nvidia driver open.

The procedure would be

  1. stop and disable nvidia-persistenced
  2. create a blacklist in /etc/modprobe.d for the nvidia modules (optional but better)
  3. stop the xserver
  4. unload the nvidia modules
  5. turn off the nvidia card using bbswitch
  6. start the xserver.
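
The procedure above could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe: it assumes systemd and LightDM as the display manager (both appear elsewhere in this thread), it skips the optional blacklist step, and the names run, gpu_off and DRY_RUN are invented for illustration. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/bin/sh
# Run as root from a text console, not from inside the X session that is
# about to be stopped. Assumption: LightDM is the display manager.
# With DRY_RUN=1 every command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 1 and 3 to 6 of the list; each step runs only if the previous one
# succeeded. Dependent modules are named before the main nvidia module.
gpu_off() {
    run systemctl stop nvidia-persistenced &&
    run systemctl disable nvidia-persistenced &&
    run systemctl stop lightdm &&
    run modprobe -r nvidia-drm nvidia-modeset nvidia-uvm nvidia &&
    run sh -c 'echo OFF > /proc/acpi/bbswitch' &&
    run systemctl start lightdm
}

# Usage: DRY_RUN=1; gpu_off    (preview)   or   gpu_off    (for real)
```

Chaining the steps with && means a failed module unload stops the script before bbswitch is asked to cut power, which avoids powering down a card whose driver is still loaded.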

Thank you generix. So, I’ve just tried that, without success :(

First of all, I’ve stopped/started lightdm, to be sure this works. No problem.

Then, I’ve gone through all those steps:

  1. service lightdm stop
  2. service nvidia-persistenced stop
  3. modprobe -r nvidia-drm nvidia-modeset nvidia-uvm nvidia
  4. echo OFF > /proc/acpi/bbswitch
  5. service lightdm start

Except for the last one, all those steps went fine. After step 3, nothing related to nvidia was listed by ps, lsof and lsmod.
After step 4, the power LED turned blue. dmesg also reported

bbswitch: disabling discrete graphics
pci_raw_set_power_state: 6 callbacks suppressed
pci 0000:01:00.0: Refused to change power state, currently in D0

Except for the last message (I wonder whether it’s relevant, as the discrete graphics card is not powered), everything seems to run fine.

However, when I restart lightdm, the computer freezes before the GUI shows up. Maybe X tries to load/access the nvidia driver and, as the graphics card is powered down, it crashes?

That sounds like some acpi/pci issue. Please try setting the kernel parameter
acpi_osi=! acpi_osi="Windows 2009"
and retry. Does a normal suspend/resume cycle work when using the nvidia gpu?

Hi,

Without any other modification, suspend/resume freezes the computer.
However, the kernel parameter

acpi_osi=! acpi_osi="Windows 2009"

allowed me to repeat the steps listed in my previous message successfully! LightDM started with the nvidia GPU powered down.
I can also power up/down nvidia gpu with bbswitch.

After reboot with the new kernel parameter, lsof did not report Xorg having handles on nvidia files. I’ve been able to modprobe -r all nvidia modules and power down nvidia gpu with bbswitch, without having to stop xserver.
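
For reference, one common way to make this parameter permanent on Debian is through /etc/default/grub, assuming GRUB is the boot loader. The fragment below is a sketch: the "quiet" option is only Debian's usual default and may differ on your system, so keep whatever your file already contains and append the two acpi_osi entries.

```shell
# /etc/default/grub (fragment). The file uses shell syntax, so the inner
# double quotes around "Windows 2009" have to be escaped.
GRUB_CMDLINE_LINUX_DEFAULT="quiet acpi_osi=! acpi_osi=\"Windows 2009\""
```

After editing, run update-grub as root and reboot; cat /proc/cmdline should then show both acpi_osi entries.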

The discharge rate is now 10 W (more than 4 h of battery life reported), instead of at least 20 W (less than 2 h).
Also, it is worth noting that issuing a command such as nvidia-smi no longer freezes the computer, but gracefully throws an error about the nvidia driver being unloaded.
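
In case someone wants to compare discharge rates on their own machine, the kernel exposes the figure in sysfs. The following is a minimal sketch, assuming the battery shows up as BAT0 (the name varies between laptops); battery_watts is a made-up helper. Some batteries report power_now directly, others only current_now and voltage_now, so both cases are handled.

```shell
#!/bin/sh
# Print the battery's instantaneous power draw in watts.
# $1: battery directory, defaults to /sys/class/power_supply/BAT0 (assumption)
battery_watts() {
    bat=${1:-/sys/class/power_supply/BAT0}
    if [ -r "$bat/power_now" ]; then
        # power_now is reported in microwatts
        LC_ALL=C awk '{ printf "%.1f\n", $1 / 1000000 }' "$bat/power_now"
    elif [ -r "$bat/current_now" ] && [ -r "$bat/voltage_now" ]; then
        # current_now is in microamperes, voltage_now in microvolts
        ua=$(cat "$bat/current_now")
        uv=$(cat "$bat/voltage_now")
        LC_ALL=C awk -v a="$ua" -v v="$uv" 'BEGIN { printf "%.1f\n", a * v / 1e12 }'
    else
        echo "no power reading available" >&2
        return 1
    fi
}

# Usage (on battery power): battery_watts
```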

Thank you very much!

Then your system is probably hit by this bug:
https://bugzilla.kernel.org/show_bug.cgi?id=156341
Reason unknown so far.

I can spend more time on this if it helps shed light on the cause. What can I do?