Can't sleep with nvidia-driver-390 and GeForce GT 750M Mac Edition

I upgraded from Ubuntu 17.10 to 18.04, and at some point during that process the nvidia driver was upgraded as well.

With nvidia-driver-390, when I attempt to suspend, the laptop appears to go to sleep (lights go off, fan stops, etc.), but after a few seconds it wakes back up.

I’ve tried a handful of changes, but so far the only one that lets the laptop sleep correctly is uninstalling the 390 driver and using either nouveau or nvidia-340.

lspci reports the card as:

01:00.0 VGA compatible controller: NVIDIA Corporation GK107M [GeForce GT 750M Mac Edition] (rev a1)

A failed attempt at suspending (with nvidia-driver-390) logs this in dmesg:

[  516.968380] PM: suspend entry (deep)
[  516.968382] PM: Syncing filesystems ... done.
[  518.796981] Freezing user space processes ... (elapsed 0.001 seconds) done.
[  518.798587] OOM killer disabled.
[  518.798587] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
[  518.799556] Suspending console(s) (use no_console_suspend to debug)
[  518.806484] ERROR @wl_notify_scan_status : 
[  518.806485] wlp3s0 Scan_results error (-22)
[  518.851496] sd 1:0:0:0: [sda] Synchronizing SCSI cache
[  518.852112] sd 1:0:0:0: [sda] Stopping disk
[  518.864464] thunderbolt 0000:08:00.0: suspending...
[  518.865782] thunderbolt 0000:08:00.0: suspend finished
[  518.865785] thunderbolt 0000:08:00.0: stopping RX ring 0
[  518.865795] thunderbolt 0000:08:00.0: disabling interrupt at register 0x38200 bit 12 (0x1001 -> 0x1)
[  518.865810] thunderbolt 0000:08:00.0: stopping TX ring 0
[  518.865819] thunderbolt 0000:08:00.0: disabling interrupt at register 0x38200 bit 0 (0x1 -> 0x0)
[  518.865833] thunderbolt 0000:08:00.0: control channel stopped
[  519.817386] ACPI: Preparing to enter system sleep state S3
[  519.879437] ACPI: EC: event blocked
[  519.879438] ACPI: EC: EC stopped
[  519.879439] PM: Saving platform NVS memory
[  519.879445] Disabling non-boot CPUs ...
[  519.914047] smpboot: CPU 1 is now offline
[  519.928683] smpboot: CPU 2 is now offline
[  519.953802] smpboot: CPU 3 is now offline
[  519.998174] smpboot: CPU 4 is now offline
[  520.015573] IRQ 18: no longer affine to CPU5
[  520.015577] IRQ 28: no longer affine to CPU5
[  520.016893] smpboot: CPU 5 is now offline
[  520.039652] IRQ 32: no longer affine to CPU6
[  520.039655] IRQ 33: no longer affine to CPU6
[  520.040670] smpboot: CPU 6 is now offline
[  520.063838] IRQ 26: no longer affine to CPU7
[  520.063847] IRQ 29: no longer affine to CPU7
[  520.063863] IRQ 51: no longer affine to CPU7
[  520.064876] smpboot: CPU 7 is now offline
[  520.096639] ACPI: Low-level resume complete
[  520.096709] ACPI: EC: EC started
[  520.096710] PM: Restoring platform NVS memory
[  520.104919] Enabling non-boot CPUs ...
[  520.104983] x86: Booting SMP configuration:
[  520.104985] smpboot: Booting Node 0 Processor 1 APIC 0x2
[  520.227925]  cache: parent cpu1 should not be sleeping
[  520.410390] CPU1 is up
[  520.410441] smpboot: Booting Node 0 Processor 2 APIC 0x4
[  520.513934]  cache: parent cpu2 should not be sleeping
[  520.685043] CPU2 is up
[  520.685098] smpboot: Booting Node 0 Processor 3 APIC 0x6
[  520.779254]  cache: parent cpu3 should not be sleeping
[  520.943954] CPU3 is up
[  520.944064] smpboot: Booting Node 0 Processor 4 APIC 0x1
[  520.949092]  cache: parent cpu4 should not be sleeping
[  520.949302] CPU4 is up
[  520.949335] smpboot: Booting Node 0 Processor 5 APIC 0x3
[  521.032369]  cache: parent cpu5 should not be sleeping
[  521.228333] CPU5 is up
[  521.228367] smpboot: Booting Node 0 Processor 6 APIC 0x5
[  521.313982]  cache: parent cpu6 should not be sleeping
[  521.539392] CPU6 is up
[  521.539430] smpboot: Booting Node 0 Processor 7 APIC 0x7
[  521.627670]  cache: parent cpu7 should not be sleeping
[  521.881188] CPU7 is up
[  521.955200] ACPI: Waking up from system sleep state S3
[  522.066579] pcieport 0000:07:03.0: quirk: waiting for thunderbolt to reestablish PCI tunnels...
[  522.066584] pcieport 0000:07:06.0: quirk: waiting for thunderbolt to reestablish PCI tunnels...
[  522.066587] pcieport 0000:07:04.0: quirk: waiting for thunderbolt to reestablish PCI tunnels...
[  522.066588] pcieport 0000:07:05.0: quirk: waiting for thunderbolt to reestablish PCI tunnels...
[  522.086049] thunderbolt 0000:08:00.0: control channel starting...
[  522.086052] thunderbolt 0000:08:00.0: starting TX ring 0
[  522.086074] thunderbolt 0000:08:00.0: enabling interrupt at register 0x38200 bit 0 (0x0 -> 0x1)
[  522.086075] thunderbolt 0000:08:00.0: starting RX ring 0
[  522.086095] thunderbolt 0000:08:00.0: enabling interrupt at register 0x38200 bit 12 (0x1 -> 0x1001)
[  522.086099] thunderbolt 0000:08:00.0: resuming...
[  522.086101] thunderbolt 0000:08:00.0: resetting switch at 0
[  522.094326] thunderbolt 0000:08:00.0: 0: resuming switch
[  522.095148] thunderbolt 0000:08:00.0: resume finished
[  522.146573] ACPI: EC: event unblocked
[  522.156867] sd 1:0:0:0: [sda] Starting disk
[  522.187542] thunderbolt 0000:08:00.0: resetting error on 0:b.
[  522.187562] thunderbolt 0000:08:00.0: 0:b: hotplug: scanning
[  522.187563] thunderbolt 0000:08:00.0: 0:b: hotplug: no switch found
[  522.187670] thunderbolt 0000:08:00.0: resetting error on 0:c.
[  522.187687] thunderbolt 0000:08:00.0: 0:c: hotplug: scanning
[  522.187688] thunderbolt 0000:08:00.0: 0:c: hotplug: no switch found
[  522.473159] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  522.473445] ata1.00: unexpected _GTF length (8)
[  522.473869] ata1.00: unexpected _GTF length (8)
[  522.473961] ata1.00: configured for UDMA/133
[  525.653231] pciehp 0000:00:1c.0:pcie004: link training error: status 0x1001
[  525.653234] pciehp 0000:00:1c.0:pcie004: Failed to check link status
[  525.654073] OOM killer enabled.
[  525.654074] Restarting tasks ... done.
[  525.725513] video LNXVIDEO:00: Restoring backlight state
[  525.725547] PM: suspend exit

A working attempt logs (using nvidia-340):

[  305.193844] PM: suspend entry (deep)
[  305.193846] PM: Syncing filesystems ... done.
[  306.372254] Freezing user space processes ... (elapsed 0.001 seconds) done.
[  306.373868] OOM killer disabled.
[  306.373869] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[  306.375076] Suspending console(s) (use no_console_suspend to debug)
[  306.382722] ERROR @wl_notify_scan_status : 
[  306.382724] wlp3s0 Scan_results error (-22)
[  306.451873] sd 1:0:0:0: [sda] Synchronizing SCSI cache
[  306.452535] sd 1:0:0:0: [sda] Stopping disk
[  306.455906] thunderbolt 0000:08:00.0: suspending...
[  306.458025] thunderbolt 0000:08:00.0: suspend finished
[  306.458027] thunderbolt 0000:08:00.0: stopping RX ring 0
[  306.458031] thunderbolt 0000:08:00.0: disabling interrupt at register 0x38200 bit 12 (0x1001 -> 0x1)
[  306.458038] thunderbolt 0000:08:00.0: stopping TX ring 0
[  306.458042] thunderbolt 0000:08:00.0: disabling interrupt at register 0x38200 bit 0 (0x1 -> 0x0)
[  306.458049] thunderbolt 0000:08:00.0: control channel stopped
[  307.404274] ACPI: Preparing to enter system sleep state S3
[  307.468334] ACPI: EC: event blocked
[  307.468335] ACPI: EC: EC stopped
[  307.468336] PM: Saving platform NVS memory
[  307.468340] Disabling non-boot CPUs ...
[  307.486402] smpboot: CPU 1 is now offline
[  307.510612] smpboot: CPU 2 is now offline
[  307.551089] smpboot: CPU 3 is now offline
[  307.575401] smpboot: CPU 4 is now offline
[  307.594596] smpboot: CPU 5 is now offline
[  307.618809] smpboot: CPU 6 is now offline
[  307.649968] smpboot: CPU 7 is now offline
[  307.684138] ACPI: Low-level resume complete
[  307.684205] ACPI: EC: EC started
[  307.684206] PM: Restoring platform NVS memory
[  307.692402] Enabling non-boot CPUs ...
[  307.692520] x86: Booting SMP configuration:
[  307.692522] smpboot: Booting Node 0 Processor 1 APIC 0x2
[  307.812138]  cache: parent cpu1 should not be sleeping
[  307.993154] CPU1 is up
[  307.993202] smpboot: Booting Node 0 Processor 2 APIC 0x4
[  308.096938]  cache: parent cpu2 should not be sleeping
[  308.268205] CPU2 is up
[  308.268267] smpboot: Booting Node 0 Processor 3 APIC 0x6
[  308.362847]  cache: parent cpu3 should not be sleeping
[  308.526992] CPU3 is up
[  308.527049] smpboot: Booting Node 0 Processor 4 APIC 0x1
[  308.532160]  cache: parent cpu4 should not be sleeping
[  308.532371] CPU4 is up
[  308.532405] smpboot: Booting Node 0 Processor 5 APIC 0x3
[  308.616315]  cache: parent cpu5 should not be sleeping
[  308.811871] CPU5 is up
[  308.811906] smpboot: Booting Node 0 Processor 6 APIC 0x5
[  308.894973]  cache: parent cpu6 should not be sleeping
[  309.115809] CPU6 is up
[  309.115845] smpboot: Booting Node 0 Processor 7 APIC 0x7
[  309.203587]  cache: parent cpu7 should not be sleeping
[  309.456945] CPU7 is up
[  309.530960] ACPI: Waking up from system sleep state S3
[  309.643015] pcieport 0000:07:03.0: quirk: waiting for thunderbolt to reestablish PCI tunnels...
[  309.643017] pcieport 0000:07:06.0: quirk: waiting for thunderbolt to reestablish PCI tunnels...
[  309.643019] pcieport 0000:07:04.0: quirk: waiting for thunderbolt to reestablish PCI tunnels...
[  309.643020] pcieport 0000:07:05.0: quirk: waiting for thunderbolt to reestablish PCI tunnels...
[  309.662424] thunderbolt 0000:08:00.0: control channel starting...
[  309.662430] thunderbolt 0000:08:00.0: starting TX ring 0
[  309.662448] thunderbolt 0000:08:00.0: enabling interrupt at register 0x38200 bit 0 (0x0 -> 0x1)
[  309.662450] thunderbolt 0000:08:00.0: starting RX ring 0
[  309.662470] thunderbolt 0000:08:00.0: enabling interrupt at register 0x38200 bit 12 (0x1 -> 0x1001)
[  309.662473] thunderbolt 0000:08:00.0: resuming...
[  309.662474] thunderbolt 0000:08:00.0: resetting switch at 0
[  309.669950] thunderbolt 0000:08:00.0: 0: resuming switch
[  309.670773] thunderbolt 0000:08:00.0: resume finished
[  309.723086] ACPI: EC: event unblocked
[  309.736281] sd 1:0:0:0: [sda] Starting disk
[  309.772535] thunderbolt 0000:08:00.0: resetting error on 0:b.
[  309.772546] thunderbolt 0000:08:00.0: resetting error on 0:c.
[  309.772867] thunderbolt 0000:08:00.0: 0:b: hotplug: scanning
[  309.772937] thunderbolt 0000:08:00.0: 0:b: hotplug: no switch found
[  309.772939] thunderbolt 0000:08:00.0: 0:c: hotplug: scanning
[  309.772940] thunderbolt 0000:08:00.0: 0:c: hotplug: no switch found
[  310.054313] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  310.054704] ata1.00: unexpected _GTF length (8)
[  310.055293] ata1.00: unexpected _GTF length (8)
[  310.055442] ata1.00: configured for UDMA/133
[  313.235957] pciehp 0000:00:1c.0:pcie004: link training error: status 0x1001
[  313.235961] pciehp 0000:00:1c.0:pcie004: Failed to check link status
[  313.240422] OOM killer enabled.
[  313.240423] Restarting tasks ... done.
[  313.300538] PM: suspend exit

The “no longer affine” lines are the only thing that jumps out at me:

[  520.015573] IRQ 18: no longer affine to CPU5

but I don’t know what they mean.
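For anyone comparing two captures like the ones above, here is a minimal sketch that strips the kernel timestamps and prints the messages found only in the first log. The function names and the idea of saving each dmesg capture to its own file are assumptions for illustration, not something from the original posts. Expect some noise, since lines such as the "elapsed" timings differ between runs even when nothing is wrong.

```shell
#!/bin/sh
# Strip the leading "[  516.968380] " kernel timestamp from each line
# so that two dmesg captures can be compared by message text alone.
strip_dmesg_ts() {
    sed -E 's/^\[ *[0-9]+\.[0-9]+\] ?//' "$1"
}

# Print messages that appear in the first log but not in the second.
# $1 and $2 are paths to dmesg captures you saved yourself (hypothetical names).
only_in_first() {
    tmp=$(mktemp) || return 1
    strip_dmesg_ts "$2" > "$tmp"
    # -F fixed strings, -x whole-line match, -v invert: keep lines absent from $tmp
    strip_dmesg_ts "$1" | grep -Fxv -f "$tmp"
    rm -f "$tmp"
}
```

With the failed capture saved as failed.log and the working one as working.log, `only_in_first failed.log working.log` would list the IRQ affinity lines among its output, since they appear only in the failed run.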

Any suggestions on how I can get a newer nvidia driver running would be appreciated, as the laptop seemed to run cooler with 390 than with 340.

Does installing acpid help with that?

Hi generix, acpid is already installed, but I realised today that suspend actually fails randomly with all 3 drivers (albeit slightly differently with the nvidia drivers vs. nouveau).

The randomness has completely invalidated my debugging; I no longer think the 390 driver is the root cause of the problem (off to do more testing).

Ok, so I know this is now the wrong forum, but on the off chance someone’s search terms land them here I thought I’d post my solution anyway…

It turns out my laptop was simply being woken from sleep by the USB chip, similar to https://askubuntu.com/a/148482/139915 .

So running

egrep -q '^XHC1\s+S[0-9]\s+\*enabled' /proc/acpi/wakeup && echo XHC1 > /proc/acpi/wakeup

(your device may vary!) during boot puts the system in a state where suspend works fine with the latest nvidia driver for me.
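To find out which device name applies on a given machine, and to make the toggle safe to run more than once, something along these lines may help. It is a sketch only: the function names and the WAKEUP_FILE variable are made up for illustration, and it assumes the usual /proc/acpi/wakeup layout where the third column reads *enabled or *disabled. Writing a device name to that file toggles its state, which is why the check comes first.

```shell
#!/bin/sh
# Path to the ACPI wakeup table; overridable so the functions can be
# tried against a copy of the file (WAKEUP_FILE is a hypothetical name).
WAKEUP_FILE=${WAKEUP_FILE:-/proc/acpi/wakeup}

# List the devices currently allowed to wake the machine.
list_wake_enabled() {
    awk '$3 == "*enabled" { print $1 }' "$WAKEUP_FILE"
}

# Disable wake for one device, but only if it is currently enabled,
# because writing a name to the file flips its state rather than setting it.
disable_wake() {
    if list_wake_enabled | grep -Fqx "$1"; then
        echo "$1" > "$WAKEUP_FILE"
    fi
}
```

Run as root, `list_wake_enabled` shows the candidates and `disable_wake XHC1` does the same job as the one-liner above. The setting does not survive a reboot, so it still has to be run at boot by whatever mechanism you prefer.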

The “no longer affine” lines from dmesg are totally irrelevant to the problem.

I’m facing a similar issue. I tried @pauljohn’s workaround in
https://devtalk.nvidia.com/default/topic/1044633/linux/driver-does-not-wake-gpu-properly-after-suspend-ubuntu-18-10-with-branch-390-410-and-415-/post/5300650/?offset=8#reply
but I still face the same issue.

Grepping my syslog for the relevant entries gives the following errors. The output was the same before I updated my grub file following @pauljohn’s suggestions. (I am, however, still using gdm3 as the display manager.)

Oct 6 15:44:50 nomitri-dl-laptop kernel: [ 3648.254341] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
Oct 6 15:45:07 nomitri-dl-laptop kernel: [ 3665.047242] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c57d:0:0
Oct 6 15:45:09 nomitri-dl-laptop kernel: [ 3667.047261] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c57e:1:0
Oct 6 15:45:11 nomitri-dl-laptop kernel: [ 3669.047546] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c57d:0:0
Oct 6 15:45:13 nomitri-dl-laptop kernel: [ 3671.047563] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c57e:1:0
Oct 6 15:45:15 nomitri-dl-laptop kernel: [ 3673.047797] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c57d:0:0
Oct 6 15:45:17 nomitri-dl-laptop kernel: [ 3675.047810] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c57e:1:0
Oct 6 15:45:30 nomitri-dl-laptop kernel: [ 3688.509371] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
Oct 6 15:45:33 nomitri-dl-laptop kernel: [ 3691.512220] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
Oct 6 15:48:30 nomitri-dl-laptop kernel: [ 3867.774572] INFO: task nvidia-modeset/:697 blocked for more than 120 seconds.
Oct 6 15:48:30 nomitri-dl-laptop kernel: [ 3867.774575] nvidia-modeset/ D 0 697 2 0x80000000
Oct 6 15:48:30 nomitri-dl-laptop kernel: [ 3867.774608] nvkms_kthread_q_callback+0x65/0xe0 [nvidia_modeset]
Oct 6 15:48:30 nomitri-dl-laptop kernel: [ 3867.774612] _main_loop+0x76/0x140 [nvidia_modeset]
Oct 6 15:48:30 nomitri-dl-laptop kernel: [ 3867.774617] ? _raw_q_schedule+0x80/0x80 [nvidia_modeset]
Oct 6 15:50:30 nomitri-dl-laptop kernel: [ 3988.606189] INFO: task nvidia-modeset/:697 blocked for more than 120 seconds.
Oct 6 15:50:30 nomitri-dl-laptop kernel: [ 3988.606192] nvidia-modeset/ D 0 697 2 0x80000000
Oct 6 15:50:30 nomitri-dl-laptop kernel: [ 3988.606213] nvkms_kthread_q_callback+0x65/0xe0 [nvidia_modeset]
Oct 6 15:50:30 nomitri-dl-laptop kernel: [ 3988.606217] _main_loop+0x76/0x140 [nvidia_modeset]
Oct 6 15:50:30 nomitri-dl-laptop kernel: [ 3988.606224] ? _raw_q_schedule+0x80/0x80 [nvidia_modeset]
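For readers who want to pull a similar excerpt from their own logs, a filter roughly like the following should do it. This is a guess at a reasonable pattern, not the command used for the excerpt above; the function name is hypothetical, and /var/log/syslog is assumed as the default because that is where Ubuntu normally writes kernel messages.

```shell
#!/bin/sh
# Print nvidia-modeset warnings/errors and hung-task reports from a log file.
# $1 is the log to search; it defaults to /var/log/syslog (Ubuntu's usual path).
nvidia_suspend_errors() {
    grep -E 'nvidia-modeset: (WARNING|ERROR)|blocked for more than [0-9]+ seconds' \
        "${1:-/var/log/syslog}"
}
```

The pattern does not match the stack-trace lines that follow a hung-task report, so add `-A 5` or similar to the grep call if you want those too.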

Any idea on how to fix this would be highly appreciated. This bug is getting really annoying.

I am running Ubuntu 18.04 on a Lenovo Legion Y740 with an NVIDIA RTX 2070 and CUDA 10.0 installed.

$ uname -a
Linux nomitri-dl-laptop 5.0.0-31-generic #33~18.04.1-Ubuntu SMP Tue Oct 1 10:20:39 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ nvidia-smi
Sun Oct 6 16:18:16 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26 Driver Version: 430.26 CUDA Version: 10.2 |