Hi,
I’m having some issues when awaking from suspend to RAM which seems to be related with the NVIDIA card. The point is that suspend/resume works perfectly first but after repeating the suspend/resume operation the “Asynchronous wait on fence” error is fired on dmesg and the kscreenlocker_greet, Chromium and Firefox eats 100% of CPU. If I stop the SDDM and restart the Xorg server, then the Xorg also eats 100% of CPU.
It seems that the problem is reproduced after suspending and resuming 10 or 11 times. When the problem is reproduced I can see the mouse pointer with a blank screen and if I try to move the pointer it has a lot of lag. If I’m patient, I can also change to TTY1 (I can take up to 1 minute to change the TTY) in order to collect data.
That’s the dmesg output while resume:
[34344.779303] ACPI: Low-level resume complete
[34344.779368] ACPI: EC: EC started
[34344.779368] PM: Restoring platform NVS memory
[34344.780006] do_IRQ: 0.55 No irq handler for vector
[34344.780053] Enabling non-boot CPUs ...
[34344.780096] x86: Booting SMP configuration:
[34344.780096] smpboot: Booting Node 0 Processor 1 APIC 0x2
[34344.780919] CPU1 is up
[34344.780965] smpboot: Booting Node 0 Processor 2 APIC 0x1
[34344.781828] CPU2 is up
[34344.781865] smpboot: Booting Node 0 Processor 3 APIC 0x3
[34344.782716] CPU3 is up
[34344.785933] ACPI: Waking up from system sleep state S3
[34344.855515] ACPI: EC: interrupt unblocked
[34344.856492] nvidia 0000:01:00.0: Refused to change power state, currently in D3
[34345.004170] ACPI: EC: event unblocked
[34345.119880] nvme nvme0: 4/0/0 default/read/poll queues
[34345.137091] ath10k_pci 0000:04:00.0: failed to get board id: -95
[34345.205456] ath10k_pci 0000:04:00.0: unsupported HTC service id: 1536
[34345.281021] usb 1-6: reset high-speed USB device number 2 using xhci_hcd
[34345.523085] usb 1-8: reset full-speed USB device number 4 using xhci_hcd
[34345.779262] OOM killer enabled.
[34345.779263] Restarting tasks ...
[34345.784247] Bluetooth: hci0: using rampatch file: qca/rampatch_usb_00000302.bin
[34345.784249] Bluetooth: hci0: QCA: patch rome 0x302 build 0x3e8, firmware rome 0x302 build 0x111
[34345.795533] done.
[34345.809553] video LNXVIDEO:00: Restoring backlight state
[34345.809691] video LNXVIDEO:01: Restoring backlight state
[34345.810783] PM: suspend exit
[34345.810798] elogind-daemon[5638]: System resumed.
[34345.811265] elogind-daemon[5638]: Lid opened.
[34345.839517] Bluetooth: hci0: using NVM file: qca/nvm_usb_00000302.bin
[34345.918820] psmouse serio1: synaptics: queried max coordinates: x [..5656], y [..4692]
[34345.948878] psmouse serio1: synaptics: queried min coordinates: x [1282..], y [1160..]
[34350.749636] wlp4s0: authenticate with d8:fb:5e:83:13:d3
[34350.786687] wlp4s0: send auth to d8:fb:5e:83:13:d3 (try 1/3)
[34350.787590] wlp4s0: authenticated
[34350.789642] wlp4s0: associate with d8:fb:5e:83:13:d3 (try 1/3)
[34350.792588] wlp4s0: RX AssocResp from d8:fb:5e:83:13:d3 (capab=0x511 status=0 aid=63)
[34350.794598] wlp4s0: associated
[34350.794670] ath: EEPROM regdomain: 0x82d4
[34350.794671] ath: EEPROM indicates we should expect a country code
[34350.794672] ath: doing EEPROM country->regdmn map search
[34350.794673] ath: country maps to regdmn code: 0x37
[34350.794674] ath: Country alpha2 being used: ES
[34350.794674] ath: Regpair used: 0x37
[34350.794675] ath: regdomain 0x82d4 dynamically updated by country element
[34350.806391] wlp4s0: Limiting TX power to 23 (23 - 0) dBm as advertised by d8:fb:5e:83:13:d3
[34356.894594] Asynchronous wait on fence NVIDIA:nvidia.prime:3c13e timed out (hint:intel_atomic_commit_ready+0x0/0x42 [i915])
Everything is normal except those log entries:
[34344.856492] nvidia 0000:01:00.0: Refused to change power state, currently in D3
(...)
[34356.894594] Asynchronous wait on fence NVIDIA:nvidia.prime:3c13e timed out (hint:intel_atomic_commit_ready+0x0/0x42 [i915])
The only error that I can see on Xorg.0.log is:
[ 34374.292] (EE) NVIDIA(GPU-0): WAIT (2, 8, 0x8000, 0x00004974, 0x00004a08)
[ 34381.292] (EE) NVIDIA(GPU-0): WAIT (1, 8, 0x8000, 0x00004974, 0x00004a08)
[ 34384.293] (EE) NVIDIA(GPU-0): WAIT (2, 8, 0x8000, 0x00004974, 0x00004bc8)
[ 34391.293] (EE) NVIDIA(GPU-0): WAIT (1, 8, 0x8000, 0x00004974, 0x00004bc8)
That’s the lspci -kkk -vvvv output for the NVIDIA device when the error is reproduced:
01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce GTX 960M] (rev a2)
Physical Slot: 1
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 136
Region 0: Memory at dc000000 (32-bit, non-prefetchable)
Region 1: Memory at b0000000 (64-bit, prefetchable)
Region 3: Memory at c0000000 (64-bit, prefetchable)
Region 5: I/O ports at e000
[virtual] Expansion ROM at dd000000 [disabled]
Capabilities: [60] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [78] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [250 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [258 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=255us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
Capabilities: [128 v1] Power Budgeting <?>
Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
The last time I reproduced the error I forgot to run nvidia-bug-report and I just collected the data that I shared here. The next time that the error is reproduced I’ll run nvidia-bug-report in order to share it here (sorry for that).
My configuration is: KD5 + Plasma + Nvidia.
My kernel version is: 5.2.7-gentoo.
NVIDIA driver version: 430.40.
Xorg server version: 1.20.5.
Plasma framework version: 5.60.0.
Plasma desktop version: 5.15.5.
Thank you for your time,
Regards.