Hi everyone.
Previously, I used the Nvidia 535.183.01 DKMS driver for a year without any issues. I used it with different kernels from 6.7.1 to 6.11.3. After updating to Linux 6.13.1, I found that the 535 DKMS driver didn’t work anymore, so I replaced it with the regular 570.86.16 driver. Fortunately, 570 has fixed the issues that forced me to stay on the 535 DKMS driver, but there is a new critical issue.
There’s a 50% chance of a driver crash after 20-30 minutes of inactivity. Here are some fragments from the logs:
[ 9385.351175] NVRM: GPU at PCI:0000:01:00: GPU-b67c507a-2a6c-d7a3-0569-0afb634dcb7d
[ 9385.351179] NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
[ 9385.351185] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
[ 9385.351398] NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
[ 9385.351605] NVRM: Xid (PCI:0000:01:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)
[ 9387.214173] traps: slack[2016] trap int3 ip:61bfc8edc7fe sp:7ffed3303db0 error:0 in slack[60537fe,61bfc4f31000+8996000]
[ 9387.591474] NVRM: Error in service of callback
[ 9392.500349] nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c77d:0 2:0:4048:4040
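For anyone who wants to check whether they are hitting the same Xid codes, here is a minimal sketch that pulls the Xid events out of kernel log output. It assumes the lines look like the fragments above (default dmesg timestamps followed by `NVRM: Xid (...)`). The names `XID_RE` and `parse_xid_events` are made up for this example and are not part of any NVIDIA tool.

```python
import re

# Matches kernel log lines such as:
# [ 9385.351179] NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
# Assumption: timestamps use the default dmesg "[seconds.micros]" format.
XID_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: Xid "
    r"\((?P<dev>PCI:[0-9a-fA-F:.]+)\): (?P<code>\d+), (?P<msg>.*)"
)


def parse_xid_events(lines):
    """Return a list of (timestamp, device, xid_code, message) tuples."""
    events = []
    for line in lines:
        m = XID_RE.search(line)
        if m:
            events.append(
                (float(m["ts"]), m["dev"], int(m["code"]), m["msg"].strip())
            )
    return events


if __name__ == "__main__":
    import sys

    # Read log lines from stdin and print one summary line per Xid event.
    for ts, dev, code, msg in parse_xid_events(sys.stdin):
        print(f"{ts:.6f} {dev} Xid {code}: {msg}")
```

If this is saved as `xid_events.py` (a hypothetical file name), it can be fed from the kernel log with `sudo dmesg | python3 xid_events.py`. In the log above it would pick up the Xid 79 and Xid 154 lines and skip the rest.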
Specs:
Laptop: Lenovo Legion Pro 7 16IRX8
GPU: RTX 4070 Laptop (AD106-B)
OS: Arch Linux
Kernel: 6.13.1
Nvidia driver: 570.86.16
Display server: Xorg 21.1.15
Display manager: SDDM 0.21.0
nvidia-bug-report.log.gz (206.5 KB)