I was tired of nvidia-bug-report.sh hanging on me even with --safe-mode, so I ran the script under strace to find out why. Judging by the (incomplete) strace log, the script hangs while trying to read /proc/driver/nvidia/./gpus/0000:01:00.0/power:
[pid 2028] openat(AT_FDCWD, "/proc/driver/nvidia/./gpus/0000:01:00.0/power", O_RDONLY) = 3
[pid 2028] fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
[pid 2028] fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
[pid 2028] mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f57d0e64000
[pid 2028] read(3,
Here's the command I used to capture the strace log (captured via SSH, because everything freezes and I'm unable to even switch to a TTY):
$ sudo strace -ff nvidia-bug-report.sh --safe-mode --extra-system-data 2>&1 | tee -a strace.log
And here's the strace log itself: strace.log (614.8 KB)
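In case it helps anyone locate the stuck call without scrolling through the whole log, here is a minimal Python sketch that lists syscalls which never got a return value and maps their file descriptor back to the path it was opened from. The helper name `find_unfinished_calls` is made up for illustration, and the parsing assumes the `[pid N]` prefixed line format shown in the excerpt above plus strace's usual `<... name resumed>` marker for calls that complete later; it is not a general strace parser.

```python
import re

# "[pid 2028] read(3," -> pid "2028", call "read(3,"
PID_RE = re.compile(r"^\[pid\s+(\d+)\]\s+(.*)$")
# Successful openat: capture the path and the returned file descriptor.
OPEN_RE = re.compile(r'^openat\([^,]+, "([^"]+)".*\)\s+=\s+(\d+)')
# Any syscall whose first argument is a numeric file descriptor.
CALL_FD_RE = re.compile(r"^(\w+)\((\d+)")
# A completed call ends with ") = <result>".
DONE_RE = re.compile(r"\)\s+=\s+\S")


def find_unfinished_calls(lines):
    """Return [(pid, syscall, path_or_None)] for calls with no return value."""
    fd_paths = {}  # (pid, fd) -> path the fd was opened from
    pending = {}   # pid -> (syscall, path) of the last unfinished call
    for raw in lines:
        line = raw.strip()
        m = PID_RE.match(line)
        # Lines from the traced parent carry no "[pid N]" prefix.
        pid, call = (m.group(1), m.group(2)) if m else ("main", line)
        if call.startswith("<... "):
            # The earlier unfinished call for this pid completed after all.
            pending.pop(pid, None)
            continue
        opened = OPEN_RE.match(call)
        if opened:
            fd_paths[(pid, int(opened.group(2)))] = opened.group(1)
            continue
        if DONE_RE.search(call):
            continue
        started = CALL_FD_RE.match(call)
        if started:
            fd = int(started.group(2))
            pending[pid] = (started.group(1), fd_paths.get((pid, fd)))
    return [(pid, name, path) for pid, (name, path) in pending.items()]
```

Fed the five-line excerpt above, it reports pid 2028 stuck in `read` on the `power` file. For the full log, something like `find_unfinished_calls(open("strace.log", errors="replace"))` should work, assuming the whole file uses the same line format.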
And again, the driver crash happened while I was using Chromium, more specifically, watching a random Facebook video. This seems like the most random bug too, because I had literally just rebooted my computer, then I opened Chromium, watched the video for a minute and it crashed. So I tried to manually reproduce the crash again, repeating step-by-step, but I wasn't able to!!!
The crash:
jan 17 05:20:08 arch kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020
jan 17 05:20:08 arch kernel: #PF: supervisor read access in kernel mode
jan 17 05:20:08 arch kernel: #PF: error_code(0x0000) - not-present page
jan 17 05:20:08 arch kernel: PGD 800000012c756067 P4D 800000012c756067 PUD 0
jan 17 05:20:08 arch kernel: Oops: 0000 [#1] PREEMPT SMP PTI
jan 17 05:20:08 arch kernel: CPU: 2 PID: 215 Comm: irq/29-nvidia Tainted: P OE 5.10.7-arch1-1 #1
jan 17 05:20:08 arch kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B75M-DGS R2.0, BIOS P1.50 03/14/2018
jan 17 05:20:08 arch kernel: RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
jan 17 05:20:08 arch kernel: Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75>
jan 17 05:20:08 arch kernel: RSP: 0018:ffff9fddc359bc20 EFLAGS: 00010202
jan 17 05:20:08 arch kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
jan 17 05:20:08 arch kernel: RDX: ffff89f868588908 RSI: ffffffffffffffff RDI: 0000000000000020
jan 17 05:20:08 arch kernel: RBP: ffff89f8129f5990 R08: ffffffffc2152b60 R09: ffff89f8129f5970
jan 17 05:20:08 arch kernel: R10: ffff89f812974008 R11: ffff89f812975098 R12: 0000000000000020
jan 17 05:20:08 arch kernel: R13: 0000000000000000 R14: ffff89f8129f5af8 R15: ffff89f8129f5c00
jan 17 05:20:08 arch kernel: FS: 0000000000000000(0000) GS:ffff89f915d00000(0000) knlGS:0000000000000000
jan 17 05:20:08 arch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jan 17 05:20:08 arch kernel: CR2: 0000000000000020 CR3: 000000012b102004 CR4: 00000000001706e0
jan 17 05:20:08 arch kernel: Call Trace:
jan 17 05:20:08 arch kernel: ? _nv030766rm+0x1b/0x90 [nvidia]
jan 17 05:20:08 arch kernel: ? _nv026432rm+0x18/0x60 [nvidia]
jan 17 05:20:08 arch kernel: ? _nv012979rm+0x13d/0x1c0 [nvidia]
jan 17 05:20:08 arch kernel: ? _nv000081rm+0x12f/0x1a0 [nvidia]
jan 17 05:20:08 arch kernel: ? _nv012910rm+0xff/0x180 [nvidia]
jan 17 05:20:08 arch kernel: ? _nv019531rm+0x1af/0x210 [nvidia]
jan 17 05:20:08 arch kernel: ? _nv019482rm+0xdf3/0xef0 [nvidia]
jan 17 05:20:08 arch kernel: ? _nv019449rm+0x78/0xd0 [nvidia]
jan 17 05:20:08 arch kernel: ? _nv019463rm+0xcf/0x2f0 [nvidia]
jan 17 05:20:08 arch kernel: ? _nv019497rm+0xbe/0xe0 [nvidia]
jan 17 05:20:08 arch kernel: ? _nv028705rm+0x97b/0xdc0 [nvidia]
jan 17 05:20:08 arch kernel: ? _nv028713rm+0x15d/0x400 [nvidia]
jan 17 05:20:08 arch kernel: ? _nv000709rm+0xa9/0x240 [nvidia]
jan 17 05:20:08 arch kernel: ? disable_irq_nosync+0x10/0x10
jan 17 05:20:08 arch kernel: ? rm_isr_bh+0x1c/0x60 [nvidia]
jan 17 05:20:08 arch kernel: ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia]
jan 17 05:20:08 arch kernel: ? irq_thread_fn+0x20/0x60
jan 17 05:20:08 arch kernel: ? irq_thread+0xf5/0x1a0
jan 17 05:20:08 arch kernel: ? irq_finalize_oneshot.part.0+0xe0/0xe0
jan 17 05:20:08 arch kernel: ? irq_thread_check_affinity+0xd0/0xd0
jan 17 05:20:08 arch kernel: ? kthread+0x133/0x150
jan 17 05:20:08 arch kernel: ? __kthread_bind_mask+0x60/0x60
jan 17 05:20:08 arch kernel: ? ret_from_fork+0x22/0x30
jan 17 05:20:08 arch kernel: Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device cmac algif_hash algif_skcipher af_alg bnep intel_rapl_msr intel_rapl_common snd_hda_c>
jan 17 05:20:08 arch kernel: xt_tcpudp xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack nf_defrag_ipv6>
jan 17 05:20:08 arch kernel: CR2: 0000000000000020
jan 17 05:20:08 arch kernel: ---[ end trace 2771d77a04395ec1 ]---
jan 17 05:20:08 arch kernel: RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
jan 17 05:20:08 arch kernel: Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75>
jan 17 05:20:08 arch kernel: RSP: 0018:ffff9fddc359bc20 EFLAGS: 00010202
jan 17 05:20:08 arch kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
jan 17 05:20:08 arch kernel: RDX: ffff89f868588908 RSI: ffffffffffffffff RDI: 0000000000000020
jan 17 05:20:08 arch kernel: RBP: ffff89f8129f5990 R08: ffffffffc2152b60 R09: ffff89f8129f5970
jan 17 05:20:08 arch kernel: R10: ffff89f812974008 R11: ffff89f812975098 R12: 0000000000000020
jan 17 05:20:08 arch kernel: R13: 0000000000000000 R14: ffff89f8129f5af8 R15: ffff89f8129f5c00
jan 17 05:20:08 arch kernel: FS: 0000000000000000(0000) GS:ffff89f915d00000(0000) knlGS:0000000000000000
jan 17 05:20:08 arch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jan 17 05:20:08 arch kernel: CR2: 0000000000000020 CR3: 000000012b102004 CR4: 00000000001706e0
jan 17 05:20:08 arch kernel: BUG: kernel NULL pointer dereference, address: 0000000000000959
jan 17 05:20:08 arch kernel: #PF: supervisor write access in kernel mode
jan 17 05:20:08 arch kernel: #PF: error_code(0x0002) - not-present page
jan 17 05:20:08 arch kernel: PGD 800000012c756067 P4D 800000012c756067 PUD 0
jan 17 05:20:08 arch kernel: Oops: 0002 [#2] PREEMPT SMP PTI
jan 17 05:20:08 arch kernel: CPU: 2 PID: 215 Comm: irq/29-nvidia Tainted: P D OE 5.10.7-arch1-1 #1
jan 17 05:20:08 arch kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B75M-DGS R2.0, BIOS P1.50 03/14/2018
jan 17 05:20:08 arch kernel: RIP: 0010:mutex_lock+0x10/0x20
jan 17 05:20:08 arch kernel: Code: 03 31 c0 c3 eb d4 0f 1f 40 00 0f 1f 44 00 00 be 02 00 00 00 e9 a1 fa ff ff 90 0f 1f 44 00 00 31 c0 65 48 8b 14 25 c0 7b 01 00 <f0> 48 0f b1 17 75 01 c3 eb>
jan 17 05:20:08 arch kernel: RSP: 0018:ffff9fddc359be30 EFLAGS: 00010246
jan 17 05:20:08 arch kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
jan 17 05:20:08 arch kernel: RDX: ffff89f812e59ec0 RSI: 0000000000001b41 RDI: 0000000000000959
jan 17 05:20:08 arch kernel: RBP: 0000000000000959 R08: 0000000000000001 R09: 0000000000000000
jan 17 05:20:08 arch kernel: R10: ffff89f812a73c00 R11: 0000000000000000 R12: ffff89f812e5a6b4
jan 17 05:20:08 arch kernel: R13: 0000000000000001 R14: 0000000000000001 R15: ffff89f812e59ec0
jan 17 05:20:08 arch kernel: FS: 0000000000000000(0000) GS:ffff89f915d00000(0000) knlGS:0000000000000000
jan 17 05:20:08 arch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jan 17 05:20:08 arch kernel: CR2: 0000000000000959 CR3: 000000012b102004 CR4: 00000000001706e0
jan 17 05:20:08 arch kernel: Call Trace:
jan 17 05:20:08 arch kernel: perf_event_exit_task+0x30/0x440
jan 17 05:20:08 arch kernel: ? kfree+0x40c/0x440
jan 17 05:20:08 arch kernel: do_exit+0x355/0xa40
jan 17 05:20:08 arch kernel: ? task_work_run+0x5c/0x90
jan 17 05:20:08 arch kernel: ? do_exit+0x345/0xa40
jan 17 05:20:08 arch kernel: ? kthread+0x133/0x150
jan 17 05:20:08 arch kernel: ? rewind_stack_do_exit+0x17/0x17
jan 17 05:20:08 arch kernel: Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device cmac algif_hash algif_skcipher af_alg bnep intel_rapl_msr intel_rapl_common snd_hda_c>
jan 17 05:20:08 arch kernel: xt_tcpudp xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack nf_defrag_ipv6>
jan 17 05:20:08 arch kernel: CR2: 0000000000000959
jan 17 05:20:08 arch kernel: ---[ end trace 2771d77a04395ec2 ]---
jan 17 05:20:08 arch kernel: RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
jan 17 05:20:08 arch kernel: Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75>
jan 17 05:20:08 arch kernel: RSP: 0018:ffff9fddc359bc20 EFLAGS: 00010202
jan 17 05:20:08 arch kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
jan 17 05:20:08 arch kernel: RDX: ffff89f868588908 RSI: ffffffffffffffff RDI: 0000000000000020
jan 17 05:20:08 arch kernel: RBP: ffff89f8129f5990 R08: ffffffffc2152b60 R09: ffff89f8129f5970
jan 17 05:20:08 arch kernel: R10: ffff89f812974008 R11: ffff89f812975098 R12: 0000000000000020
jan 17 05:20:08 arch kernel: R13: 0000000000000000 R14: ffff89f8129f5af8 R15: ffff89f8129f5c00
jan 17 05:20:08 arch kernel: FS: 0000000000000000(0000) GS:ffff89f915d00000(0000) knlGS:0000000000000000
jan 17 05:20:08 arch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jan 17 05:20:08 arch kernel: CR2: 0000000000000959 CR3: 000000012b102004 CR4: 00000000001706e0
jan 17 05:20:08 arch kernel: Fixing recursive fault but reboot is needed!
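Since the log above contains two oopses in the same thread (the second one has the `D` taint flag set and its call trace goes through `do_exit`), a short summary makes it easier to compare against other crashes. Below is a rough Python sketch that pulls the fault type, address, task name, and first RIP out of journal lines like these. The function name `summarize_oops` and the regexes are my own and only assume the line layout shown in this log.

```python
import re

# "BUG: kernel NULL pointer dereference, address: 0000000000000020"
BUG_RE = re.compile(r"BUG: (.+?), address: ([0-9a-f]+)")
# "RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]" -> "_nv028498rm+0x9/0x90"
RIP_RE = re.compile(r"RIP: \d+:(\S+)")
# "... Comm: irq/29-nvidia Tainted: ..." -> "irq/29-nvidia"
# (assumes the task name contains no spaces)
COMM_RE = re.compile(r"Comm: (\S+)")


def summarize_oops(lines):
    """Return one dict per 'BUG:' header found in the kernel log lines."""
    reports = []
    for line in lines:
        bug = BUG_RE.search(line)
        if bug:
            reports.append({
                "kind": bug.group(1),
                "address": int(bug.group(2), 16),
                "comm": None,
                "rip": None,
            })
            continue
        if not reports:
            continue  # ignore anything before the first BUG line
        current = reports[-1]
        comm = COMM_RE.search(line)
        if comm and current["comm"] is None:
            current["comm"] = comm.group(1)
        rip = RIP_RE.search(line)
        # Keep only the first RIP after each BUG line; the kernel repeats
        # the original register dump after the "end trace" marker.
        if rip and current["rip"] is None:
            current["rip"] = rip.group(1)
    return reports
```

On this log it yields two entries, both for `irq/29-nvidia`: the first faulting at address 0x20 inside the nvidia module (`_nv028498rm`), the second at 0x959 in `mutex_lock`.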
My System Information
OS: Arch Linux
Kernel: Linux arch 5.10.7-arch1-1 #1 SMP PREEMPT Wed, 13 Jan 2021 12:02:01 +0000 x86_64 GNU/Linux
Kernel boot flags:
quiet splash loglevel=3 rd.systemd.show_status=auto rd.udev.log_priority=3 intel_pstate=passive nvidia-drm.modeset=1
GPU: NVIDIA GTX 660
Chromium: 87.0.4280.141
Desktop Environment: GNOME 3.38.3 (X11)
Window Manager: mutter 3.38.3
/etc/modprobe.d/nvidia.conf:
options nvidia NVreg_UsePageAttributeTable=1
MODULES in /etc/mkinitcpio.conf:
MODULES=(nvidia nvidia_modeset nvidia_uvm nvidia_drm)
~/.config/chromium-flags.conf: chromium-flags.conf (2.3 KB)