CPU soft lockup(s) on 510.68.02, GTX 1070 Ti

This is probably a kernel issue rather than a driver regression, since reverting the driver to 470 doesn’t change the behaviour.

Situation:
Two soft lockups on boot, consistent between reboots:

Jun 01 10:41:45 noah-pc kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [systemd-logind:897]
Jun 01 10:41:45 noah-pc kernel: Modules linked in: snd_seq_dummy snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_rej>
Jun 01 10:41:45 noah-pc kernel:  i2c_smbus videodev snd_seq snd_seq_device mei_me joydev mc snd_pcm lpc_ich mei snd_timer parport_pc snd parport soundcore acpi_pad zram i915 crct10dif_pclmu>
Jun 01 10:41:45 noah-pc kernel: CPU: 0 PID: 897 Comm: systemd-logind Tainted: P           OE     5.17.11-300.fc36.x86_64 #1
Jun 01 10:41:45 noah-pc kernel: Hardware name: Gigabyte Technology Co., Ltd. Z97-HD3/Z97-HD3, BIOS F9 07/31/2015
Jun 01 10:41:45 noah-pc kernel: RIP: 0010:_nv039154rm+0x1c0/0x1e0 [nvidia]
Jun 01 10:41:45 noah-pc kernel: Code: ff ff 89 c7 e8 41 d9 ff ff 66 89 03 81 25 00 3b a9 01 80 f9 ff ff 5b 48 83 c5 10 c3 66 0f 1f 44 00 00 e8 23 dc ff ff 8b 7d 08 <48> 89 c3 e8 18 dc ff ff>
Jun 01 10:41:45 noah-pc kernel: RSP: 0018:ffffabd88168fa28 EFLAGS: 00000297
Jun 01 10:41:45 noah-pc kernel: RAX: ffffffffc3231f18 RBX: 00000000000017b6 RCX: 0000000000000000
Jun 01 10:41:45 noah-pc kernel: RDX: 0000000000000003 RSI: 00000000000c4358 RDI: 0000000000000006
Jun 01 10:41:45 noah-pc kernel: RBP: ffff892f8463abf0 R08: ffffffffc31c8340 R09: 0000000000000282
Jun 01 10:41:45 noah-pc kernel: R10: 0000000000000202 R11: 0000000000000001 R12: ffff892f8463ac68
Jun 01 10:41:45 noah-pc kernel: R13: ffff892f8463ac64 R14: 000000000000c000 R15: 000000000000c000
Jun 01 10:41:45 noah-pc kernel: FS:  00007fe9b5541bc0(0000) GS:ffff89329fa00000(0000) knlGS:0000000000000000
Jun 01 10:41:45 noah-pc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 01 10:41:45 noah-pc kernel: CR2: 00007f6fa7ffeb48 CR3: 0000000106fd4003 CR4: 00000000001706f0
Jun 01 10:41:45 noah-pc kernel: Call Trace:
Jun 01 10:41:45 noah-pc kernel:  <TASK>
Jun 01 10:41:45 noah-pc kernel:  ? _nv000753rm+0x4f/0x130 [nvidia]
Jun 01 10:41:45 noah-pc kernel:  ? _nv030048rm+0x14c/0x200 [nvidia]
Jun 01 10:41:45 noah-pc kernel:  ? _nv038086rm+0x2a/0x80 [nvidia]
Jun 01 10:41:45 noah-pc kernel:  ? _nv015812rm+0x9cc/0xad0 [nvidia]
Jun 01 10:41:45 noah-pc kernel:  ? _nv034293rm+0x174/0x180 [nvidia]
Jun 01 10:41:45 noah-pc kernel:  ? _nv035937rm+0x265/0x2c0 [nvidia]
Jun 01 10:41:45 noah-pc kernel:  ? _nv011415rm+0x4fe/0x620 [nvidia]
Jun 01 10:41:45 noah-pc kernel:  ? _nv034427rm+0x53/0xb0 [nvidia]
Jun 01 10:41:45 noah-pc kernel:  ? _nv000642kms+0x90/0x90 [nvidia_modeset]
Jun 01 10:41:45 noah-pc kernel:  ? _nv010344rm+0x52/0xa0 [nvidia]
Jun 01 10:41:45 noah-pc kernel:  ? _nv010343rm+0x46/0x50 [nvidia]
Jun 01 10:41:45 noah-pc kernel:  ? _nv010343rm+0x2f/0x50 [nvidia]
Jun 01 10:41:45 noah-pc kernel:  ? rm_kernel_rmapi_op+0x141/0x190 [nvidia]
Jun 01 10:41:45 noah-pc kernel:  ? _nv000642kms+0x90/0x90 [nvidia_modeset]
Jun 01 10:41:45 noah-pc kernel:  ? nvkms_call_rm+0x3b/0x60 [nvidia_modeset]
Jun 01 10:41:45 noah-pc kernel:  ? _nv002519kms+0x51/0x60 [nvidia_modeset]
Jun 01 10:41:45 noah-pc kernel:  ? _nv002562kms+0x3e/0x90 [nvidia_modeset]
Jun 01 10:41:45 noah-pc kernel:  ? _nv002338kms+0x1a2/0x1c0 [nvidia_modeset]
Jun 01 10:41:45 noah-pc kernel:  ? _nv000647kms+0x30/0x60 [nvidia_modeset]
Jun 01 10:41:45 noah-pc kernel:  ? _nv000642kms+0x66/0x90 [nvidia_modeset]
Jun 01 10:41:45 noah-pc kernel:  ? nvKmsIoctl+0x96/0x1d0 [nvidia_modeset]
Jun 01 10:41:45 noah-pc kernel:  ? nvkms_ioctl_from_kapi+0x47/0x80 [nvidia_modeset]
Jun 01 10:41:45 noah-pc kernel:  ? _nv000643kms+0x3c/0x50 [nvidia_modeset]
Jun 01 10:41:45 noah-pc kernel:  ? nv_drm_master_drop+0x2e/0x60 [nvidia_drm]
Jun 01 10:41:45 noah-pc kernel:  ? _nv000091kms+0x50/0x50 [nvidia_modeset]
Jun 01 10:41:45 noah-pc kernel:  ? drm_dropmaster_ioctl+0xaa/0x120
Jun 01 10:41:45 noah-pc kernel:  ? drm_setmaster_ioctl+0x160/0x160
Jun 01 10:41:45 noah-pc kernel:  ? drm_ioctl_kernel+0x9e/0x140
Jun 01 10:41:45 noah-pc kernel:  ? drm_ioctl+0x21c/0x410
Jun 01 10:41:45 noah-pc kernel:  ? drm_setmaster_ioctl+0x160/0x160
Jun 01 10:41:45 noah-pc kernel:  ? __seccomp_filter+0x28b/0x4c0
Jun 01 10:41:45 noah-pc kernel:  ? __x64_sys_ioctl+0x8d/0xc0
Jun 01 10:41:45 noah-pc kernel:  ? do_syscall_64+0x3a/0x80
Jun 01 10:41:45 noah-pc kernel:  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
Jun 01 10:41:45 noah-pc kernel:  </TASK>
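
For anyone who wants to check their own journal for the same pattern, here is a minimal sketch that pulls the CPU, duration, and task out of the watchdog header lines. It assumes the kernel log has already been saved as text (for example from `journalctl -k -b`), and the function name `parse_soft_lockups` is just an illustrative choice, not part of any tool.

```python
import re

# Matches the watchdog header line, e.g.
# "watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [systemd-logind:897]"
# The task name may itself contain ':' (e.g. kworker/0:1), so the PID is
# taken as the digits after the last ':' inside the brackets.
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(lines):
    """Return one dict per soft lockup header found in the given log lines."""
    events = []
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            events.append({
                "cpu": int(m["cpu"]),
                "seconds": int(m["secs"]),
                "comm": m["comm"],
                "pid": int(m["pid"]),
            })
    return events


if __name__ == "__main__":
    sample = [
        "Jun 01 10:41:45 noah-pc kernel: watchdog: BUG: soft lockup - "
        "CPU#0 stuck for 26s! [systemd-logind:897]",
        "Jun 01 10:41:45 noah-pc kernel: Call Trace:",
    ]
    for event in parse_soft_lockups(sample):
        print(event)
```

Running this over the logs from several boots makes it easy to compare whether the same task and CPU are hit each time.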

A soft lockup also happens when switching TTYs. I suspected a hardware defect, but the nouveau driver works. The issue occurs early during boot, right after POST: the display stays on the POST screen for about one minute before the login screen appears.
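
To see at a glance where the stuck call path spends its frames, the call trace can be tallied per kernel module. This is only a rough sketch: the frames in the trace are marked with `?` (unreliable), the label `vmlinux` for frames without a module suffix is my own naming, and `frames_by_module` is a hypothetical helper name.

```python
import re
from collections import Counter

# A frame looks like "kernel:  ? nvKmsIoctl+0x96/0x1d0 [nvidia_modeset]".
# Frames without a trailing "[module]" are counted under "vmlinux" here.
# The "RIP:" line is deliberately not matched, since it is not a trace frame.
FRAME_RE = re.compile(
    r"kernel:\s+\??\s*(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?\s*$"
)


def frames_by_module(lines):
    """Count call trace frames per module in the given log lines."""
    counts = Counter()
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            counts[m["mod"] or "vmlinux"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jun 01 10:41:45 noah-pc kernel: RIP: 0010:_nv039154rm+0x1c0/0x1e0 [nvidia]",
        "Jun 01 10:41:45 noah-pc kernel:  ? _nv010344rm+0x52/0xa0 [nvidia]",
        "Jun 01 10:41:45 noah-pc kernel:  ? nvKmsIoctl+0x96/0x1d0 [nvidia_modeset]",
        "Jun 01 10:41:45 noah-pc kernel:  ? nv_drm_master_drop+0x2e/0x60 [nvidia_drm]",
        "Jun 01 10:41:45 noah-pc kernel:  ? drm_dropmaster_ioctl+0xaa/0x120",
    ]
    for module, count in frames_by_module(sample).most_common():
        print(module, count)
```

In the trace above, the path enters through `drm_dropmaster_ioctl` and `nv_drm_master_drop` before descending into `nvidia_modeset` and `nvidia`. That may be related to the TTY-switch case, though I have not confirmed it.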

Other than the lockups, I haven’t noticed any odd behaviour (though there may be some; see the attached nvidia-bug-report log).

Edit: I should note that I’ve tried reinstalling the driver multiple times (including re-downloading it). The driver is installed from the RPM Fusion package, and the system runs Fedora 36. I didn’t have the issue on Fedora 35, but I also didn’t notice it until a week into using 36, so the upgrade could still be unrelated.

nvidia-bug-report.log.gz (294.8 KB)