Hi,
I recently installed driver version 535.54.03 on Ubuntu 20.04 (kernel 5.4.0-153-generic). Since then, my system occasionally hits a soft lockup when closing Vulkan apps (i.e., the whole system becomes unresponsive). This is not consistently reproducible and happens quite rarely, but still often enough to be annoying to deal with.
According to the call trace from kern.log below, the soft lockup happens somewhere in the NVIDIA driver (the RIP points into the [nvidia] module).
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.553730] watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [Correrender:61157]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.553732] Modules linked in: dm_crypt rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp vboxnetadp(OE) vboxnetflt(OE) ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc vboxdrv(OE) aufs cmac algif_hash algif_skcipher af_alg bnep overlay nvidia_uvm(OE) nvidia_drm(POE) nvidia_modeset(POE) snd_hda_codec_hdmi nvidia(POE) nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio iwlmvm mac80211 snd_hda_intel uvcvideo snd_intel_dspcfg snd_hda_codec libarc4 videobuf2_vmalloc snd_hda_core edac_mce_amd videobuf2_memops snd_usb_audio videobuf2_v4l2 videobuf2_common kvm_amd snd_seq_midi videodev snd_usbmidi_lib joydev kvm snd_hwdep snd_seq_midi_event mc btusb snd_rawmidi btrtl input_leds btbcm crct10dif_pclmul btintel ghash_clmulni_intel binfmt_misc snd_pcm snd_seq bluetooth iwlwifi snd_seq_device snd_timer
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.553757] aesni_intel ecdh_generic crypto_simd drm_kms_helper snd ecc cryptd fb_sys_fops cfg80211 wmi_bmof k10temp soundcore ccp glue_helper syscopyarea sysfillrect sysimgblt mac_hid sch_fq_codel msr parport_pc ppdev lp parport ramoops drm reed_solomon efi_pstore ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_logitech_hidpp hid_logitech_dj hid_generic igb usbhid uas usb_storage hid crc32_pclmul i2c_algo_bit i2c_piix4 nvme dca ahci nvme_core libahci wmi gpio_amdpt gpio_generic
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.553777] CPU: 6 PID: 61157 Comm: Correrender Tainted: P OE 5.4.0-153-generic #170-Ubuntu
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.553778] Hardware name: Micro-Star International Co., Ltd. MS-7B85/B450 GAMING PRO CARBON AC (MS-7B85), BIOS 1.F6 09/30/2021
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.553911] RIP: 0010:_nv039537rm+0x3b/0x80 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.553912] Code: d3 89 de 48 8d 55 0f c6 45 0f 00 e8 3f 4c 60 ff 80 7d 0f 00 41 89 c4 75 11 41 39 5d 10 76 20 49 8b 45 00 c1 eb 02 44 8b 24 98 <5b> 44 89 e0 41 5c 41 5d 48 83 c5 10 c3 0f 1f 84 00 00 00 00 00 be
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.553913] RSP: 0018:ffffb8a6813b78c0 EFLAGS: 00200216 ORIG_RAX: ffffffffffffff13
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.553914] RAX: ffffb8a691000000 RBX: 00000000002e0405 RCX: 0000000000b81014
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.553914] RDX: ffff9c042b7f289f RSI: 0000000000b81014 RDI: ffff9c09d9db8008
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.553915] RBP: ffff9c042b7f2890 R08: 0000000000000020 R09: 0000000000000000
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.553915] R10: 0000000000b81014 R11: ffff9c042b7f29c8 R12: 0000000000000002
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.553915] R13: ffff9c09d9db8bc8 R14: 0000000000000000 R15: 0000000000000000
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.553916] FS: 00007fffa90c9000(0000) GS:ffff9c09fe980000(0000) knlGS:0000000000000000
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.553916] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.553917] CR2: 00007ff768000010 CR3: 0000000e8ed0c000 CR4: 0000000000340ee0
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.553917] Call Trace:
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.554089] ? _nv013076rm+0x10f/0x170 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.554258] ? _nv030427rm+0xb8/0xe0 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.554426] ? _nv030452rm+0xa0/0x2d0 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.554593] ? _nv030453rm+0x5b/0x1d0 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.554758] ? _nv030454rm+0x2d/0x110 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.554924] ? _nv030546rm+0x13f/0x340 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.555089] ? _nv030547rm+0x50/0x60 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.555245] ? _nv013174rm+0x86/0xc0 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.555400] ? _nv013170rm+0x3a4/0x400 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.555553] ? _nv044237rm+0xd1/0x1b0 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.555720] ? _nv041109rm+0x1e7/0x370 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.555819] ? _nv048377rm+0x40/0x95 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.555972] ? _nv035020rm+0x14d/0x2e0 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.556071] ? _nv048374rm+0xc5/0x460 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.556196] ? _nv002711rm+0xd/0x20 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.556320] ? _nv004074rm+0x19/0xb0 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.556446] ? _nv016053rm+0x51c/0x620 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.556546] ? _nv043216rm+0xab/0xe0 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.556671] ? _nv044933rm+0xac/0x130 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.556795] ? _nv044932rm+0x3e5/0x690 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.556896] ? _nv043119rm+0xd5/0x160 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.557004] ? _nv043120rm+0x41/0x70 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.557105] ? _nv000566rm+0x4d/0x60 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.557222] ? _nv000714rm+0x1b7/0xe70 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.557337] ? rm_ioctl+0x58/0xb0 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.557426] ? nvidia_ioctl+0x6f0/0x850 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.557429] ? get_max_files+0x20/0x20
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.557518] ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.557520] ? do_vfs_ioctl+0x407/0x670
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.557521] ? ksys_ioctl+0x67/0x90
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.557522] ? __x64_sys_ioctl+0x1a/0x20
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.557524] ? do_syscall_64+0x57/0x190
Jul 5 15:10:12 christoph-MS-7B85 kernel: [21184.557526] ? entry_SYSCALL_64_after_hwframe+0x5c/0xc1
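For anyone who wants to check their own log the same way, here is a minimal Python sketch that counts how many call-trace frames belong to each kernel module. The function name `summarize_trace` and the regular expression are my own and only assume the frame layout seen in the trace above (`symbol+0xOFF/0xLEN [module]`); other kernels may format frames differently.

```python
import re

# Matches one call-trace frame such as
#   "[21184.554089] ? _nv013076rm+0x10f/0x170 [nvidia]"
# The trailing "[module]" part is absent for frames in the core kernel.
FRAME_RE = re.compile(
    r"\]\s+(?:\?\s+)?(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def summarize_trace(lines):
    """Count call-trace frames per module.

    Frames without a "[module]" suffix are counted under "kernel".
    Hypothetical helper, not part of any NVIDIA or kernel tooling.
    """
    counts = {}
    in_trace = False
    for line in lines:
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            # The first line that is not a frame ends the current trace.
            in_trace = False
            continue
        module = match.group("module") or "kernel"
        counts[module] = counts.get(module, 0) + 1
    return counts
```

It can be run over the log with something like `summarize_trace(open("/var/log/kern.log", errors="replace"))`. If the log contains several traces, their frames are added together.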
I have also attached nvidia-bug-report.log.gz, generated after rebooting following the most recent lockup, but it does not seem to contain any useful information. ‘Correrender’, which appears in kern.log, is the name of the Vulkan application.
nvidia-bug-report.log.gz (385.3 KB)