Random hard freezes on Ubuntu with NVIDIA driver 450/460

These hard freezes are worse on 460. Stepping back to 450 has brought them from nightly to about once a week.

Symptom:

I lock my computer when not in use, and the monitors turn off. The machine does not hibernate or otherwise sleep. Every night under 460, and still about once a week under 450, it freezes hard: the Num Lock light doesn’t respond and the monitors won’t wake up. Syslog continues to log.

The first sign in the log is the monitors being re-detected over and over, like this:

Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-2: disconnected
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-2: Internal DisplayPort
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-2: 1440.0 MHz maximum pixel clock
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0):
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-0: disconnected
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-0: Internal TMDS
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-0: 330.0 MHz maximum pixel clock
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0):
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-1: disconnected
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-1: Internal TMDS
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-1: 165.0 MHz maximum pixel clock
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0):
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-2: disconnected
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-2: Internal DisplayPort
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-2: 1440.0 MHz maximum pixel clock
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0):
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-3: disconnected
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-3: Internal TMDS
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-3: 165.0 MHz maximum pixel clock
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0):
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): Acer V277U (DFP-4): connected
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): Acer V277U (DFP-4): Internal DisplayPort
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): Acer V277U (DFP-4): 1440.0 MHz maximum pixel clock
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0):
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-5: disconnected
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-5: Internal TMDS
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-5: 165.0 MHz maximum pixel clock
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0):
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): Acer V277U (DFP-6): connected
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): Acer V277U (DFP-6): Internal DisplayPort
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): Acer V277U (DFP-6): 1440.0 MHz maximum pixel clock
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0):
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-7: disconnected
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-7: Internal TMDS
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-7: 165.0 MHz maximum pixel clock
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0):
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (II) NVIDIA(0): Setting mode "DP-2: nvidia-auto-select @2560x1440 +2560+0 {ViewPortIn=2560x1440, ViewPortOut=2560x1440+0+0}"
Feb 22 00:42:32 tony-linux kernel: [41283.784569] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD data byte 86
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-4: disconnected
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-4: Internal DisplayPort
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-4: 1440.0 MHz maximum pixel clock
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0):
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-6: disconnected
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-6: Internal DisplayPort
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-6: 1440.0 MHz maximum pixel clock
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0):
Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (II) NVIDIA(0): Setting mode "NULL"
Feb 22 00:42:33 tony-linux kernel: [41283.916563] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD data byte 0
Feb 22 00:42:33 tony-linux gnome-shell[2660]: Window manager warning: Configuring CRTC 442 with mode 448 (2560 x 1440 @ 74.893105) at position 0, 0 and transform 0 failed
Feb 22 00:42:33 tony-linux gnome-shell[2660]: Window manager warning: Configuring CRTC 443 with mode 448 (2560 x 1440 @ 74.893105) at position 2560, 0 and transform 0 failed
Feb 22 00:42:33 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-0: disconnected
Feb 22 00:42:33 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-0: Internal TMDS
Feb 22 00:42:33 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-0: 330.0 MHz maximum pixel clock
Feb 22 00:42:33 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0):
Feb 22 00:42:33 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-1: disconnected
Feb 22 00:42:33 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-1: Internal TMDS
Feb 22 00:42:33 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-1: 165.0 MHz maximum pixel clock
Feb 22 00:42:33 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0):
Feb 22 00:42:33 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-2: disconnected
Feb 22 00:42:33 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-2: Internal DisplayPort
Feb 22 00:42:33 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-2: 1440.0 MHz maximum pixel clock
Feb 22 00:42:33 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0):
Feb 22 00:42:33 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-3: disconnected
Feb 22 00:42:33 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-3: Internal TMDS
Feb 22 00:42:33 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): DFP-3: 165.0 MHz maximum pixel clock
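
For anyone wanting to put a number on this re-detection loop, here is a minimal Python sketch that scans syslog lines and counts how often each connector flips between connected and disconnected. The regular expression is an assumption based only on the line format shown above, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Assumed line format (taken from the excerpt above), e.g.
#   "... (--) NVIDIA(GPU-0): DFP-2: disconnected"
#   "... (--) NVIDIA(GPU-0): Acer V277U (DFP-4): connected"
STATE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) .*NVIDIA\(GPU-\d+\): "
    r"(?:.*\()?(?P<port>DFP-\d+)\)?: (?P<state>connected|disconnected)\s*$"
)


def parse_states(lines):
    """Yield (timestamp, connector, state) for each connector state report."""
    for line in lines:
        m = STATE_RE.match(line)
        if m:
            yield m["stamp"], m["port"], m["state"]


def count_flaps(lines):
    """Count how many times each connector changed state."""
    last_state = {}
    flaps = Counter()
    for _stamp, port, state in parse_states(lines):
        if port in last_state and last_state[port] != state:
            flaps[port] += 1
        last_state[port] = state
    return flaps


# A few lines copied from the log excerpt above.
PREFIX = "Feb 22 00:42:32 tony-linux /usr/lib/gdm3/gdm-x-session[2534]: (--) NVIDIA(GPU-0): "
SAMPLE = [
    PREFIX + "DFP-2: disconnected",
    PREFIX + "DFP-2: Internal DisplayPort",
    PREFIX + "Acer V277U (DFP-4): connected",
    PREFIX + "Acer V277U (DFP-6): connected",
    PREFIX + "DFP-4: disconnected",
    PREFIX + "DFP-6: disconnected",
]

print(dict(count_flaps(SAMPLE)))
```

To run it against a real log, pass an open file instead of SAMPLE, for example `count_flaps(open("/var/log/syslog", errors="replace"))`.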

After 1 to 6 hours of this, the kernel logs a ‘soft lockup’ and the nvidia driver crashes:

Feb 22 06:13:29 tony-linux kernel: [61140.165565] watchdog: BUG: soft lockup - CPU#11 stuck for 23s! [Xorg:2534]

Feb 22 06:13:29 tony-linux kernel: [61140.165568] Modules linked in: sctp wireguard(E) ip6_udp_tunnel udp_tunnel xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bpfilter vboxnetadp(OE) vboxnetflt(OE) aufs vboxdrv(OE) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_usb_audio snd_usbmidi_lib videobuf2_common snd_hda_codec_hdmi nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic nvidia_uvm(OE) ledtrig_audio snd_hda_intel nvidia_drm(POE) edac_mce_amd nvidia_modeset(POE) snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi kvm_amd kvm nvidia(POE) snd_seq crct10dif_pclmul snd_seq_device ghash_clmulni_intel snd_timer aesni_intel eeepc_wmi crypto_simd asus_wmi cryptd sparse_keymap input_leds glue_helper video wmi_bmof snd drm_kms_helper ccp k10temp fb_sys_fops syscopyarea soundcore sysfillrect sysimgblt mac_hid sch_fq_codel v4l2loopback(OE) videodev
Feb 22 06:13:29 tony-linux kernel: [61140.165592]  mc overlay iptable_filter ip6table_filter ip6_tables br_netfilter bridge stp llc arp_tables nct6775 hwmon_vid parport_pc ppdev lp parport drm ip_tables x_tables autofs4 hid_generic usbhid hid crc32_pclmul ahci r8169 libahci i2c_piix4 realtek wmi
Feb 22 06:13:29 tony-linux kernel: [61140.165601] CPU: 11 PID: 2534 Comm: Xorg Tainted: P           OE     5.4.0-65-generic #73-Ubuntu
Feb 22 06:13:29 tony-linux kernel: [61140.165602] Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 3001 12/04/2020
Feb 22 06:13:29 tony-linux kernel: [61140.165606] RIP: 0010:__x86_indirect_thunk_rax+0x3/0x20
Feb 22 06:13:29 tony-linux kernel: [61140.165608] Code: c8 e9 4a 95 cc ff c1 e1 03 01 d1 89 ca e9 e5 9b cc ff 48 8d 0c c8 e9 cc 99 cc ff 90 90 90 90 90 90 90 90 90 90 90 90 0f ae e8 <ff> e0 0f 1f 84 00 00 00 00 00 0f 1f 40 00 66 66 2e 0f 1f 84 00 00
Feb 22 06:13:29 tony-linux kernel: [61140.165608] RSP: 0018:ffffb98505a1b888 EFLAGS: 00000212 ORIG_RAX: ffffffffffffff13
Feb 22 06:13:29 tony-linux kernel: [61140.165610] RAX: ffffffffc1c56a40 RBX: ffff9af0fb634208 RCX: 0000000002389c2d
Feb 22 06:13:29 tony-linux kernel: [61140.165610] RDX: ffff9af05ba57008 RSI: ffff9af014ad7408 RDI: ffff9af0fb634208
Feb 22 06:13:29 tony-linux kernel: [61140.165610] RBP: ffff9af181591808 R08: 0000000000000000 R09: ffffffffc1750bc0
Feb 22 06:13:29 tony-linux kernel: [61140.165611] R10: ffff9af12f368098 R11: ffff9af18e82aef8 R12: 0000000000000000
Feb 22 06:13:29 tony-linux kernel: [61140.165611] R13: 0000000000000000 R14: ffff9af014ad7408 R15: 0000000000000001
Feb 22 06:13:29 tony-linux kernel: [61140.165612] FS:  00007fb52f4b5a40(0000) GS:ffff9af18eac0000(0000) knlGS:0000000000000000
Feb 22 06:13:29 tony-linux kernel: [61140.165612] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 22 06:13:29 tony-linux kernel: [61140.165612] CR2: 00007fe139f28000 CR3: 0000000755bd8000 CR4: 0000000000340ee0
Feb 22 06:13:29 tony-linux kernel: [61140.165613] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 22 06:13:29 tony-linux kernel: [61140.165613] DR3: 0000000000000000 DR6: 00000000ffff0ff1 DR7: 0000000000000400
Feb 22 06:13:29 tony-linux kernel: [61140.165614] Call Trace:
Feb 22 06:13:29 tony-linux kernel: [61140.165631]  ? _nv001124kms+0xc7/0x400 [nvidia_modeset]
Feb 22 06:13:29 tony-linux kernel: [61140.165640]  ? _nv000732kms+0x1e/0x80 [nvidia_modeset]
Feb 22 06:13:29 tony-linux kernel: [61140.165647]  ? _nv002396kms+0x112/0x130 [nvidia_modeset]
Feb 22 06:13:29 tony-linux kernel: [61140.165655]  ? _nv000515kms+0xd1/0xe1 [nvidia_modeset]
Feb 22 06:13:29 tony-linux kernel: [61140.165662]  ? _nv000019kms+0x231/0x723 [nvidia_modeset]
Feb 22 06:13:29 tony-linux kernel: [61140.165752]  ? os_free_mem+0x22/0x30 [nvidia]
Feb 22 06:13:29 tony-linux kernel: [61140.165853]  ? _nv008505rm+0xbe/0x100 [nvidia]
Feb 22 06:13:29 tony-linux kernel: [61140.165962]  ? _nv035047rm+0x2a/0x60 [nvidia]
Feb 22 06:13:29 tony-linux kernel: [61140.166055]  ? _nv030393rm+0x23/0x40 [nvidia]
Feb 22 06:13:29 tony-linux kernel: [61140.166162]  ? _nv033630rm+0x58/0xf0 [nvidia]
Feb 22 06:13:29 tony-linux kernel: [61140.166266]  ? _nv008136rm+0x33c/0x3f0 [nvidia]
Feb 22 06:13:29 tony-linux kernel: [61140.166335]  ? os_acquire_spinlock+0x12/0x20 [nvidia]
Feb 22 06:13:29 tony-linux kernel: [61140.166401]  ? os_release_spinlock+0x1a/0x20 [nvidia]
Feb 22 06:13:29 tony-linux kernel: [61140.166470]  ? _nv037029rm+0xa1/0x190 [nvidia]
Feb 22 06:13:29 tony-linux kernel: [61140.166574]  ? _nv033624rm+0x67/0x100 [nvidia]
Feb 22 06:13:29 tony-linux kernel: [61140.166582]  ? _nv002760kms+0x12a0/0x1470 [nvidia_modeset]
Feb 22 06:13:29 tony-linux kernel: [61140.166589]  ? _nv000531kms+0x50/0x50 [nvidia_modeset]
Feb 22 06:13:29 tony-linux kernel: [61140.166595]  ? nvKmsIoctl+0x96/0x1d0 [nvidia_modeset]
Feb 22 06:13:29 tony-linux kernel: [61140.166601]  ? nvkms_ioctl_common+0x42/0x80 [nvidia_modeset]
Feb 22 06:13:29 tony-linux kernel: [61140.166607]  ? nvkms_ioctl+0xc4/0x100 [nvidia_modeset]
Feb 22 06:13:29 tony-linux kernel: [61140.166673]  ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
Feb 22 06:13:29 tony-linux kernel: [61140.166676]  ? do_vfs_ioctl+0x407/0x670
Feb 22 06:13:29 tony-linux kernel: [61140.166676]  ? ksys_ioctl+0x67/0x90
Feb 22 06:13:29 tony-linux kernel: [61140.166677]  ? __x64_sys_ioctl+0x1a/0x20
Feb 22 06:13:29 tony-linux kernel: [61140.166679]  ? do_syscall_64+0x57/0x190
Feb 22 06:13:29 tony-linux kernel: [61140.166681]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

From there, until I hard-reboot the machine, it logs that crash about three times per minute.
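
A rough way to check the "three times per minute" figure is to count the watchdog lines per minute. The Python sketch below does that; the pattern is an assumption derived from the single soft lockup line quoted above, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed format, based on the one soft lockup line quoted in this post.
LOCKUP_RE = re.compile(
    r"^(?P<minute>\w{3}\s+\d+ \d\d:\d\d):\d\d \S+ kernel: "
    r"\[\s*(?P<uptime>\d+\.\d+)\] watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_lockups(lines):
    """Return one dict per soft lockup report found in the given lines."""
    events = []
    for line in lines:
        m = LOCKUP_RE.match(line)
        if m:
            events.append({
                "minute": m["minute"],          # wall-clock time, to the minute
                "uptime": float(m["uptime"]),   # kernel timestamp in seconds
                "cpu": int(m["cpu"]),
                "stuck_seconds": int(m["secs"]),
                "comm": m["comm"],              # name of the stuck process
                "pid": int(m["pid"]),
            })
    return events


def lockups_per_minute(lines):
    """Count soft lockup reports for each wall-clock minute."""
    return Counter(event["minute"] for event in parse_lockups(lines))


# The line copied from the log above.
SAMPLE = [
    "Feb 22 06:13:29 tony-linux kernel: [61140.165565] watchdog: BUG: "
    "soft lockup - CPU#11 stuck for 23s! [Xorg:2534]",
]

print(parse_lockups(SAMPLE))
print(dict(lockups_per_minute(SAMPLE)))
```

Feeding it the whole syslog (for example `lockups_per_minute(open("/var/log/syslog", errors="replace"))`) would show whether the rate stays steady until the reboot.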

nvidia-bug-report.log.gz (339.1 KB)

Looks like the monitors are randomly disabling their DisplayPort input. Have you tried turning off “auto source”?

On the monitors? Yes, all three are manually set to DisplayPort (DP) input. I’ve also tried telling Xorg exactly which DP port goes to which monitor, because the other symptom is monitors 1 and 2 randomly swapping when waking up. It happens so often that I wrote a bash script to force the order back and put it behind a desktop icon so it’s easy to fix.
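
The script itself was not posted, so the following is not it. It is only a Python sketch of one way such a reorder could be done with xrandr. The output names passed in and the 2560-pixel width are assumptions for illustration (names like DP-2 appear in the log, but the real left-to-right order is not given); `xrandr --query` lists the actual names.

```python
import subprocess


def build_xrandr_args(outputs, width=2560):
    """Build an xrandr command placing outputs left to right.

    `outputs` is a list of output names in the desired left-to-right
    order; the names and the per-monitor width are assumptions.
    """
    args = ["xrandr"]
    for index, name in enumerate(outputs):
        # Put each monitor at a fixed x offset so the order cannot swap.
        args += ["--output", name, "--auto", "--pos", f"{index * width}x0"]
        if index == 0:
            args.append("--primary")  # leftmost monitor becomes primary
    return args


def apply_layout(outputs, width=2560):
    """Run xrandr with the layout; raises CalledProcessError on failure."""
    subprocess.run(build_xrandr_args(outputs, width), check=True)


# Hypothetical order; print the command rather than running it.
print(" ".join(build_xrandr_args(["DP-4", "DP-2", "DP-0"])))
```

Calling `apply_layout([...])` with the real output names would apply the layout. Building the argument list separately makes it easy to print and check the command first.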