[545.29.06-18.1]Flip event timeout error on startup, shutdown and sometimes suspend. Wayland unusable

@amrits

I can reproduce this issue on every boot with the fbdev=1 option enabled. Here are my bug report logs:
nvidia-bug-report.log.gz (759.2 KB)

Hi amrits,

On CachyOS we noticed that several users are facing this issue when fbdev=1 is used and mkinitcpio does not include the NVIDIA modules by default.

Including these modules in /etc/mkinitcpio.conf has fixed this issue for most people so far.

At first we got reports from Ada users, but it appears that users with 30xx cards also face this issue, so we now include the modules by default in our hardware detection.

So there is likely some interaction when fbdev=1 is used and the “nvidia” modules don’t get loaded early.
Maybe this helps you.
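
For anyone who wants to check their own setup, here is a rough sketch that reports which NVIDIA modules are missing from the MODULES=(...) line of a mkinitcpio config. The function name is made up for illustration, and the module list (nvidia, nvidia_modeset, nvidia_uvm, nvidia_drm) is an assumption based on the usual module names on Arch-based systems; it also assumes the modules are separated by single spaces.

```shell
# Hypothetical helper: print each NVIDIA module that is NOT listed in the
# MODULES=(...) line of a mkinitcpio config (default: /etc/mkinitcpio.conf).
missing_nvidia_modules() {
    conf=${1:-/etc/mkinitcpio.conf}
    # Take the last uncommented MODULES=(...) line and keep only its contents.
    listed=$(sed -n 's/^[[:space:]]*MODULES=(\([^)]*\)).*/\1/p' "$conf" | tail -n 1)
    # Assumed module set; adjust if your distribution names them differently.
    for mod in nvidia nvidia_modeset nvidia_uvm nvidia_drm; do
        case " $listed " in
            *" $mod "*) ;;                  # already listed, nothing to print
            *) printf '%s\n' "$mod" ;;      # missing from the MODULES line
        esac
    done
}
```

Calling missing_nvidia_modules with no argument reads /etc/mkinitcpio.conf. If it prints anything, add those names to the MODULES line and regenerate the initramfs (on Arch-based systems that is normally mkinitcpio -P).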

If you guys wouldn’t mind, could you please try uninstalling ddcutil? There does seem to be a race between DRM trying to set a console mode and other DRM clients opening /dev/dri/card0. I’m digging into that race condition now but it would be a useful experiment to see if disabling ddcutil avoids the problem.

@aplattner You might be onto something here, I moved ddcutil temporarily and rebooted/shutdown a few times without any issues.

Thanks for confirming. I have a patch that should fix the problem that I’m getting reviewed now.

If you want to give it a try, please restore ddcutil and then try this patch:

bash NVIDIA-Linux-x86_64-550.54.14.run --apply-patch nvidia-drm-take-modeset-ownership-earlier.patch
sudo bash NVIDIA-Linux-x86_64-550.54.14-custom.run

nvidia-drm-take-modeset-ownership-earlier.patch (2.8 KB)

It sort of works: it still happens, but it didn’t keep “looping” and bringing the entire system to a crawl. There was still a minor delay, though, and there were only 2 flip event messages this time.
UPDATE: It seems to vary from boot to boot; I got 4 flip event timeouts during another boot.

[sön mar  3 08:13:29 2024] [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
[sön mar  3 08:13:32 2024] [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 1
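
Since the number of timeouts seems to vary between boots, a small counter makes it easier to compare runs. This is only a sketch: the function name is made up, and it assumes the message text is exactly "Flip event timeout on head N" as in the lines above.

```shell
# Hypothetical helper: read kernel log text on stdin and print how many
# "Flip event timeout" messages were seen for each head.
count_flip_timeouts() {
    grep -o 'Flip event timeout on head [0-9]*' |   # keep only the matching part
        sort | uniq -c |                            # count identical messages
        awk '{ print "head " $NF ": " $1 }'         # reformat as "head N: count"
}
```

It can be fed from the kernel log of the current boot, for example with sudo dmesg | count_flip_timeouts or journalctl -k -b | count_flip_timeouts.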

Did you get any flip failure messages, or just the flip event timeouts?

Just messages like the ones above, causing the boot to be a bit slow.

I’m not sure whether this patch fixed the issue many people are facing here, since I don’t have it on the 4070 Super. That might be because the modules are included in the initramfs.

But ddcutil is still segfaulting for me:

dmesg:

[ 5.930498] ddcutil[618]: segfault at 0 ip 0000760ee139891c sp 00007ffc06dbd788 error 4 in libc.so.6[760ee1224000+18a000] likely on CPU 9 (core 9, socket 0)

journalctl:

Mär 03 17:15:36 cachyos-x8664 (udev-worker)[614]: hiddev3: Process '/usr/bin/ddcutil chkusbmon /dev/usb/hiddev3 -v' terminated by signal SEGV.
Mär 03 17:15:36 cachyos-x8664 (udev-worker)[614]: hiddev3: Failed to wait for spawned command '/usr/bin/ddcutil chkusbmon /dev/usb/hiddev3 -v': Input/output error

Edit:
The backlight is working without issues, though, so this may be a different issue.

nvidia-bug-report.log.gz (3.2 MB)

Weird, you and I run basically the same stuff, except you have a 4070 Super and an AMD (?) CPU, while I have a regular 4070 and an Intel CPU. I also don’t currently have the modules loaded in the initramfs, because I needed to test whether the flip event errors disappeared.
I ran:

journalctl |  grep ddcutil

No segfault, nothing in dmesg either.

It’s still causing havoc :(

[    6.177374] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
[    6.177558] Console: switching to colour dummy device 80x25
[    6.177577] nvidia 0000:01:00.0: vgaarb: deactivate vga console
[    6.507599] videodev: Linux video capture interface: v2.00
[    6.521081] usb 1-11.4: Found UVC 1.00 device <unnamed> (046d:0823)
[    6.558901] usbcore: registered new interface driver uvcvideo
[    6.879032] fbcon: nvidia-drmdrmfb (fb0) is primary device
[    6.946812] [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
[    6.946840] Console: switching to colour frame buffer device 240x67
[    6.948642] nvidia 0000:01:00.0: [drm] fb0: nvidia-drmdrmfb frame buffer device
[    7.181478] [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
[    9.405274] e1000e 0000:00:1f.6 eno2: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[    9.605961] userif-2: sent link up event.
[    9.816456] userif-2: sent link up event.
[   10.034338] userif-2: sent link up event.
[   10.241545] [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
[   13.249527] [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 1
[   13.249756] [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
[   16.257533] [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
[   19.265532] [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 1
[   19.265790] [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22

Interestingly enough, KDE 6 Wayland now works without fbdev=1. If I use fbdev=1, it will still exhibit the flip event timeout errors though.

I’m encountering the same flip timeout errors using hyprland (and previously on sway) with an RTX 3080. I’ve been engaging in a hyprland thread here:

The interesting thing from the logs that I’ve seen is that according to libdrm, the ATOMIC_NONBLOCK flag is set, and looking at the open source driver (which I’m not using, but I don’t have access to the closed source driver’s source code), the “Flip event timeout” message should only be reported if ATOMIC_NONBLOCK is NOT set. Nothing in the code I traced through in wlroots, libdrm, the kernel DRM, or the open source driver would explain this flag being wrong.

Hi @aplattner ,

We at CachyOS recently released a new ISO, which now defaults to Plasma Wayland.
Some users (mainly on 40xx cards) are experiencing issues where the graphical session does not load correctly.

We are using fbdev=1 and drm.modeset, and we can confirm that this issue is fixed when patching the nvidia module with the patch above.
Please consider pulling this into the next (beta) release.

Thanks!

I added nvidia nvidia_modeset nvidia_uvm nvidia_drm to the MODULES line in mkinitcpio.conf on my Arch machine and now the fbdev option works! It no longer just freezes on the boot console and now shows a high resolution TTY.

Why is this necessary for fbdev though?

EDIT: I also had to move modeset=1 and fbdev=1 out of my modprobe.d and into the bootloader config as nvidia_drm.modeset=1 and nvidia_drm.fbdev=1, otherwise fbdev would not take effect.
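
To confirm the options actually made it onto the kernel command line after editing the bootloader config, something like the following can help. It is a sketch with a made-up function name. It accepts both the nvidia_drm and nvidia-drm spellings, since the kernel treats dashes and underscores as equivalent there.

```shell
# Hypothetical helper: succeed (exit status 0) if the given kernel command
# line sets nvidia_drm.<option>=1, written with either '_' or '-'.
#   $1: option name, e.g. modeset or fbdev
#   $2: kernel command line text
cmdline_has_nvidia_drm_opt() {
    opt=$1
    cmdline=$2
    # Pad with spaces so only whole words on the command line can match.
    case " $cmdline " in
        *" nvidia_drm.$opt=1 "*|*" nvidia-drm.$opt=1 "*) return 0 ;;
    esac
    return 1
}
```

On a running system it would be used as cmdline_has_nvidia_drm_opt fbdev "$(cat /proc/cmdline)" && echo "fbdev=1 is on the cmdline". As far as I know, the values the loaded module actually uses can also be read as root from /sys/module/nvidia_drm/parameters/, but treat that as an assumption for your driver version.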

It seems that something else somehow takes ownership before fbdev/modeset does.

I wonder why this didn’t get into 550.67.

@aplattner
Is it planned to provide this in a future release?
Fedora is also using fbdev=1 by default, and as soon as this is solved, I would also make it the default on Arch Linux.

I’m not entirely sure if I’m hitting the same issue, but I tried both fbdev=1 and fbdev=0, setting it on the bootloader and only in modprobe.d/nv.conf. I’ve also tried applying nvidia-drm-take-modeset-ownership-earlier.patch on top of 550.67, but if I have HDMI plugged in while booting, it freezes when Plasma tries to start and drops this in dmesg:

mar 29 13:19:33 tom-legion kernel: BUG: unable to handle page fault for address: ffffb50f81f97d88
mar 29 13:19:33 tom-legion kernel: #PF: supervisor read access in kernel mode
mar 29 13:19:33 tom-legion kernel: #PF: error_code(0x0000) - not-present page
mar 29 13:19:33 tom-legion kernel: PGD 100000067 P4D 100000067 PUD 10026e067 PMD 10620d067 PTE 0
mar 29 13:19:33 tom-legion kernel: Oops: 0000 [#2] PREEMPT SMP
mar 29 13:19:33 tom-legion kernel: CPU: 2 PID: 1066 Comm: systemd-logind Tainted: P S   UD    O       6.8.2-gentoo-tom #1
mar 29 13:19:33 tom-legion kernel: Hardware name: LENOVO 82WQ/LNVNB161216, BIOS KWCN42WW 09/15/2023
mar 29 13:19:33 tom-legion kernel: RIP: 0010:_nv035940rm+0x1ac/0x480 [nvidia]
mar 29 13:19:33 tom-legion kernel: Code: 48 63 47 08 48 01 c2 48 8b 07 48 85 c0 75 1b e9 2b 02 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 8b 48 10 48 85 c9 74 17 48 89 c8 <48> 39 30 77 ef 0f 83 f9 01 00 00 48 8b 48 18 48 85 c9 75 e9 48 89
mar 29 13:19:33 tom-legion kernel: RSP: 0018:ffffb50f81e674c8 EFLAGS: 00010086
mar 29 13:19:33 tom-legion kernel: RAX: ffffb50f81f97d88 RBX: ffffffffc11c4e6b RCX: 0000000225c17d03
mar 29 13:19:33 tom-legion kernel: RDX: ffff9fbee20d5f20 RSI: 000000000000042a RDI: ffffffffc1eae5b8
mar 29 13:19:33 tom-legion kernel: RBP: ffff9fbee20d5eb0 R08: 0000000000000000 R09: ffff9fbee20d5f48
mar 29 13:19:33 tom-legion kernel: R10: 00000000c1d00001 R11: ffffb50f81675008 R12: ffff9fbee20d5ed0
mar 29 13:19:33 tom-legion kernel: R13: 00000000c3720101 R14: ffff9fbeccbf9008 R15: 00000000c1d00001
mar 29 13:19:33 tom-legion kernel: FS:  00007ffff76ea8c0(0000) GS:ffff9fd60d480000(0000) knlGS:0000000000000000
mar 29 13:19:33 tom-legion kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
mar 29 13:19:33 tom-legion kernel: CR2: ffffb50f81f97d88 CR3: 0000000111901000 CR4: 0000000000750ef0
mar 29 13:19:33 tom-legion kernel: PKRU: 55555554
mar 29 13:19:33 tom-legion kernel: Call Trace:
mar 29 13:19:33 tom-legion kernel:  <TASK>
mar 29 13:19:33 tom-legion kernel:  ? __die+0x1f/0x60
mar 29 13:19:33 tom-legion kernel:  ? page_fault_oops+0x161/0x4b0
mar 29 13:19:33 tom-legion kernel:  ? fixup_exception+0x22/0x270
mar 29 13:19:33 tom-legion kernel:  ? exc_page_fault+0x42f/0x780
mar 29 13:19:33 tom-legion kernel:  ? asm_exc_page_fault+0x22/0x30
mar 29 13:19:33 tom-legion kernel:  ? _nv045234rm+0x4b/0x1a0 [nvidia]
mar 29 13:19:33 tom-legion kernel:  ? _nv035940rm+0x1ac/0x480 [nvidia]
mar 29 13:19:33 tom-legion kernel:  ? _nv045234rm+0x4b/0x1a0 [nvidia]
mar 29 13:19:33 tom-legion kernel:  ? _nv049667rm+0xd6/0x1d0 [nvidia]
mar 29 13:19:33 tom-legion kernel:  ? _nv045234rm+0x4b/0x1a0 [nvidia]
mar 29 13:19:33 tom-legion kernel:  ? _nv023020rm+0x1e16/0x22e0 [nvidia]
mar 29 13:19:33 tom-legion kernel:  ? _nv000582rm+0x5e/0x70 [nvidia]
mar 29 13:19:33 tom-legion kernel:  ? rm_kernel_rmapi_op+0x127/0x220 [nvidia]
mar 29 13:19:33 tom-legion kernel:  ? nvkms_call_rm+0x2f/0x40 [nvidia_modeset]
mar 29 13:19:33 tom-legion kernel:  ? _nv002792kms+0x42/0x50 [nvidia_modeset]
mar 29 13:19:33 tom-legion kernel:  ? _nv002479kms+0xf0/0x850 [nvidia_modeset]
mar 29 13:19:33 tom-legion kernel:  ? psi_task_change+0x81/0xa0
mar 29 13:19:33 tom-legion kernel:  ? _nv002892kms+0x119/0x270 [nvidia_modeset]
mar 29 13:19:33 tom-legion kernel:  ? _nv002893kms+0x59/0x160 [nvidia_modeset]
mar 29 13:19:33 tom-legion kernel:  ? _nv002852kms+0x19c5/0x4a40 [nvidia_modeset]
mar 29 13:19:33 tom-legion kernel:  ? nv_kthread_q_stop+0x1340/0x4ae0 [nvidia_modeset]
mar 29 13:19:33 tom-legion kernel:  ? nvKmsIoctl+0xf9/0x270 [nvidia_modeset]
mar 29 13:19:33 tom-legion kernel:  ? nvkms_ioctl_from_kapi+0x70/0xd0 [nvidia_modeset]
mar 29 13:19:33 tom-legion kernel:  ? _nv002234kms+0x1cce/0x3310 [nvidia_modeset]
mar 29 13:19:33 tom-legion kernel:  ? nv_drm_internal_framebuffer_create+0x840/0x8f0 [nvidia_drm]
mar 29 13:19:33 tom-legion kernel:  ? drm_atomic_add_encoder_bridges+0x45/0xa0
mar 29 13:19:33 tom-legion kernel:  ? drm_mode_to_nvkms_display_mode+0x12/0x1740 [nvidia_drm]
mar 29 13:19:33 tom-legion kernel:  ? drm_atomic_check_only+0x5b6/0xa20
mar 29 13:19:33 tom-legion kernel:  ? drm_atomic_commit+0x46/0xa0
mar 29 13:19:33 tom-legion kernel:  ? __drm_printfn_seq_file+0x20/0x20
mar 29 13:19:33 tom-legion kernel:  ? nv_drm_atomic_helper_disable_all+0x1ec/0x4b0 [nvidia_drm]
mar 29 13:19:33 tom-legion kernel:  ? nv_drm_exit+0x1204/0x17e0 [nvidia_drm]
mar 29 13:19:33 tom-legion kernel:  ? drm_dropmaster_ioctl+0xad/0x130
mar 29 13:19:33 tom-legion kernel:  ? drm_setmaster_ioctl+0x160/0x160
mar 29 13:19:33 tom-legion kernel:  ? drm_ioctl_kernel+0x86/0xe0
mar 29 13:19:33 tom-legion kernel:  ? drm_ioctl+0x231/0x460
mar 29 13:19:33 tom-legion kernel:  ? drm_setmaster_ioctl+0x160/0x160
mar 29 13:19:33 tom-legion kernel:  ? __x64_sys_ioctl+0x8a/0xb0
mar 29 13:19:33 tom-legion kernel:  ? do_syscall_64+0x80/0x190
mar 29 13:19:33 tom-legion kernel:  ? do_syscall_64+0x8c/0x190
mar 29 13:19:33 tom-legion kernel:  ? syscall_exit_to_user_mode+0x73/0x180
mar 29 13:19:33 tom-legion kernel:  ? do_syscall_64+0x8c/0x190
mar 29 13:19:33 tom-legion kernel:  ? do_syscall_64+0x8c/0x190
mar 29 13:19:33 tom-legion kernel:  ? entry_SYSCALL_64_after_hwframe+0x46/0x4e
mar 29 13:19:33 tom-legion kernel:  </TASK>

It works, however, if I leave HDMI unplugged until Plasma is up and then plug it in.

I can still reproduce this every time on Ubuntu 24.04 with nvidia 550.90.07 from the Ubuntu package and fbdev=1.
initcall_blacklist=simpledrm_platform_driver_init makes no difference.

What seems to fix it here is delaying the startup of gdm with:

Wants=systemd-udev-settle.service
After=systemd-udev-settle.service

By contrast, I never hit this issue in Fedora, presumably because the boot sequence takes longer on my install…?
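
In case it helps anyone trying the same workaround, here is one way those two lines could be applied as a systemd drop-in rather than by editing the packaged unit file. This is a sketch: the function name, the drop-in file name wait-udev-settle.conf, and the unit name you pass in (for example gdm.service) are all assumptions, so check which unit your display manager really uses first, e.g. with systemctl status display-manager.service.

```shell
# Hypothetical helper: write a drop-in that makes a unit wait for
# systemd-udev-settle.service before it starts.
#   $1: unit name (assumed, e.g. gdm.service)
#   $2: systemd config root (default: /etc/systemd/system)
write_udev_settle_dropin() {
    unit=$1
    root=${2:-/etc/systemd/system}
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    # Wants= and After= are [Unit] section directives.
    cat > "$dir/wait-udev-settle.conf" <<'EOF'
[Unit]
Wants=systemd-udev-settle.service
After=systemd-udev-settle.service
EOF
}
```

Writing to the default location needs root, and systemd only picks up the new drop-in after systemctl daemon-reload (or a reboot). Running sudo systemctl edit on the unit and pasting the same three lines should give an equivalent result.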

Try using these cmdline options instead:

nvidia-drm.modeset=1 nvidia-drm.fbdev=1