[545.29.06-18.1] Flip event timeout error on startup, shutdown and sometimes suspend. Wayland unusable

Bug report:
nvidia-bug-report.log.gz (614.8 KB)

OS: openSUSE Tumbleweed
Kernel: 6.6.3-1-default
Nvidia driver version: 545.29.06-18.1
GPU: 1060 3GB

The problem starts at system startup: I get a really long startup, the splash screen appears glitched with artefacts and goes blank, and the screen then briefly shows these alternating lines:

[drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
[drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 1
[drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
[drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 1

etc…

The system then works more or less normally on X11 Gnome, except that it sometimes doesn’t wake up from suspend, or takes a really long time to display an image when waking up.
The same thing happens when suspending: it takes a really long time and I get a glimpse of those lines. Shutdown is the same: it takes a really long time, the screen goes blank, the lines above appear, and after a while it eventually shuts down.

The Wayland session in Gnome is unusable. Even if the artefacts don’t show up during the session, waking up from suspend fills the entire screen with artefacts, blinking and stuttering cursors, and all sorts of other glitches. It’s impossible to log in at that point.

Here’s the journalctl | grep nvidia output:
nvidia_journalctl.txt (28.8 KB)
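
To see how often each head is affected, the timeout lines can be tallied per head. This is a minimal sketch, assuming the kernel messages have exactly the format quoted above; `count_flip_timeouts` is a helper name made up for this example, not part of any NVIDIA tool.

```shell
# Tally "Flip event timeout" kernel messages per display head.
# Reads log text on stdin and prints one "head N: count" line per head.
count_flip_timeouts() {
    grep -o 'Flip event timeout on head [0-9][0-9]*' \
        | awk '{ n[$NF]++ } END { for (h in n) printf "head %s: %d\n", h, n[h] }' \
        | sort
}

# Kernel messages from the current boot, if journalctl is available.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -k -b --no-pager 2>/dev/null | count_flip_timeouts || true
fi
```

A saved log works the same way, for example `count_flip_timeouts < nvidia_journalctl.txt`.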

Thanks for reading!

I am having the same error

[drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0

on CachyOS Linux with my RTX 4070.

Here is the Nvidia bug report.
nvidia-bug-report.log.gz (1.3 MB)

Ah, I thought it might be a Pascal GPU thing, but if a 40-series card has it too, then maybe it’s a driver or configuration issue?

EDIT: Ah, seems there’s more:

Same thing here. I just updated Fedora 38 from 535.129.03 to 545.29.06, and both kernel versions v6.6.3 and v6.5.10 started showing these errors when I connect external displays to my laptop:

kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: ACPI BIOS Error (bug): Could not resolve symbol [^^^PEG1.MASP], AE_NOT_FOUND (20230628/psargs-330)
kernel: ACPI Error: Aborting method \_SB.PC00.LPCB.EC._Q27 due to previous error (AE_NOT_FOUND) (20230628/psparse-529)
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: ACPI BIOS Error (bug): Could not resolve symbol [^^^PEG1.MASP], AE_NOT_FOUND (20230628/psargs-330)
kernel: ACPI Error: Aborting method \_SB.PC00.LPCB.EC._Q26 due to previous error (AE_NOT_FOUND) (20230628/psparse-529)
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0

Using only the laptop’s internal display works fine, but as soon as I attach external displays it starts failing. After I downgraded the system to 535.129.03 again, everything works as before (tested on v6.6.3).

It doesn’t even need Wayland or an X server (both fail); even a plain Linux TTY fails.

GPU: NVIDIA® T1200 4GB Quadro (Turing)

The nvidia-drm fbdev option is experimental, so if you have issues with it set, just disable it.

I don’t have that option set. I only have options nvidia-drm modeset=1 without fbdev - I think that option doesn’t even exist with the older drivers, so I never knew about it.

[    9.655729] fbcon: nvidia-drmdrmfb (fb0) is primary device

points to the option being set, likely by the Fedora packages. Please check using
grep fbdev /etc/modprobe.d/* /lib/modprobe.d/*
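
The same check can be wrapped in a small helper that skips missing directories and also prints the file name for each hit. This is only a sketch; `find_fbdev_opts` is a name invented here, and the directory list is an assumption that may differ per distribution.

```shell
# Print every modprobe.d line mentioning fbdev under the given directories,
# prefixed with the file it came from. Prints nothing if no file sets it.
find_fbdev_opts() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        grep -H 'fbdev' "$dir"/*.conf 2>/dev/null || true
    done
    return 0
}

# Typical call on a real system:
find_fbdev_opts /etc/modprobe.d /lib/modprobe.d
```

On a running system the value the loaded module actually uses may also be readable as root from /sys/module/nvidia_drm/parameters/fbdev, assuming the driver exposes the parameter through the usual sysfs layout.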

Hey, thanks for responding!
I did check, and I do have those options in /lib/modprobe.d/50-nvidia-default.conf

Should I delete the whole file, or just the fbdev part?
Or leave the file, and delete the contents?

What do those options do anyway?

EDIT: I also have dual monitors, if that’s relevant.

Don’t delete the file, just set fbdev=0 instead of 1, then recreate the initrd with sudo dracut -f and reboot.
fbdev=1 is for having a high-resolution, accelerated text console.
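
As a rough sketch of that edit, the value can be flipped in place with sed. The helper name `set_fbdev_zero` is made up for this example, and it assumes the file contains the option literally as fbdev=1 (the real contents of the packaged file are not shown in this thread).

```shell
# Change fbdev=1 to fbdev=0 in one modprobe.d file, leaving other options alone.
set_fbdev_zero() {
    sed -i 's/fbdev=1/fbdev=0/g' "$1"
}

# On a real system (as root), followed by the initrd rebuild and a reboot:
#   set_fbdev_zero /lib/modprobe.d/50-nvidia-default.conf
#   dracut -f
```

Keep in mind that a file under /lib/modprobe.d belongs to a package, so a driver update may restore the original value.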

Thanks, I’ll try it and report back.
I kinda need the computer for a while now though, so it will be later, in case something weird happens. :)

It’s enabled by default in Fedora:

https://pkgs.rpmfusion.org/cgit/nonfree/nvidia-kmod.git/tree/make_modeset_default.patch

Add nvidia-drm.fbdev=0 to the kernel command line.
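
Whether the parameter actually reached the kernel can be checked after a reboot. This is a minimal sketch; `cmdline_disables_fbdev` is a hypothetical helper, and the grubby call in the comment is just one common way to edit the command line on Fedora, assuming grubby is installed.

```shell
# Succeeds if the given kernel command line (default: /proc/cmdline)
# contains nvidia-drm.fbdev=0. The kernel treats "-" and "_" in module
# names alike, so both spellings are accepted.
cmdline_disables_fbdev() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qx 'nvidia[-_]drm\.fbdev=0'
}

if cmdline_disables_fbdev; then
    echo "fbdev is disabled on the kernel command line"
else
    echo "nvidia-drm.fbdev=0 is not on the kernel command line"
fi

# One way to add it persistently on Fedora (as root):
#   grubby --update-kernel=ALL --args="nvidia-drm.fbdev=0"
```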

Oh, really.

Ok, that would explain why it is active and might cause these issues even though I have not enabled it myself. I’ll have to test that later; I downgraded the driver earlier to get my setup back to work. Thanks for the pointer, I was scratching my head about that.

Ok, I tried fbdev=0: the flip event timeout message seems to be gone, and the startup process was faster, as it should be.

I then tried the Wayland session, but it still wasn’t good: black bars flicker on screen in games and in some windows.

I tried X11 again. It works ok, but the refresh rate is set to 60Hz, not 75Hz like I set it initially.
Now, when I try to set the refresh rate to 75Hz, the Gnome session crashes and tells me to log out and try again.
I also got a third display for some reason (I only have 2) that is set to 800x600, 60Hz, shows up as a 10" Unknown display, and is positioned at the far right.

I think I’m going to try to revert the driver to an earlier one… This isn’t working too well. I might have more luck with an earlier version.

Ok, just to confirm. At least for me, adding options nvidia-drm fbdev=0 to /etc/modprobe.d/nvidia.conf and then rebuilding my initrd via dracut works. The notebook can handle my external screens again, and the error messages are gone (tested with Linux kernel 6.6.3-100.fc38.x86_64, NVIDIA driver 545.29.06, and on the X server). I also don’t have the flickering issues Gordan described (at least so far).
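
Those two steps could be scripted roughly as follows. This is a sketch, not a tested installer: `disable_fbdev` is a helper name invented here, and it only appends the line if it is not already present so that running it twice is harmless.

```shell
# Append "options nvidia-drm fbdev=0" to a modprobe.d file unless the exact
# line is already there. Creates the file if it does not exist yet.
disable_fbdev() {
    conf=$1
    line='options nvidia-drm fbdev=0'
    grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
}

# On a real system (as root), then rebuild the initrd and reboot:
#   disable_fbdev /etc/modprobe.d/nvidia.conf
#   dracut -f          # Fedora, openSUSE
#   mkinitcpio -P      # distributions using mkinitcpio instead of dracut
```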

Those only happen to me on Wayland (Gnome). X11 works fine.

A restart fixed the refresh rate crash issue as well. I still have a ghost monitor in the display options, but I could at least change the refresh rate of the actual ones.

Also, my system now boots normally to Wayland after setting fbdev to 0 and running mkinitcpio -P

I have this too. As everyone else has already mentioned, the fix is to set fbdev=0.

I’ve had one or two occasions when fbdev=1 actually worked fine. I have no idea why; all I did was reboot the system (so it might be a race condition?).

Ok, so a quick update: the race condition idea got me thinking, so I decided to turn off the secondary monitor when booting… and voila! The timeout disappeared. Tried 3 reboots in a row. Works like a charm.

So apparently you can have fbdev=1 if you turn off your secondary monitor when booting.
It might not work for everyone, but it did for me.

ping @2024a

I just debugged a problem with the fbdev code where the workqueue trying to handle a hotplug event gets deadlocked with an event that it’s waiting for that’s supposed to be processed on the same workqueue. It’s possible that having your second monitor attached was generating a hotplug at boot and triggering the deadlock.

Did your system eventually recover after the timeout events, or did you just keep getting more timeouts without it actually recovering? If you’re willing to install from a .run file I could send you a patch you could try to avoid the deadlock.
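
The kind of deadlock described above can be illustrated with a toy model. This is not the driver's code, only a sketch of the general pattern: a queue with a single worker runs jobs strictly in order, so a job that waits for an event delivered by a later job on the same queue can only time out. All names (`hotplug_job`, `flip_job`, `run_queue`) are invented for the illustration.

```shell
# Toy model only: a queue with a single worker runs jobs strictly in order.
workdir=$(mktemp -d)
flag="$workdir/flip_done"

# Stand-in for the hotplug handler: waits (with a timeout) for the flip event.
hotplug_job() {
    tries=0
    while [ ! -e "$flag" ] && [ "$tries" -lt 3 ]; do
        sleep 0.1
        tries=$((tries + 1))
    done
    if [ -e "$flag" ]; then
        echo "flip completed"
    else
        echo "flip event timeout"
    fi
}

# Stand-in for the job that would deliver the flip event.
flip_job() { : > "$flag"; }

# Run the named jobs one after another, like a single-threaded workqueue.
run_queue() {
    rm -f "$flag"
    for job in "$@"; do "$job"; done
}

run_queue hotplug_job flip_job   # waiter runs first, so it can only time out
run_queue flip_job hotplug_job   # event delivered first, so the wait succeeds
```

The first call prints "flip event timeout" and the second prints "flip completed", which is also why the outcome can depend on whether a hotplug arrives at the wrong moment during boot.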

@aplattner I can give it a try. It does recover but it takes a while. Screen keeps switching modes etc.