I have just noticed this on kernel 5.14.1 on Ubuntu 21.04 as well. My guess is that if you install the mainline kernel 5.14.16 using the Ubuntu mainline GUI kernel installation tool (which installs kernels from the Ubuntu mainline kernel website), then this might go away for you.
Unfortunately, the dependency for this kernel is libc6 >= 2.34, which Ubuntu 21.10 has, but Ubuntu 21.04 is stuck on 2.33. I have a laptop on 21.10, I might check this tomorrow.
Hi there, I checked on 5.14.16 kernel, and this issue does not exist on my Dell XPS with a GTX 1050 Ti Mobile on Ubuntu 21.10, so you can try this new kernel if you feel comfortable. Good luck.
Hi, @berglh One comment: you pasted the first few seconds of the boot log. In the case of “user28546” the bug appeared at second 360. In my case it also never occurs at the very beginning.
One thing I can say for sure is that for me it occurs if and only if the monitor (monitors, I use 3, not sure if that is relevant) is being woken from sleep. I will try a newer kernel, just saying that the absence of errors in the first few seconds of the boot log does not indicate whether this issue is there or not.
I was running 495.44 on my desktop machine on Ubuntu 21.04 and didn’t have this issue. I’ve just noticed it after upgrading my desktop machine with a 2080 Super and the 5.14.17 kernel. I’m not sure why I didn’t see this issue on my laptop, maybe it’s also related to the type of GPU?
[drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000600] Failed to grab modeset ownership
@berglh One comment: you pasted the first few seconds of the boot log. In the case of “user28546” the bug appeared at second 360. In my case it also never occurs at the very beginning.
I left the machine running for quite some time then grepped for items relating to nvidia and drm. That’s why it was only showing entries from the boot time.
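For anyone repeating this check, here is a minimal Python sketch of the same filtering, assuming the log was saved as text in the usual `[ seconds] message` dmesg format. The names `gpu_lines` and `first_modeset_error` are made up for illustration. It keeps only lines mentioning nvidia or drm and reports how long after boot the first modeset error appeared, which makes it easy to see whether the error only shows up late.

```python
import re

# dmesg lines look like "[   45.362999] message"; the number is seconds since boot.
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def gpu_lines(log_text, keywords=("nvidia", "drm")):
    """Return (seconds_since_boot, message) for lines mentioning any keyword."""
    matches = []
    for line in log_text.splitlines():
        m = DMESG_LINE.match(line.strip())
        if not m:
            continue  # skip lines without a dmesg-style timestamp
        message = m.group(2)
        if any(k in message.lower() for k in keywords):
            matches.append((float(m.group(1)), message))
    return matches


def first_modeset_error(log_text):
    """Timestamp of the first 'Failed to grab modeset ownership' line, or None."""
    for seconds, message in gpu_lines(log_text):
        if "Failed to grab modeset ownership" in message:
            return seconds
    return None
```

Feeding it the whole saved log rather than only the first screen of output avoids the confusion discussed above.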
I’m using nvidia-drivers 495.44 on Linux 5.15.1. As soon as I start X, I get the same error:
[ 5.792304] nvidia: loading out-of-tree module taints kernel.
[ 5.792316] nvidia: module license 'NVIDIA' taints kernel.
[ 5.792316] Disabling lock debugging due to kernel taint
[ 5.808903] nvidia-nvlink: Nvlink Core is being initialized, major device number 247
[ 5.809614] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[ 5.925348] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 495.44 Fri Oct 22 06:13:12 UTC 2021
[ 5.928680] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 495.44 Fri Oct 22 06:05:22 UTC 2021
[ 5.930430] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[ 5.933689] nvidia-uvm: Loaded the UVM driver, major device number 245.
[ 5.935609] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 5.947479] resource sanity check: requesting [mem 0x000e0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000e0000-0x000e3fff window]
[ 5.947483] caller _nv032275rm+0x2a/0x60 [nvidia] mapping multiple BARs
[ 6.105259] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000d3fff window]
[ 6.105262] caller _nv000717rm+0x1ad/0x200 [nvidia] mapping multiple BARs
[ 6.691281] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
...
[ 45.362999] [drm:drm_new_set_master] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
[ 54.725538] [drm:drm_new_set_master] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
I don’t know what kernel configs I need to set. Here’s what I have:
#
# Graphics support
#
# CONFIG_AGP is not set
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=16
# CONFIG_VGA_SWITCHEROO is not set
CONFIG_DRM=y
# CONFIG_DRM_DP_AUX_CHARDEV is not set
# CONFIG_DRM_DEBUG_MM is not set
# CONFIG_DRM_DEBUG_SELFTEST is not set
CONFIG_DRM_KMS_HELPER=y
# CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS is not set
CONFIG_DRM_FBDEV_EMULATION=y
CONFIG_DRM_FBDEV_OVERALLOC=100
# CONFIG_DRM_FBDEV_LEAK_PHYS_SMEM is not set
# CONFIG_DRM_LOAD_EDID_FIRMWARE is not set
# CONFIG_DRM_DP_CEC is not set
And I have nvidia-drm.modeset=1 on the kernel command line.
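To compare setups between posters, a small Python sketch like the one below could turn a pasted config fragment and a kernel command line into something checkable. The function names `parse_kconfig` and `modeset_requested` are hypothetical, and the sketch only reads text; it does not say which options the driver actually requires.

```python
def parse_kconfig(text):
    """Map option name -> value; '# CONFIG_X is not set' becomes None."""
    options = {}
    suffix = " is not set"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Strip the leading "# " and the trailing " is not set".
            options[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def modeset_requested(cmdline):
    """True if the last nvidia-drm.modeset=... token on the command line is 1."""
    requested = False
    for token in cmdline.split():
        key, _, value = token.partition("=")
        # Assumption: '-' and '_' are treated as equivalent in parameter names.
        if key.replace("-", "_") == "nvidia_drm.modeset":
            requested = value == "1"
    return requested
```

On a running system the command line can be read from `/proc/cmdline` and passed to `modeset_requested`.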
I would like to jump into this topic, too. Can we escalate it as a bug?
I have similar issues with Ubuntu 21.10 and Wayland, and this is one of the steps leading to my computer freezing.
After I resume the computer from sleep, it freezes.
I commented out the section of nvidia-sleep.sh responsible for switching virtual terminals. With that change, after resume I could see the Linux text console and switch to other text terminals.
However, switching to the graphical (Wayland) session causes the computer to freeze.
In the dmesg logs I see the above error.
It’s quite nasty, as Wayland is getting more and more popular and looks better than X to me, but it seems to have issues with NVIDIA cards.
I’ve investigated a bit more, and it looks like Linux hangs only for graphics / terminal. I can still use SSH to log in (however, the monitor does not work).
I’m not sure the above message, Failed to grab modeset ownership, is the cause, as I’ve also seen it in the logs when things worked.
However, I can’t find anything meaningful in the logs related to this issue (even when it happens).
All for now: it’s Nvidia drivers (not only 495) + Wayland + suspend to RAM
It would be nice if an NVIDIA dev could check it. I can run some more tests and provide input if needed.
It’s hard to say anything definite; below are my recent checks. All I can say is that it’s somehow related to Wayland and modeset. I see that it’s not unique to Wayland and the S3 power state, because switching to fbcon (while Wayland is running) is problematic, too.
Maybe one more interesting thing: sometimes when graphics hangs I see the UEFI logo.
dmesg:
[18286.452827] NVRM: Xid (PCI:0000:09:00): 13, pid=301, Graphics Exception: Shader Program Header 1 Error
[18286.452832] NVRM: Xid (PCI:0000:09:00): 13, pid=301, Graphics Exception: Shader Program Header 2 Error
[18286.452835] NVRM: Xid (PCI:0000:09:00): 13, pid=301, Graphics Exception: Shader Program Header 3 Error
[18286.452838] NVRM: Xid (PCI:0000:09:00): 13, pid=301, Graphics Exception: Shader Program Header 9 Error
[18286.452841] NVRM: Xid (PCI:0000:09:00): 13, pid=301, Graphics Exception: Shader Program Header 18 Error
[18286.452845] NVRM: Xid (PCI:0000:09:00): 13, pid=301, Graphics Exception: ESR 0x405840=0x8204020e
[18286.452851] NVRM: Xid (PCI:0000:09:00): 13, pid=301, Graphics Exception: ESR 0x405848=0x80000000
[18286.453056] NVRM: Xid (PCI:0000:09:00): 13, pid=424885, Graphics Exception: ChID 0010, Class 0000c197, Offset 00001944, Data 00000000
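The Xid lines above can be summarized with a short Python sketch, which may help when comparing reports. It assumes the exact `NVRM: Xid (PCI:...): N, pid=..., detail` layout shown in this log; other driver versions may format the line differently. The names `parse_xid` and `summarize_xids` are invented for the example. For reference, NVIDIA's Xid documentation describes Xid 13 as a graphics engine exception.

```python
import re
from collections import Counter

# Matches the layout seen in this thread, e.g.
# "NVRM: Xid (PCI:0000:09:00): 13, pid=301, Graphics Exception: ..."
XID_LINE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+), "
    r"pid=(?P<pid>\d+), (?P<detail>.*)$"
)


def parse_xid(line):
    """Return a dict describing one Xid line, or None if the line does not match."""
    m = XID_LINE.search(line)
    if not m:
        return None
    return {
        "pci": m.group("pci"),
        "xid": int(m.group("xid")),
        "pid": int(m.group("pid")),
        "detail": m.group("detail").strip(),
    }


def summarize_xids(log_text):
    """Count Xid events per (xid, pid) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        event = parse_xid(line)
        if event:
            counts[(event["xid"], event["pid"])] += 1
    return counts
```

Grouping by pid shows at a glance that the exceptions in this excerpt come from two different processes.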
sddm/wayland-session.log:
kwin_wayland_drm: Failed to acquire output EGL stream frame: "3353"
After restarting sddm
[18841.184645] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000900] Failed to grab modeset ownership
After starting a fresh Wayland session (box restarted too), going to the Linux console and back to Wayland (graphics hangs).
dmesg:
[ 137.298350] fbcon: Taking over console
[ 137.298474] Console: switching to colour frame buffer device 240x67
And my recent findings: after starting kwin_wayland from the command line and switching back and forth between fbcons, I see this in the logs:
Filter multi-plane format 842093913
Filter multi-plane format 842094158
Filter multi-plane format 825382478
Filter multi-plane format 909203022
Filter multi-plane format 875714126
kwin_core: Failed to update gamma ramp for output KWin::DrmOutput(0x55ebc3a735e0, name="HDMI-A-1", geometry=QRect(0,0 3840x2160), scale=1)
After switching back to wayland
kwin_wayland_drm: Failed to acquire output EGL stream frame: "321c"
kwin_core: Failed to update gamma ramp for output KWin::DrmOutput(0x55ebc3a735e0, name="HDMI-A-1", geometry=QRect(0,0 3840x2160), scale=1)
kwin_core: Failed to update gamma ramp for output KWin::DrmOutput(0x55ebc3a735e0, name="HDMI-A-1", geometry=QRect(0,0 3840x2160), scale=1)
In all cases everything hangs and I have to force close kwin_wayland with signal 9.
[drm:drm_new_set_master [drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
This message does appear for me in 495.46 (in fact two at a time) on an Optimus laptop with an Intel Comet Lake CPU and an RTX 3060, but it does not seem to cause any problems, so I just ignore it.