Ubuntu 21.10 - "Failed to grab modeset ownership" with 495.44

Hi,

I’ve recently installed a fresh copy of Ubuntu 21.10. I have a P5000.

I’ve noticed many errors in dmesg:

[10806.670226] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000200] Failed to grab modeset ownership

02:00.0 VGA compatible controller: NVIDIA Corporation GP104GL [Quadro P5000] (rev a1)
NVIDIA-SMI 495.44 Driver Version: 495.44 CUDA Version: 11.5

Linux data 5.13.0-21-generic #21-Ubuntu SMP Tue Oct 19 08:59:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Does anyone know what could be causing it or how to troubleshoot it?

Thanks


I have just noticed this on kernel 5.14.1 on Ubuntu 21.04 as well. My guess is that if you install the mainline kernel 5.14.16 using the Ubuntu mainline GUI kernel installation tool (which installs kernels from the Ubuntu mainline kernel website), then this might go away for you.

Unfortunately, the dependency for this kernel is libc6 >= 2.34, which Ubuntu 21.10 has, but Ubuntu 21.04 is stuck on 2.33. I have a laptop on 21.10, I might check this tomorrow.

Hi there, I checked on 5.14.16 kernel, and this issue does not exist on my Dell XPS with a GTX 1050 Ti Mobile on Ubuntu 21.10, so you can try this new kernel if you feel comfortable. Good luck.

$ sudo dmesg | grep -i -e "nv-" -e "nvidia" -e "drm"
[    0.151034] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[    3.694251] systemd[1]: Starting Load Kernel Module drm...
[    3.708549] systemd[1]: modprobe@drm.service: Deactivated successfully.
[    3.708790] systemd[1]: Finished Load Kernel Module drm.
[    4.258127] nvidia: loading out-of-tree module taints kernel.
[    4.258139] nvidia: module license 'NVIDIA' taints kernel.
[    4.258629] i915 0000:00:02.0: [drm] Incompatible option enable_guc=3 - GuC submission is N/A
[    4.259175] fb0: switching to inteldrmfb from EFI VGA
[    4.268015] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    4.285566] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
[    4.286321] nvidia 0000:01:00.0: enabling device (0006 -> 0007)
[    4.287212] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[    4.360236] i915 0000:00:02.0: [drm] GuC firmware i915/kbl_guc_49.0.1.bin version 49.0 submission:disabled
[    4.360240] i915 0000:00:02.0: [drm] HuC firmware i915/kbl_huc_4.0.0.bin version 4.0 authenticated:yes
[    4.409600] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  495.44  Fri Oct 22 06:13:12 UTC 2021
[    4.415298] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0
[    4.437861] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  495.44  Fri Oct 22 06:05:22 UTC 2021
[    4.450137] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[    4.504388] i915 0000:00:02.0: [drm] fb0: i915 frame buffer device
[    5.085186] audit: type=1400 audit(1636155701.082:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=829 comm="apparmor_parser"
[    5.085189] audit: type=1400 audit(1636155701.082:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=829 comm="apparmor_parser"
[    5.202863] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
[    5.222536] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[    5.225082] nvidia-uvm: Loaded the UVM driver, major device number 506.

Hi,

I upgraded to the latest Linux data 5.15.1-051501-generic #202111061036 SMP Sat Nov 6 14:39:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

But I’m still seeing the errors. Google is not turning up many results on what it could be.

[    0.758507] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[    4.475826] ast 0000:07:00.0: [drm] P2A bridge disabled, using default configuration
[    4.475828] ast 0000:07:00.0: [drm] AST 2400 detected
[    4.475839] ast 0000:07:00.0: [drm] Analog VGA only
[    4.475840] ast 0000:07:00.0: [drm] dram MCLK=396 Mhz type=1 bus_width=16
[    4.476214] [drm] Initialized ast 0.1.0 20120228 for 0000:07:00.0 on minor 0
[    4.481002] ast 0000:07:00.0: [drm] fb0: ast frame buffer device
[    4.669625] nvidia: loading out-of-tree module taints kernel.
[    4.669636] nvidia: module license 'NVIDIA' taints kernel.
[    4.718634] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    4.731326] nvidia-nvlink: Nvlink Core is being initialized, major device number 511
[    4.733075] nvidia 0000:02:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[    4.848995] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  495.44  Fri Oct 22 06:13:12 UTC 2021
[    4.857142] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  495.44  Fri Oct 22 06:05:22 UTC 2021
[    4.858375] [drm] [nvidia-drm] [GPU ID 0x00000200] Loading driver
[    6.335710] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:02:00.0 on minor 1
[  137.983807] systemd[1]: Starting Load Kernel Module drm...
[  137.996180] systemd[1]: modprobe@drm.service: Deactivated successfully.
[  137.996568] systemd[1]: Finished Load Kernel Module drm.
[  138.149084] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[  138.151895] nvidia-uvm: Loaded the UVM driver, major device number 508.
[  138.531344] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:02.0/0000:02:00.1/sound/card0/input4
[  138.531423] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:02.0/0000:02:00.1/sound/card0/input5
[  138.531616] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:02.0/0000:02:00.1/sound/card0/input6
[  138.531660] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:02.0/0000:02:00.1/sound/card0/input7
[  138.531697] input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:02.0/0000:02:00.1/sound/card0/input8
[  138.531733] input: HDA NVidia HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:02.0/0000:02:00.1/sound/card0/input9
[  138.531769] input: HDA NVidia HDMI/DP,pcm=12 as /devices/pci0000:00/0000:00:02.0/0000:02:00.1/sound/card0/input10
[  220.049100] audit: type=1400 audit(1636334052.285:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=8942 comm="apparmor_parser"
[  220.050071] audit: type=1400 audit(1636334052.285:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=8942 comm="apparmor_parser"
[  360.982419] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000200] Failed to grab modeset ownership
[  360.997733] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000200] Failed to grab modeset ownership

Hi,
@berglh One comment: you pasted the first few seconds of the boot log. In the case of “user28546” the bug appeared at second 360. In my case it also never occurs at the very beginning.

One thing I can say for sure is that for me it occurs if and only if a monitor (or monitors; I use 3, not sure if that is relevant) is being woken from sleep. I will try a newer kernel. I am just saying that the absence of errors in the first few seconds of the boot log does not indicate whether this issue is present or not.
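To look for the error across the whole uptime rather than only the boot window, a filter along these lines might help. This is only a sketch: the function name `modeset_errors` is made up for illustration, and it matches nothing more than the message text quoted in this thread. It expects dmesg-style lines with a leading `[seconds.micros]` timestamp.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not part of any NVIDIA tool).
# Reads kernel log text on stdin and prints the timestamp (seconds since boot)
# of every "Failed to grab modeset ownership" line.
modeset_errors() {
    grep -F 'Failed to grab modeset ownership' |
        sed -n 's/^\[ *\([0-9][0-9]*\.[0-9][0-9]*\)\].*/\1/p'
}

# Usage (not run here): sudo dmesg | modeset_errors
```

Comparing the printed timestamps with the moments the monitors were woken up would show whether the two really line up. Lines without a bracketed timestamp (for example journal output prefixed with `kernel:`) are matched by grep but print nothing.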

I installed kernel 5.15.0-051500-generic, I am running:

# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  495.44  Fri Oct 22 06:13:12 UTC 2021
GCC version:  gcc version 11.2.0 (Ubuntu 11.2.0-7ubuntu2)

and my dmesg contains:

# dmesg | grep ownership
[ 16.889051] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership

And of course I have nvidia-drm.modeset=1 set with the idea of using wayland.

I was running 495.44 on my desktop machine on Ubuntu 21.04 and didn’t have this issue. I’ve just noticed it after upgrading my desktop machine with a 2080 Super and the 5.14.17 kernel. I’m not sure why I didn’t see this issue on my laptop; maybe it’s also related to the type of GPU?
[drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000600] Failed to grab modeset ownership

@berglh One comment: you pasted the first few seconds of the boot log. In the case of “user28546” the bug appeared at second 360. In my case it also never occurs at the very beginning.

I left the machine running for quite some time then grepped for items relating to nvidia and drm. That’s why it was only showing entries from the boot time.

I am also seeing this on Gentoo. I have a GTX 970:

01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)

I’m using nvidia-drivers 495.44 on Linux 5.15.1. As soon as I start X, I get the same error:

[    5.792304] nvidia: loading out-of-tree module taints kernel.
[    5.792316] nvidia: module license 'NVIDIA' taints kernel.
[    5.792316] Disabling lock debugging due to kernel taint
[    5.808903] nvidia-nvlink: Nvlink Core is being initialized, major device number 247

[    5.809614] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[    5.925348] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  495.44  Fri Oct 22 06:13:12 UTC 2021
[    5.928680] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  495.44  Fri Oct 22 06:05:22 UTC 2021
[    5.930430] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[    5.933689] nvidia-uvm: Loaded the UVM driver, major device number 245.
[    5.935609] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[    5.947479] resource sanity check: requesting [mem 0x000e0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000e0000-0x000e3fff window]
[    5.947483] caller _nv032275rm+0x2a/0x60 [nvidia] mapping multiple BARs
[    6.105259] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000d3fff window]
[    6.105262] caller _nv000717rm+0x1ad/0x200 [nvidia] mapping multiple BARs
[    6.691281] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
...
[   45.362999] [drm:drm_new_set_master] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
[   54.725538] [drm:drm_new_set_master] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership

I don’t know what kernel configs I need to set. Here’s what I have:

#
# Graphics support
#
# CONFIG_AGP is not set
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=16
# CONFIG_VGA_SWITCHEROO is not set
CONFIG_DRM=y
# CONFIG_DRM_DP_AUX_CHARDEV is not set
# CONFIG_DRM_DEBUG_MM is not set
# CONFIG_DRM_DEBUG_SELFTEST is not set
CONFIG_DRM_KMS_HELPER=y
# CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS is not set
CONFIG_DRM_FBDEV_EMULATION=y
CONFIG_DRM_FBDEV_OVERALLOC=100
# CONFIG_DRM_FBDEV_LEAK_PHYS_SMEM is not set
# CONFIG_DRM_LOAD_EDID_FIRMWARE is not set
# CONFIG_DRM_DP_CEC is not set

And I have nvidia-drm.modeset=1 on the kernel command line.
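Having `nvidia-drm.modeset=1` on the command line does not by itself confirm the loaded module picked it up. A rough way to check, assuming the parameter is exposed at the usual sysfs location for module parameters (the function name below is hypothetical, and reading the file may require root):

```shell
#!/bin/sh
# Hypothetical helper; the function name is an assumption for illustration.
# Prints "enabled" if the parameter file contains Y, "disabled" if it
# contains N, and "unknown" if the file cannot be read (for example when
# nvidia_drm is not loaded or the caller lacks permission).
nvidia_modeset_state() {
    param_file="${1:-/sys/module/nvidia_drm/parameters/modeset}"
    if [ ! -r "$param_file" ]; then
        echo unknown
        return 0
    fi
    case "$(cat "$param_file")" in
        Y*) echo enabled ;;
        N*) echo disabled ;;
        *)  echo unknown ;;
    esac
}

# Usage (not run here): nvidia_modeset_state
```

The optional path argument is only there so the function can be tried against a test file.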

I’m also a Gentoo user with the same problem. I’m using Linux kernel 5.10.76.

I use openSUSE Tumbleweed with kernel 5.14.14-2-default, a GTX 960 and NVIDIA driver version 470.86, and I’m having the same problem.

Ubuntu 21.10

GeForce GTX 1060 6GB

5.13.0-21-generic

470.82.00-0ubuntu0.21.10.1

kernel: [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000a00] Failed to grab modeset ownership

I would like to jump into this topic, too. Can we get it tracked as a bug?

I have similar issues with Ubuntu 21.10 and Wayland, and this error seems to be one step on the way to my computer freezing.

After I resume the computer from sleep, it freezes.

I commented out the section of nvidia-sleep.sh responsible for switching the virtual terminal. With that change, after resume I could see the Linux kernel text console and switch to other text terminals.

However, switching to the graphical (Wayland) session causes the computer to freeze.

In the dmesg logs I see the error above.

It is quite nasty, as Wayland is getting more and more popular and, to me, looks better than X, but it appears to have issues with NVIDIA cards.

I am getting this same issue on Arch Linux with NVIDIA 495.46. I am also getting sleep issues like @anon22950299.

I’ve investigated a bit more, and it looks like Linux hangs only for graphics and the terminal. I can still log in over SSH (however, the monitor does not work).

I’m not sure the Failed to grab modeset ownership message above is the cause, as I’ve also seen it in the logs when things worked.

However, I can’t find anything meaningful in the logs related to this issue (even when it happens).

All for now: it’s NVIDIA drivers (not only 495) + Wayland + suspend to RAM.

It would be nice if an NVIDIA dev could check it. I can run more tests and provide input if needed.
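For anyone gathering details to attach to a report, the driver version can be pulled out of text in the /proc/driver/nvidia/version format shown earlier in the thread. A minimal sketch, with a made-up function name, assuming the version is the first purely numeric dotted field on the "NVRM version:" line:

```shell
#!/bin/sh
# Hypothetical helper; the name and the field-matching rule are assumptions.
# Reads text in the /proc/driver/nvidia/version format on stdin and prints
# the first field on the "NVRM version:" line that looks like N.N or N.N.N.
nvidia_driver_version() {
    awk '/^NVRM version:/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+(\.[0-9]+)?$/) { print $i; exit }
    }'
}

# Usage (not run here): nvidia_driver_version < /proc/driver/nvidia/version
```

Posting this alongside `uname -r` and the session type (X or Wayland) would make the reports in this thread easier to compare.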

Yeah I checked my old logs back when suspend worked and I was still seeing that Failed to grab modeset ownership message, so it might be unrelated?

This Arch Linux forum post seems to suggest that there are issues with Xserver 21.1.2 and NVIDIA 495.46: [SOLVED] 5.15 kernel: system frozen if suspending or hibernating twice / Kernel & Hardware / Arch Linux Forums

EDIT: @anon22950299 Found this other post suggesting that downgrading xserver to an older version solves the issue: [SOLVED] 5.15.8, something's off [not kernel problem] / Kernel & Hardware / Arch Linux Forums

Hard to say. Below are my recent checks. All I can say is that it’s somehow related to Wayland and modeset. I see that it’s not uniquely related to Wayland and the S3 power state, because switching to fbcon (while Wayland is running) is problematic, too.

Maybe just one more interesting thing: sometimes when graphics hang I see the UEFI logo.

dmesg:

[18286.452827] NVRM: Xid (PCI:0000:09:00): 13, pid=301, Graphics Exception: Shader Program Header 1 Error
[18286.452832] NVRM: Xid (PCI:0000:09:00): 13, pid=301, Graphics Exception: Shader Program Header 2 Error
[18286.452835] NVRM: Xid (PCI:0000:09:00): 13, pid=301, Graphics Exception: Shader Program Header 3 Error
[18286.452838] NVRM: Xid (PCI:0000:09:00): 13, pid=301, Graphics Exception: Shader Program Header 9 Error
[18286.452841] NVRM: Xid (PCI:0000:09:00): 13, pid=301, Graphics Exception: Shader Program Header 18 Error
[18286.452845] NVRM: Xid (PCI:0000:09:00): 13, pid=301, Graphics Exception: ESR 0x405840=0x8204020e
[18286.452851] NVRM: Xid (PCI:0000:09:00): 13, pid=301, Graphics Exception: ESR 0x405848=0x80000000
[18286.453056] NVRM: Xid (PCI:0000:09:00): 13, pid=424885, Graphics Exception: ChID 0010, Class 0000c197, Offset 00001944, Data 00000000

sddm/wayland-session.log:
kwin_wayland_drm: Failed to acquire output EGL stream frame: "3353"

After restarting sddm
[18841.184645] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000900] Failed to grab modeset ownership

After starting a fresh Wayland session (after a reboot, too), going to the Linux console and back to Wayland makes graphics hang.

dmesg:
[  137.298350] fbcon: Taking over console
[  137.298474] Console: switching to colour frame buffer device 240x67

And my most recent finding: after starting kwin_wayland from the command line and switching back and forth between it and fbcon, I see this in the logs:

Filter multi-plane format 842093913
Filter multi-plane format 842094158
Filter multi-plane format 825382478
Filter multi-plane format 909203022
Filter multi-plane format 875714126
kwin_core: Failed to update gamma ramp for output KWin::DrmOutput(0x55ebc3a735e0, name="HDMI-A-1", geometry=QRect(0,0 3840x2160), scale=1)

After switching back to wayland

kwin_wayland_drm: Failed to acquire output EGL stream frame: "321c"
kwin_core: Failed to update gamma ramp for output KWin::DrmOutput(0x55ebc3a735e0, name="HDMI-A-1", geometry=QRect(0,0 3840x2160), scale=1)
kwin_core: Failed to update gamma ramp for output KWin::DrmOutput(0x55ebc3a735e0, name="HDMI-A-1", geometry=QRect(0,0 3840x2160), scale=1)

In all cases everything hangs and I have to force close kwin_wayland with signal 9.

I just want to say my “Failed to grab modeset ownership” message doesn’t appear anymore in 495.46. As far as I am concerned the issue is resolved.

[drm:drm_new_set_master [drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
This message does appear for me in 495.46 (in fact, two at a time) on an Optimus laptop with an Intel Comet Lake CPU and an RTX 3060, but it does not seem to cause any problems, so I just ignore it.