Xorg slowness, freezing and crashing with external monitor, Thinkpad w/ Optimus (GM107GLM [Quadro M2000M])

I usually keep my ThinkPad P50 in a dock and use an external monitor connected to the dock, with the laptop screen disabled.

A couple of driver releases ago, IIRC (a number of months ago anyway), 3D apps started running really slowly. glxgears, the Steam UI, mpv, VLC and games all ran at a very low, juddery frame rate.

I then realised that if I made the external monitor a clone of the laptop screen, the 3D frame rate wouldn’t tank. But then X11 started pausing for a few seconds on the external screen and then catching up, eventually crashing; sometimes the external screen image would freeze while the laptop screen could still be used.

The Xorg error on crash is below.

nvidia-bug-report.log.gz (2.2 MB)

% inxi -F n
System:
  Host: red Kernel: 6.1.10-zen1-1-zen arch: x86_64 bits: 64 Desktop: awesome
    v: 4.3 Distro: Arch Linux
Machine:
  Type: Laptop System: LENOVO product: 20EN0006UK v: ThinkPad P50
    serial: <superuser required>
  Mobo: LENOVO model: 20EN0006UK v: SDK0J40705 WIN
    serial: <superuser required> UEFI-[Legacy]: LENOVO v: N1EET94W (1.67 )
    date: 12/10/2021
Battery:
  ID-1: BAT0 charge: 59.1 Wh (76.0%) condition: 77.8/90.0 Wh (86.4%)
    volts: 11.4 min: 11.2
  ID-2: hidpp_battery_0 charge: 98% condition: N/A
CPU:
  Info: quad core model: Intel Core i7-6820HQ bits: 64 type: MT MCP cache:
    L2: 1024 KiB
  Speed (MHz): avg: 2700 min/max: 800/2700 cores: 1: 2701 2: 2700 3: 2700
    4: 2700 5: 2700 6: 2704 7: 2700 8: 2700
Graphics:
  Device-1: Intel HD Graphics 530 driver: i915 v: kernel
  Device-2: NVIDIA GM107GLM [Quadro M2000M] driver: nvidia v: 525.85.05
  Device-3: Acer ThinkPad P50 Integrated Camera type: USB driver: uvcvideo
  Display: server: X.Org v: 21.1.7 with: Xwayland v: 22.1.7 driver: X:
    loaded: modesetting,nvidia dri: iris gpu: i915,nvidia,nvidia-nvswitch
    resolution: 1: N/A 2: 1920x1080~60Hz 3: N/A
  API: OpenGL v: 4.6.0 NVIDIA 525.85.05 renderer: Quadro M2000M/PCIe/SSE2
Audio:
  Device-1: Intel 100 Series/C230 Series Family HD Audio driver: snd_hda_intel
  Device-2: NVIDIA GM107 High Definition Audio [GeForce 940MX]
    driver: snd_hda_intel
  Sound API: ALSA v: k6.1.10-zen1-1-zen running: yes
  Sound Server-1: PipeWire v: 0.3.65 running: yes
Network:
  Device-1: Intel Ethernet I219-LM driver: e1000e
  IF: net0 state: up speed: 100 Mbps duplex: full mac: c8:5b:76:bf:43:96
  Device-2: Intel Wireless 8260 driver: iwlwifi
  IF: wlan0 state: up mac: f4:8c:50:5a:fc:8f
Bluetooth:
  Device-1: Intel Bluetooth wireless interface type: USB driver: btusb
  Report: bt-adapter ID: hci0 state: up address: F4:8C:50:5A:FC:93
Drives:
  Local Storage: total: 1.14 TiB used: 840.02 GiB (71.8%)
  ID-1: /dev/sda vendor: SanDisk model: SD8TN8U256G1001 size: 238.47 GiB
  ID-2: /dev/sdb vendor: Crucial model: CT1000MX500SSD1 size: 931.51 GiB
Partition:
  ID-1: / size: 1.08 TiB used: 840.02 GiB (76.0%) fs: btrfs dev: /dev/sda1
Swap:
  ID-1: swap-1 type: partition size: 64 GiB used: 0 KiB (0.0%) dev: /dev/sdb1
Sensors:
  System Temperatures: cpu: 55.0 C pch: 48.0 C mobo: N/A
  Fan Speeds (RPM): fan-1: 3087 fan-2: 3083
Info:
  Processes: 314 Uptime: 10m Memory: 62.19 GiB used: 7.44 GiB (12.0%)
  Shell: Zsh inxi: 3.3.25
% glxinfo | grep direct
direct rendering: Yes
    GL_AMD_multi_draw_indirect, GL_AMD_seamless_cubemap_per_texture, 
    GL_ARB_direct_state_access, GL_ARB_draw_buffers, 
    GL_ARB_draw_indirect, GL_ARB_draw_instanced, GL_ARB_enhanced_layouts, 
    GL_ARB_half_float_vertex, GL_ARB_imaging, GL_ARB_indirect_parameters, 
    GL_ARB_multi_draw_indirect, GL_ARB_multisample, GL_ARB_multitexture, 
    GL_EXT_depth_bounds_test, GL_EXT_direct_state_access, 
    GL_NV_alpha_to_coverage_dither_control, GL_NV_bindless_multi_draw_indirect, 
    GL_NV_bindless_multi_draw_indirect_count, GL_NV_bindless_texture, 
    GL_AMD_multi_draw_indirect, GL_AMD_seamless_cubemap_per_texture, 
    GL_ARB_direct_state_access, GL_ARB_draw_buffers, 
    GL_ARB_draw_indirect, GL_ARB_draw_instanced, GL_ARB_enhanced_layouts, 
    GL_ARB_half_float_vertex, GL_ARB_imaging, GL_ARB_indirect_parameters, 
    GL_ARB_multi_draw_indirect, GL_ARB_multisample, GL_ARB_multitexture, 
    GL_EXT_depth_bounds_test, GL_EXT_direct_state_access, 
    GL_NV_alpha_to_coverage_dither_control, GL_NV_bindless_multi_draw_indirect, 
    GL_NV_bindless_multi_draw_indirect_count, GL_NV_bindless_texture, 
    GL_EXT_memory_object, GL_EXT_memory_object_fd, GL_EXT_multi_draw_indirect, 
% glxinfo | grep "OpenGL renderer"
OpenGL renderer string: Quadro M2000M/PCIe/SSE2

Xorg log:

...
[    38.832] (II) NVIDIA(G0): Setting mode "DP-1-3.1: nvidia-auto-select @1920x1080 +0+0 {AllowGSYNC=Off, ViewPortIn=1920x1080, ViewPortOut=1920x1080+0+0}"
[    38.915] (II) NVIDIA(G0): Setting mode "DP-1-3.1: nvidia-auto-select @1920x1080 +0+0 {AllowGSYNC=Off, ViewPortIn=1920x1080, ViewPortOut=1920x1080+0+0}, DP-1-1: nvidia-auto-select @1920x1080 +0+0 {AllowGSYNC=Off, ViewPortIn=1920x1080, ViewPortOut=1920x1080+0+0}"
[    39.234] (II) modeset(0): EDID vendor "LGD", prod id 1359
[    39.234] (II) modeset(0): Printing DDC gathered Modelines:
[    39.234] (II) modeset(0): Modeline "1920x1080"x0.0  138.70  1920 1968 2000 2080  1080 1083 1088 1111 +hsync -vsync (66.7 kHz eP)
[    39.234] (II) modeset(0): Modeline "1920x1080"x0.0  114.46  1920 1968 2000 2164  1080 1083 1088 1102 +hsync -vsync (52.9 kHz e)
[    39.250] (II) modeset(0): EDID vendor "LGD", prod id 1359
[    39.250] (II) modeset(0): Printing DDC gathered Modelines:
[    39.251] (II) modeset(0): Modeline "1920x1080"x0.0  138.70  1920 1968 2000 2080  1080 1083 1088 1111 +hsync -vsync (66.7 kHz eP)
[    39.251] (II) modeset(0): Modeline "1920x1080"x0.0  114.46  1920 1968 2000 2164  1080 1083 1088 1102 +hsync -vsync (52.9 kHz e)
[    39.251] (--) NVIDIA(GPU-0): SFX2K8 4TO2 (DFP-3.1): connected
[    39.251] (--) NVIDIA(GPU-0): SFX2K8 4TO2 (DFP-3.1): Internal DisplayPort
[    39.251] (--) NVIDIA(GPU-0): SFX2K8 4TO2 (DFP-3.1): 960.0 MHz maximum pixel clock
[    39.251] (--) NVIDIA(GPU-0): 
[    39.253] (--) NVIDIA(GPU-0): DFP-0: disconnected
[    39.253] (--) NVIDIA(GPU-0): DFP-0: Internal TMDS
[    39.254] (--) NVIDIA(GPU-0): DFP-0: 165.0 MHz maximum pixel clock
[    39.254] (--) NVIDIA(GPU-0): 
[    39.443] (--) NVIDIA(GPU-0): DELL P2719H (DFP-1): connected
[    39.443] (--) NVIDIA(GPU-0): DELL P2719H (DFP-1): Internal TMDS
[    39.443] (--) NVIDIA(GPU-0): DELL P2719H (DFP-1): 300.0 MHz maximum pixel clock
[    39.443] (--) NVIDIA(GPU-0): 
[    39.443] (--) NVIDIA(GPU-0): DFP-2: disconnected
[    39.443] (--) NVIDIA(GPU-0): DFP-2: Internal TMDS
[    39.443] (--) NVIDIA(GPU-0): DFP-2: 165.0 MHz maximum pixel clock
[    39.443] (--) NVIDIA(GPU-0): 
[    39.443] (--) NVIDIA(GPU-0): DFP-3: disconnected
[    39.443] (--) NVIDIA(GPU-0): DFP-3: Internal DisplayPort
[    39.443] (--) NVIDIA(GPU-0): DFP-3: 960.0 MHz maximum pixel clock
[    39.443] (--) NVIDIA(GPU-0): 
[    39.443] (--) NVIDIA(GPU-0): DFP-4: disconnected
[    39.443] (--) NVIDIA(GPU-0): DFP-4: Internal DisplayPort
[    39.443] (--) NVIDIA(GPU-0): DFP-4: 960.0 MHz maximum pixel clock
[    39.443] (--) NVIDIA(GPU-0): 
[    39.443] (--) NVIDIA(GPU-0): DFP-5: disconnected
[    39.443] (--) NVIDIA(GPU-0): DFP-5: Internal DisplayPort
[    39.443] (--) NVIDIA(GPU-0): DFP-5: 960.0 MHz maximum pixel clock
[    39.443] (--) NVIDIA(GPU-0): 
[   393.440] (II) NVIDIA(G0): Setting mode "DP-1-3.1: nvidia-auto-select @1920x1080 +0+0 {AllowGSYNC=Off, ViewPortIn=1920x1080, ViewPortOut=1920x1080+0+0}"
[   479.722] (EE) 
[   479.722] (EE) Backtrace:
[   479.723] (EE) 0: /usr/lib/Xorg (dri3_send_open_reply+0xdd) [0x55861cc8a9ad]
[   479.723] (EE) 1: /usr/lib/libc.so.6 (__sigaction+0x50) [0x7fb7a139cf50]
[   479.724] (EE) 2: /usr/lib/libc.so.6 (pthread_key_delete+0x14c) [0x7fb7a13eb8ec]
[   479.725] (EE) 3: /usr/lib/libc.so.6 (gsignal+0x18) [0x7fb7a139cea8]
[   479.725] (EE) 4: /usr/lib/libc.so.6 (abort+0xd7) [0x7fb7a138653d]
[   479.726] (EE) unw_get_proc_name failed: no unwind info found [-10]
[   479.726] (EE) 5: /usr/lib/dri/iris_dri.so (?+0x0) [0x7fb79e099aa3]
[   479.726] (EE) 6: /usr/lib/dri/iris_dri.so (nouveau_drm_screen_create+0x489890) [0x7fb79eddffd0]
[   479.727] (EE) 7: /usr/lib/dri/iris_dri.so (__driDriverGetExtensions_d3d12+0xb7e36) [0x7fb79e161356]
[   479.727] (EE) 8: /usr/lib/xorg/modules/libglamoregl.so (glamor_block_handler+0xe4) [0x7fb7a066ddd4]
[   479.727] (EE) unw_get_proc_name failed: no unwind info found [-10]
[   479.727] (EE) 9: /usr/lib/xorg/modules/drivers/modesetting_drv.so (?+0x0) [0x7fb7a09f41b7]
[   479.728] (EE) 10: /usr/lib/Xorg (BlockHandler+0xb4) [0x55861cbab6a4]
[   479.728] (EE) 11: /usr/lib/Xorg (WaitForSomething+0x197) [0x55861cc7de67]
[   479.728] (EE) 12: /usr/lib/Xorg (SProcXkbDispatch+0x1c93) [0x55861cb6d36f]
[   479.729] (EE) 13: /usr/lib/libc.so.6 (__libc_init_first+0x90) [0x7fb7a1387790]
[   479.729] (EE) 14: /usr/lib/libc.so.6 (__libc_start_main+0x8a) [0x7fb7a138784a]
[   479.729] (EE) 15: /usr/lib/Xorg (_start+0x25) [0x55861cb6e2b5]

journald:

...
Feb 10 00:30:40 red kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000947d:0:0:1119
Feb 10 00:30:42 red kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:1128
Feb 10 00:30:44 red kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000947d:0:0:1119
Feb 10 00:30:46 red kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:1128
Feb 10 00:30:52 red root[2355]: ACPI group/action undefined: jack/lineout / LINEOUT
Feb 10 00:30:52 red root[2357]: ACPI group/action undefined: jack/videoout / VIDEOOUT
Feb 10 00:30:52 red root[2359]: ACPI group/action undefined: jack/lineout / LINEOUT
Feb 10 00:30:52 red root[2361]: ACPI group/action undefined: jack/videoout / VIDEOOUT
Feb 10 00:30:52 red root[2363]: ACPI group/action undefined: jack/lineout / LINEOUT
Feb 10 00:30:52 red root[2365]: ACPI group/action undefined: jack/videoout / VIDEOOUT
Feb 10 00:30:52 red root[2367]: ACPI group/action undefined: jack/lineout / LINEOUT
Feb 10 00:30:52 red root[2369]: ACPI group/action undefined: jack/videoout / VIDEOOUT
Feb 10 00:31:42 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:31:42 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:31:43 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:31:43 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:33:52 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:33:52 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:33:52 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:33:52 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:33:52 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:33:52 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:33:52 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:33:52 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:33:52 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:33:52 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:33:52 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:33:52 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:33:52 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:33:52 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:34:19 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:34:19 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:34:54 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Feb 10 00:34:54 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
...

The output slowing to a crawl when the lid is closed is a known bug that hasn’t been fixed so far.
https://forums.developer.nvidia.com/t/525-60-11-breaks-composited-desktop-on-x11-if-laptop-lid-is-closed/236181?u=generix
It shouldn’t crash on mirror, nevertheless. In your Xorg logs, the modesetting driver has an issue:

[    12.813] (EE) modeset: Failed to load module "glamoregl" (module does not exist, 0)
[    12.813] (EE) modeset(0): Failed to load glamor module.

The module should be provided by the xorg-server package and reside in /usr/lib/xorg/modules/libglamoregl.so
https://archlinux.org/packages/extra/x86_64/xorg-server/
I think that path is missing from the module paths set in your xorg.conf; it would be better to delete the file.

A-ha, that’s interesting, thank you.

I don’t run a compositor, WM only, no DE, so I didn’t notice with all basic desktop apps (I can’t run Picom because it flashes).

That may also explain why some UI changes (opening a popup or menu, resizing, etc.) sometimes cause a very small pause; I guess Qt or GTK may be doing something roughly similar in terms of processing path…

In terms of Xorg config, I don’t have an xorg.conf, though I do use Optimus Manager, running almost exclusively in Hybrid mode (for 4 screens max).

The /etc/X11/xorg.conf.d/10-optimus-manager.conf it generated:

Section "Files"
	ModulePath "/usr/lib/nvidia"
	ModulePath "/usr/lib32/nvidia"
	ModulePath "/usr/lib32/nvidia/xorg/modules"
	ModulePath "/usr/lib32/xorg/modules"
	ModulePath "/usr/lib64/nvidia/xorg/modules"
	ModulePath "/usr/lib64/nvidia/xorg"
	ModulePath "/usr/lib64/xorg/modules"
EndSection

Section "ServerLayout"
	Identifier "layout"
	Screen 0 "integrated"
	Inactive "nvidia"
	Option "AllowNVIDIAGPUScreens"
EndSection

Section "Device"
	Identifier "integrated"
	Driver "modesetting"
	BusID "PCI:0:2:0"
	Option "DRI" "3"
EndSection

Maybe I have misunderstood and misconfigured something within it? Or I am missing some other needed config element?

/etc/optimus-manager/optimus-manager.conf;

[amd]
DRI=3
driver=modesetting
tearfree=

[intel]
DRI=3
accel=
driver=modesetting
modeset=yes
tearfree=

[nvidia]
DPI=96
PAT=yes
allow_external_gpus=no
dynamic_power_management=no
ignore_abi=yes
modeset=yes
options=overclocking

[optimus]
auto_logout=yes
pci_power_control=no
pci_remove=no
pci_reset=no
startup_auto_battery_mode=integrated
startup_auto_extpower_mode=nvidia
startup_mode=hybrid
switching=none

libglamoregl.so exists, 228K -rwxr-xr-x 1 root root 225K Feb 7 07:58 /usr/lib/xorg/modules/libglamoregl.so

Edit: I see now there is no /usr/lib/xorg/modules path defined, I’ll try that… though, if libglamoregl.so isn’t present at that path, the computer just doesn’t see any external screens.

Edit2: oh,

% ll /usr/lib64/xorg/modules/libglamoregl.so
228K -rwxr-xr-x 1 root root 225K Feb 7 07:58 /usr/lib64/xorg/modules/libglamoregl.so
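
To double-check which of the configured module directories really contain the glamor module, something like the following could help. It is only a sketch: `module_paths` and `find_module` are names made up for illustration, the sample text is the "Files" section posted above, and it assumes one directory per ModulePath line. On Arch, /usr/lib64 is normally a symlink to /usr/lib, which would explain why the file shows up under both paths.

```python
import re
from pathlib import Path

# Sample copied from the "Files" section of 10-optimus-manager.conf above.
SAMPLE_CONF = """
Section "Files"
    ModulePath "/usr/lib/nvidia"
    ModulePath "/usr/lib32/nvidia"
    ModulePath "/usr/lib32/nvidia/xorg/modules"
    ModulePath "/usr/lib32/xorg/modules"
    ModulePath "/usr/lib64/nvidia/xorg/modules"
    ModulePath "/usr/lib64/nvidia/xorg"
    ModulePath "/usr/lib64/xorg/modules"
EndSection
"""


def module_paths(conf_text):
    """Return every ModulePath directory named in the config text, in order."""
    return re.findall(r'^\s*ModulePath\s+"([^"]+)"', conf_text, flags=re.MULTILINE)


def find_module(paths, filename):
    """Return the directories from `paths` that directly contain `filename`."""
    return [p for p in paths if (Path(p) / filename).is_file()]


paths = module_paths(SAMPLE_CONF)
print(len(paths), "ModulePath entries found")
for hit in find_module(paths, "libglamoregl.so"):
    # resolve() follows symlinks, so /usr/lib64/... may print as /usr/lib/...
    print(hit, "->", Path(hit).resolve())
```

On a machine without those directories the loop simply prints nothing; the point is only to compare the configured paths against what is on disk.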

For some reason, AFAIU, that /var/log/Xorg.0.log is from earlier this week, when I was testing by temporarily moving libglamor. (Someone mentioned that… radical option on the Arch wiki talk page.)

Here’s a fresh log with a crash.

Xorg.0.log (62.9 KB)

That Arch wiki page is heavily outdated and even deadly when using the nvidia gpu in offload mode.
Looking at the crash in the new log, this seems rather to be a Mesa bug: it appears to expect nouveau to be running.
Please delete your xorg.conf, at least the DRI 3 entry.

Do you mean the Arch wiki article, or just the article talk page?

No, I’m not using Nouveau, it’s too slow.

As I said, I don’t have an xorg.conf.

The DRI 3 option is for the integrated graphics device. I’ll unset that anyway.

Apologies, I realise now I somehow managed to cut off the second half of /etc/X11/xorg.conf.d/10-optimus-manager.conf, here is the full config, as generated by Optimus Manager;

Section "Files"
	ModulePath "/usr/lib/nvidia"
	ModulePath "/usr/lib32/nvidia"
	ModulePath "/usr/lib32/nvidia/xorg/modules"
	ModulePath "/usr/lib32/xorg/modules"
	ModulePath "/usr/lib64/nvidia/xorg/modules"
	ModulePath "/usr/lib64/nvidia/xorg"
	ModulePath "/usr/lib64/xorg/modules"
EndSection

Section "ServerLayout"
	Identifier "layout"
	Screen 0 "integrated"
	Inactive "nvidia"
	Option "AllowNVIDIAGPUScreens"
EndSection

Section "Device"
	Identifier "integrated"
	Driver "modesetting"
	BusID "PCI:0:2:0"
	Option "DRI" "3"
EndSection

Section "Screen"
	Identifier "integrated"
	Device "integrated"
EndSection

Section "Device"
	Identifier "nvidia"
	Driver "nvidia"
	BusID "PCI:1:0:0"
	Option "Coolbits" "28"
EndSection

Section "Screen"
	Identifier "nvidia"
	Device "nvidia"
EndSection

FWIW I can file a Mesa bug, but this problem (and the other one that came with it) coincided with Nvidia driver updates.

The talk page where you got the info to remove libglamor from.

You’re not using nouveau, but the backtrace of the Xorg server has a reference to it, /usr/lib/dri/iris_dri.so (nouveau_drm_screen_create+0x489890), so it seems Mesa is expecting it.

Yes, and only the igpu driver is crashing, not the nvidia driver.
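
One way to see this from the log itself is to group the backtrace frames by the module they belong to. The sketch below does that for an excerpt of the backtrace posted above; `frames` and `modules_hit` are illustrative names, not part of any tool. The symbol names are probably less trustworthy than the module paths: with offsets as large as +0x489890, the unwinder is likely just reporting the nearest exported symbol.

```python
import re
from collections import Counter

# Excerpt of the Xorg backtrace posted earlier in the thread (frames 0, 4-10).
BACKTRACE = """\
[   479.723] (EE) 0: /usr/lib/Xorg (dri3_send_open_reply+0xdd) [0x55861cc8a9ad]
[   479.725] (EE) 4: /usr/lib/libc.so.6 (abort+0xd7) [0x7fb7a138653d]
[   479.726] (EE) unw_get_proc_name failed: no unwind info found [-10]
[   479.726] (EE) 5: /usr/lib/dri/iris_dri.so (?+0x0) [0x7fb79e099aa3]
[   479.726] (EE) 6: /usr/lib/dri/iris_dri.so (nouveau_drm_screen_create+0x489890) [0x7fb79eddffd0]
[   479.727] (EE) 7: /usr/lib/dri/iris_dri.so (__driDriverGetExtensions_d3d12+0xb7e36) [0x7fb79e161356]
[   479.727] (EE) 8: /usr/lib/xorg/modules/libglamoregl.so (glamor_block_handler+0xe4) [0x7fb7a066ddd4]
[   479.727] (EE) unw_get_proc_name failed: no unwind info found [-10]
[   479.727] (EE) 9: /usr/lib/xorg/modules/drivers/modesetting_drv.so (?+0x0) [0x7fb7a09f41b7]
[   479.728] (EE) 10: /usr/lib/Xorg (BlockHandler+0xb4) [0x55861cbab6a4]
"""

# Matches "(EE) <index>: <module path> (<symbol+offset>)".
FRAME = re.compile(r"\(EE\) (\d+): (\S+) \(([^)]*)\)")


def frames(log_text):
    """Yield (index, module path, symbol+offset) for each backtrace frame."""
    for match in FRAME.finditer(log_text):
        yield int(match.group(1)), match.group(2), match.group(3)


def modules_hit(log_text):
    """Count backtrace frames per module file name."""
    return Counter(path.rsplit("/", 1)[-1] for _, path, _ in frames(log_text))


for module, count in modules_hit(BACKTRACE).most_common():
    print(f"{count:2d}  {module}")
```

No frame in the excerpt belongs to an nvidia module, which fits the reading that the Intel/Mesa side is what aborted.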

Aaaaha right, yes, thank you! It was the Intel driver crashing X, and having that option set to 3 rather than 2 was causing it; there has been no crash since the change.
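
For anyone checking the same setting: the DRI value in the generated Xorg config presumably comes from the [intel] section of /etc/optimus-manager/optimus-manager.conf posted above. A minimal sketch for reading it with Python's standard configparser follows; `intel_dri` is a hypothetical helper, and the sample text is copied from that file rather than read from disk.

```python
import configparser

# [intel] section as posted earlier in the thread (before the change to 2).
SAMPLE = """\
[intel]
DRI=3
accel=
driver=modesetting
modeset=yes
tearfree=
"""


def intel_dri(conf_text):
    """Return the DRI value of the [intel] section, or None if it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Option names are matched case-insensitively by configparser.
    return parser.get("intel", "DRI", fallback=None)


print("intel DRI =", intel_dri(SAMPLE))
```

To check the live file instead, the text could be read with `open("/etc/optimus-manager/optimus-manager.conf").read()` and passed to the same function.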

My problem now is that the external screen image freezes. Timing may differ, I’m not sure how to replicate.

If playing a game, and there is a slowdown judder - I just stop all input so things can calm down before I continue pushing forward - that can avoid a freeze.

Sometimes there are juddery freezes for very short times just, say, typing in Firefox.

External screen just by itself is just a big no no.

Also, rotating a third screen in X11 is broken, the output orientation is wrong.

So I tried using the downgrade tool that Arch Linux users have available in the AUR after seeing this post.

All versions of the driver back to 525.78 freeze up and crash X after a few tens of minutes.

I can’t get nvidia-dkms and nvidia-utils 525.60.11 installed because that nvidia-dkms fails with;

...
(5/7) Install DKMS modules
==> dkms install --no-depmod nvidia/525.60.11 -k 6.2.8-arch1-1
Error! Bad return status for module build on kernel: 6.2.8-arch1-1 (x86_64)
Consult /var/lib/dkms/nvidia/525.60.11/build/make.log for more information.
==> WARNING: `dkms install --no-depmod nvidia/525.60.11 -k 6.2.8-arch1-1' exited 10
==> dkms install --no-depmod nvidia/525.60.11 -k 6.2.8-zen1-1-zen
Error! Bad return status for module build on kernel: 6.2.8-zen1-1-zen (x86_64)
Consult /var/lib/dkms/nvidia/525.60.11/build/make.log for more information.
==> WARNING: `dkms install --no-depmod nvidia/525.60.11 -k 6.2.8-zen1-1-zen' exited 10
==> dkms install --no-depmod nvidia/525.60.11 -k 6.1.21-1-lts
==> depmod 6.1.21-1-lts
==> ERROR: Missing 5.15.18-1-lts kernel headers for module nvidia/525.60.11.
(6/7) Reloading system bus configuration...
(7/7) Checking which packages need to be rebuilt
sudo downgrade 'nvidia-dkms<=530.41.03-1' nvidia-utils  1158.52s user 136.93s system 474% cpu 4:33.21 total
...
  CC [M]  /var/lib/dkms/nvidia/525.60.11/build/nvidia-drm/nvidia-drm-encoder.o
  CC [M]  /var/lib/dkms/nvidia/525.60.11/build/nvidia-drm/nvidia-drm-connector.o
/var/lib/dkms/nvidia/525.60.11/build/nvidia-drm/nvidia-drm-drv.c: In function ‘nv_drm_init_mode_config’:
/var/lib/dkms/nvidia/525.60.11/build/nvidia-drm/nvidia-drm-drv.c:262:21: error: ‘struct drm_mode_config’ has no member named ‘fb_base’
  262 |     dev->mode_config.fb_base = 0;
      |                     ^
  CC [M]  /var/lib/dkms/nvidia/525.60.11/build/nvidia-drm/nvidia-drm-gem.o
make[2]: *** [scripts/Makefile.build:252: /var/lib/dkms/nvidia/525.60.11/build/nvidia-drm/nvidia-drm-drv.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/var/lib/dkms/nvidia/525.60.11/build/nvidia-drm/nvidia-drm-connector.c: In function ‘__nv_drm_detect_encoder’:
/var/lib/dkms/nvidia/525.60.11/build/nvidia-drm/nvidia-drm-connector.c:101:18: error: ‘struct drm_connector’ has no member named ‘override_edid’
  101 |     if (connector->override_edid) {
      |                  ^~
/var/lib/dkms/nvidia/525.60.11/build/nvidia-drm/nvidia-drm-crtc.c: In function ‘__nv_drm_plane_atomic_destroy_state’:
/var/lib/dkms/nvidia/525.60.11/build/nvidia-drm/nvidia-drm-crtc.c:678:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
  678 |     struct nv_drm_plane_state *nv_drm_plane_state =
      |     ^~~~~~
make[2]: *** [scripts/Makefile.build:252: /var/lib/dkms/nvidia/525.60.11/build/nvidia-drm/nvidia-drm-connector.o] Error 1
make[1]: *** [Makefile:2021: /var/lib/dkms/nvidia/525.60.11/build] Error 2
make: *** [Makefile:82: modules] Error 2
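
The real failures are easy to lose among the CC lines and warnings. A small sketch for pulling only the compiler errors out of make.log is below, run here on lines copied from the log above; `compile_errors` is an illustrative name. Both errors refer to struct members that the 6.2 kernel headers apparently no longer provide, so this looks like a driver/kernel version mismatch rather than a local misconfiguration.

```python
import re

# Lines copied from /var/lib/dkms/nvidia/525.60.11/build/make.log as posted above.
MAKE_LOG = """\
/var/lib/dkms/nvidia/525.60.11/build/nvidia-drm/nvidia-drm-drv.c: In function ‘nv_drm_init_mode_config’:
/var/lib/dkms/nvidia/525.60.11/build/nvidia-drm/nvidia-drm-drv.c:262:21: error: ‘struct drm_mode_config’ has no member named ‘fb_base’
/var/lib/dkms/nvidia/525.60.11/build/nvidia-drm/nvidia-drm-connector.c: In function ‘__nv_drm_detect_encoder’:
/var/lib/dkms/nvidia/525.60.11/build/nvidia-drm/nvidia-drm-connector.c:101:18: error: ‘struct drm_connector’ has no member named ‘override_edid’
/var/lib/dkms/nvidia/525.60.11/build/nvidia-drm/nvidia-drm-crtc.c:678:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
"""

# GCC diagnostics look like "<file>:<line>:<col>: error: <message>".
ERROR = re.compile(
    r"^(?P<file>\S+?):(?P<line>\d+):(?P<col>\d+): error: (?P<msg>.*)$",
    re.MULTILINE,
)


def compile_errors(log_text):
    """Return (file name, line number, message) for each compiler error."""
    return [
        (m["file"].rsplit("/", 1)[-1], int(m["line"]), m["msg"])
        for m in ERROR.finditer(log_text)
    ]


for name, line, msg in compile_errors(MAKE_LOG):
    print(f"{name}:{line}: {msg}")
```

Warnings and "In function" context lines are skipped on purpose, so only the two lines that stop the build remain.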

I did though get the ability to rotate an external screen without it being broken again - after disabling and then reenabling it. Can’t remember which driver versions this worked with.

Going back to 530.x and the primary external monitor freezes very easily (e.g. sometimes when fullscreening or unfullscreening mpv). Attached are the new generated reports.

nvidia-bug-report.log.gz (1.8 MB)

The current boot only had errors from the intel igpu freezing:

[ 3192.485444] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 3192.485515] i915 0000:00:02.0: [drm] Xorg[1610] context reset due to GPU hang
[ 3192.496411] i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85dffffb, in Xorg [1610]

The previous boot had errors from the nvidia driver, which couldn’t sync to the intel driver:

Apr 17 21:45:22 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Apr 17 21:45:22 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Apr 17 21:45:22 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Apr 17 21:45:22 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event

Before that, there was an nvidia-only error from the display engine failing to handle the external monitor:

Apr 17 02:53:15 red kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000947d:0:0:1119
Apr 17 02:53:17 red kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:1128
Apr 17 02:53:19 red kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000947d:0:0:1119
Apr 17 02:53:21 red kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:1128

So you really have a mixed bag of issues from both gpu drivers.
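
To keep track of which of these shows up in a given boot, the kernel log lines can be sorted into the three buckets above. This is a rough sketch, not an official tool: the labels and the `classify` helper are made up here, and the patterns are based only on the messages quoted in this thread.

```python
import re
from collections import Counter

# One pattern per error family quoted above; the labels are informal.
PATTERNS = {
    "i915 GPU hang": re.compile(r"i915 .*GPU HANG"),
    "nvidia-drm fence allocation": re.compile(
        r"\[nvidia-drm\].*Failed to allocate fence signaling event"
    ),
    "nvidia-modeset display engine timeout": re.compile(
        r"nvidia-modeset: ERROR: .*Idling display engine timed out"
    ),
}

# Sample lines copied from the logs quoted in this thread.
SAMPLE = """\
[ 3192.485444] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 3192.485515] i915 0000:00:02.0: [drm] Xorg[1610] context reset due to GPU hang
[ 3192.496411] i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85dffffb, in Xorg [1610]
Apr 17 21:45:22 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Apr 17 21:45:22 red kernel: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
Apr 17 02:53:15 red kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000947d:0:0:1119
Apr 17 02:53:17 red kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:1128
"""


def classify(lines):
    """Count how many lines match each known error family."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                break
    return counts


for label, count in classify(SAMPLE.splitlines()).items():
    print(f"{count:3d}  {label}")
```

Feeding it the kernel messages of one boot at a time (for example the output of `journalctl -k -b -1` saved to a file) would show whether a particular driver version only triggers one family.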

The current boot log in that report archive is with 530.x, the previous boot log is with 525.x.

As of now I am on mesa-git and xf86-video-intel-git, but the problem remains.

My setup: Linux 6.2 (Arch Zen kernel build), no display manager (I log in and run startx), awesomewm with no compositor, and Optimus Manager set to a hybrid configuration, with just one external monitor connected directly to the laptop (though the P50 is in a dock, I have unplugged the third and fourth monitors from that for now).

To reproduce: open the doublecmd file manager (the Qt and GTK2 builds lead to the same behaviour), then open a 1080p episode of Stranger Things. About 50/50, that will fully freeze the external screen image without it showing mpv. If that didn’t freeze it then, again 50/50, fullscreening will freeze the external screen image with a windowed mpv whilst the laptop screen it’s mirroring shows a fullscreened mpv. And if that didn’t freeze it, unfullscreening a few seconds later is highly likely to do so. Nothing shows in dmesg, journalctl or Xorg.0.log on the freeze.

The xf86-video-intel-git PKGBUILD echoes a note saying it uses DRI3, so I’ve also just switched from 2 to 3 to see if that affects anything. Previously (above), having that set to 3 rather than 2 was causing X to crash out (not sure with which Nvidia driver version). But opening or toggling fullscreen in mpv is still enough to quickly freeze the external screen.

Edit: fresh report, FWIW;
nvidia-bug-report.log.gz (1.9 MB)

So I saw there was the new 525.105.17 production driver release, so I asked in #archlinux-aur and someone created a set of AUR entries to create the packages to install that version.

The problem I have is still a freezing output;

But now the problem is that both internal and external (mirrored) outputs totally freeze up, apart from the mouse pointer, which remains movable, though nothing on screen updates in response to actions that should change it. If I use a keyboard hotkey to open a terminal and type a command to reboot, the system reboots, so it’s not frozen totally, just the graphics.

Sometimes still also it’s just the external monitor that freezes, which can be recovered (for a time) via disabling the external output in arandr then reenabling it.

nvidia-bug-report.log.gz (1.8 MB)

Now on 525.116.03, but opening a video in mpv can still freeze the external screen (connected to the laptop, not the dock) if I fullscreen too quickly after starting the video.

nvidia-bug-report.log.gz (1.8 MB)

I just had X crash out to the console after I opened 6 YouTube tabs in quick succession.

nvidia-bug-report.log.gz (1.8 MB)

I have noted in Discord (heavily) and in Firefox (occasionally) that, when I’m typing and looking at the screen, the image being presented is for the second last character typed. I’m not sure how to reproduce in Firefox.

As before, I’ve no compositor enabled, though I guess such apps might use a somewhat similar graphics rendering pathway.

Maybe the timing related to this issue might somehow prove a clue, idk.

I think I’ve now realised that working with the main laptop screen off doesn’t cause a problem, but I daren’t, due to the possibility of the external screen(s) freezing. I can still just get a PTY though.

nvidia-bug-report.log.gz (1.8 MB)

X crashed out whilst opening directories of FLAC and images in MPV from doublecmd (sorting my music collection).

And again whilst just starting to use Firefox.

nvidia-bug-report.log.gz (1.8 MB)

Not sure what’s changed with -04? I don’t think I’m imagining this level of instability now.

I’m on 535.54.03 now but I’ve still been having problems in the last couple of days (having been out of town for a while before).

Still just using Awesome WM with no compositor. Sometimes the laptop and external screen will pause for a few seconds, either during a movie or when switching to a new window.

Sometimes the external monitor freezes on a certain frame and I have to use the laptop screen to turn the external screen off and on to get it working again.

Sometimes both screens will freeze, with the external monitor flickering between apparently the last two frames, though I can still use my keyboard to blindly bring up a terminal and reboot.

nvidia-bug-report.log.gz (1.9 MB)

This is crazy. I’ve got three external monitors and I can’t use any of them because a freeze will happen after 5 minutes of doing like anything. Changing FF tabs can make the laptop screen lag almost 3 seconds. This is brutally disabling to my system. What a nightmare.

I realise that, when both screens freeze, I can use the ‘restart window Manager’ hotkey for AwesomeWM, and it updates the frozen output by one frame, so I can very laggardly navigate to copy and paste written but unsaved text from an editor to my chat app (because menus apparently don’t show or can’t be selected using this workaround).