External monitor freezes when using dedicated GPU

nvidia-driver-470/jammy-security,jammy-updates,jammy,now 470.199.02-0ubuntu0.22.04.1 amd64 [installed]

No issues with this one today… I have not tried performance mode

I have used the computer throughout my workday with no freezes! I hope this is the permanent solution!

nvidia-settings performance mode has also solved my freezes so far on 545.23.06.
I was back on my triple-monitor setup for a full workday yesterday and today with no freezes.
I haven’t tried the “nvidia-smi -lgc 600,2100” suggestion or updated to 545.29.02; I might try over the weekend so I don’t interrupt my workday.
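
For anyone who does want to try the clock-locking suggestion, here is a minimal sketch. It assumes nvidia-smi is on the PATH and that the 600,2100 MHz range quoted above suits your GPU. The helper names lgc_command and rgc_command are made up for illustration. The functions only print the commands, so nothing changes until you run the printed line yourself as root.

```shell
#!/bin/sh
# Sketch only: build the nvidia-smi commands for locking and resetting
# GPU clocks. The function names are hypothetical; -lgc and -rgc are the
# nvidia-smi short options for --lock-gpu-clocks and --reset-gpu-clocks.

# Print the command that locks the GPU clock to the range MIN..MAX (MHz).
lgc_command() {
    min_mhz=$1
    max_mhz=$2
    printf 'nvidia-smi -lgc %s,%s\n' "$min_mhz" "$max_mhz"
}

# Print the command that removes the lock again.
rgc_command() {
    printf 'nvidia-smi -rgc\n'
}

# Show what would be run for the range mentioned in this thread.
lgc_command 600 2100
rgc_command
```

Printing instead of executing keeps the sketch harmless on a work machine; prefix the printed line with sudo when you actually want to apply it.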

Sadly, setting Preferred Mode to “Prefer Maximum Performance” did not help with my freezing. I can still get it to freeze with glxgears running on the NVIDIA chip on the external monitor.
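
For reference, the same setting can be applied from a terminal instead of the nvidia-settings GUI. This is a rough sketch, assuming the dGPU is GPU index 0 and that the attribute is exposed as GPUPowerMizerMode on your driver version. powermizer_command is a hypothetical helper that only prints the command.

```shell
#!/bin/sh
# Sketch only: build the nvidia-settings command that selects a PowerMizer
# mode. powermizer_command is a hypothetical helper; GPU index 0 is assumed.

powermizer_command() {
    case $1 in
        adaptive) value=0 ;;   # "Adaptive"
        max)      value=1 ;;   # "Prefer Maximum Performance"
        *)
            echo "usage: powermizer_command adaptive|max" >&2
            return 1
            ;;
    esac
    # The quotes around the attribute keep the shell from globbing [gpu:0].
    printf 'nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=%s"\n' "$value"
}

powermizer_command max
```

Running the printed command inside the X session changes the mode for that session only; as the post above shows, it is not a guaranteed fix.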

Yep, bad news guys: it looks like performance mode delays the freeze a lot thanks to the faster performance, but it still freezes eventually once enough usage has accumulated…
I’ve been running a grindy browser game for two days and the external monitor has finally frozen again.
It feels like some kind of memory leak in the driver :/

@generix

I’ve tried using lgc, but still having exactly the same freezing issues on my end.

tested latest driver 535.129.03 - same issue.

Been having issues with my Thinkpad P1 gen 4 in the last few weeks, Ubuntu 22.04, 3070 mobile, 535.129.03. This is a work machine so I’m not sure I can run the script on it, but wanted to add my experience.

After the system turns off its screen due to inactivity, the external monitor is frozen, seemingly on the lock screen background. Have to unplug and replug to fix it. This is with a direct HDMI connection.

When using a Lenovo dock, I have been getting screen lockups, usually in OpenOffice Calc, once in VSCode. Have to unplug/replug.

I’ll try performance mode but given how much heat this laptop already produces it’s not a great solution.

Are you guys using GNOME + Wayland? Do all reports here involve laptops with multiple GPUs? In my case, I have an AMD iGPU and an NVIDIA dGPU, and if I understood correctly, there is a serious performance problem when mutter tries to copy data between GPUs when one of them is an NVIDIA GPU. Hopefully the patch will be merged soon.

ThinkPad P50, AwesomeWM (X11) (so no DE, and no DM), no compositor.

New way to reproduce freeze (I’m currently on 535.129.03):
Plug and unplug a keyboard several times. Each plug causes a short freeze that clears after about a second… if you keep doing it continuously, the external monitor eventually freezes.

Hello. I can confirm this happens on my laptop, too.

I have a Lenovo Legion 7 15IMHg05 laptop (81YU (LENOVO_MT_81YU_BU_idea_FM_Legion 7 15IMHg05)) with 64 GB of RAM, and an NVIDIA GeForce RTX 2070 with Max-Q Design (8GB of GPU RAM) GPU.

I’m using Linux Mint 20.2 (based on ubuntu-20.04), nvidia driver ubuntu package nvidia-driver-525 (version 525.147.05-0ubuntu0.20.04.1) with an optimus (prime) configuration, as I do both work and gaming on the same hardware, and low power usage is an absolute “must” for me (and the reason I bought this laptop in the first place).

I’ve noticed that my dual-monitor (laptop display + external monitor) setup always works reliably if I’m not using the discrete GPU. No matter what I do, including using different resolutions, refresh rates, etc., I can do work + gaming with the integrated (Intel) GPU reliably and without any issues.

But as soon as the discrete gpu is being used (when gaming), be that with opengl or vulkan, the external monitor will follow a pattern of “stuttering” after some time – sometimes it’s after a few minutes, sometimes after a few seconds. Eventually (anything from 5-10 seconds to 10 minutes), regardless of whether any “stuttering” has manifested, the external monitor will stop displaying any further updates. In my case, the image seems “frozen”, and the only thing I can do is to unplug and re-plug the HDMI cable to be able to use it again.

It should be noted that I’ve tried everything I could think of, including using the “same-ish” refresh rates on both the built-in laptop display and the external monitor. Anything and everything from 60Hz (built-in)/59.9Hz (external monitor) to 144.1Hz (built-in)/144Hz (external), with and without G-Sync, VSync, double/triple buffering, etc.

I have used nvidia driver versions from 510 up to 535, too. I have used Linux kernels from the 5.4 and 5.15 series, every single one of them that gets released through Mint/Ubuntu, always “stable”.

I have also used several external displays, including HDTVs and computer display monitors, several HDMI cables (some very expensive), and none of those permutations have resulted in this issue being resolved/avoided.

What I wanted to add to this discussion is that we’ve all (or most of us?) parted with our hard-earned money to buy what was sold to us as top-of-the-line GPUs, and very good Linux support. However, I have to say I am deeply disappointed that NVIDIA does not seem to properly test their drivers before releasing them. After all, PRIME (“offloading”) support has been in this driver for a long time, “gaming” laptops with integrated+discrete GPUs have been sold for a long time, and (some?) people need to work and play with the same hardware, so surely testing that PRIME (“offloading”) is working properly on several “typical” configurations is part of the process? How is it possible that this issue is present on so many different configurations, and yet nvidia keeps releasing drivers without seemingly addressing this problem?

For me, and (I suspect) for many others, having a laptop that can be used “on the go” on battery without having to reboot to switch configurations is an absolute “must”. Using the discrete GPU (without PRIME/“offloading” support) to get rid of this problem is simply not an acceptable solution.

The only workaround I’ve managed to come up with (and very much an unacceptable one for me) is to switch to a “mirrored” display configuration (that is, both the internal and external monitors have matching resolutions, regardless of the vertical frequency used on either) before launching a game, and pause the game as soon as the external display becomes “frozen”; then unplug and re-plug the HDMI cable so that the driver can re-initialise/use the external display again, and then carry on playing. Needless to say, this is causing excessive wear on the HDMI connections (be that the laptop’s or the external monitor’s) and ruining gaming sessions (for example, when playing highly time-sensitive games).

Has anyone from NVIDIA been able to reproduce this issue?

Unfortunately, I am still not able to recreate the issue locally. I tried the recent steps shared by @ursom of plugging and unplugging the keyboard several times.
Could you please add Option "ModeDebug" "True" to the Device section of xorg.conf and share the nvidia bug report captured in the repro state?

I’m glad to help but my system runs without it by default and I’m not an expert in xorg.conf setup…
I hope the other participants here can help me with this trivial task :D

So here is what I have in Debian-12:
If I switch to a console (while the external monitor is plugged in), stop lightdm (or boot in safe mode without it), and run nvidia-xconfig as root, it generates an xorg.conf with only the external monitor, so the laptop display is completely unusable.
xorg.conf_nvidia-xconfig.txt (1.2 KB)
If I run Xorg -configure instead, it tries to use /root/xorg.conf.new as a template and gives an error:

Number of created screens does not match number of detected devices

error.txt (1.1 KB)
If I use the content of xorg.conf.new as my xorg.conf, it shows only the laptop display, because the second one refers to nouveau, which is blacklisted.
xorg.conf.new.txt (4.9 KB)
If I leave that file in place and run nvidia-xconfig on it, the tool uses that file as a base and simply replaces both the intel and nouveau drivers with nvidia.
xorg.conf_nvidia_root.txt (5.6 KB)
This way laptop display doesn’t work and only external monitor does.

Finally, if I take all the best of these experiments and manually craft two screens using intel and nvidia drivers respectively, as desired - result appears to be funny… laptop display works normally and external monitor is black, has no background image and I can only move mouse cursor on it, but I can’t move any windows there. xfce4.18 has a display configuration tool, that is only showing laptop display and no external display. xrandr also detects only laptop display…
xorg.conf_my.txt (1.6 KB)
It feels like it extended one display to two monitors and got confused by something maybe… have no idea… There must be something wrong with it but I’m not an expert in this magic… please help!

If I delete the xorg.conf file again, both monitors start working on the next reboot… so technically all I need is to dump the current in-memory configuration to an xorg.conf file somehow and add that ModeDebug option…

…any hint? anyone?

Right now I can only avoid the freezes by setting ‘performance mode’ in the PRIME profile. The power-saving mode is one of the several reasons we chose NVIDIA, so keeping the power-hungry mode active all the time kind of defeats the purpose (besides the extra cost for the user).

Is there anything that people with distros other than Debian (in my case, Lubuntu) can provide to help with this?

@j22gim, this task is not specific to a distro: if you have an /etc/X11/xorg.conf file, add the ModeDebug option to the existing Device section that is linked to the nvidia driver. As I understand it, it should look like this:

    Section "Device"
        ...
        Driver      "nvidia"
        Option      "ModeDebug" "TRUE"
        ...
    EndSection

Then reboot, reproduce the freeze, and run the NVIDIA reporting tool (nvidia-bug-report.sh).

PS and since ubuntu is based on debian, it is very likely that you also don’t have xorg.conf file by default, so probably it needs to be generated with nvidia-xconfig tool first in lubuntu as well…

So, looks like there is no need to craft full /etc/X11/xorg.conf configuration, so I removed it and added /etc/X11/xorg.conf.d/devices.conf with only devices described:

Section "Device"
    Identifier     "Device0"
    Driver         "intel"
    VendorName     "Intel Corporation"
    BusID          "PCI:0:2:0"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:1:0:0"
    Option         "ModeDebug" "TRUE"
EndSection

and after reboot I can see some more info in my /var/log/Xorg.0.log, as well as this line:

[     7.263] (**) NVIDIA(G0): Option "ModeDebug" "TRUE"

So seems like it is hooked up and both monitors work normally.
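
The log check above can be scripted so others can confirm the option took effect before collecting a report. This is a small sketch; modedebug_enabled is a hypothetical helper name, and the default log path is the one used in this post (it may differ on your system).

```shell
#!/bin/sh
# Sketch only: report whether an Xorg log shows the ModeDebug option.
# modedebug_enabled is a hypothetical helper name.

modedebug_enabled() {
    log_file=${1:-/var/log/Xorg.0.log}
    # -q prints nothing; the exit status is 0 when a matching line exists.
    grep -q 'Option "ModeDebug"' "$log_file"
}

if modedebug_enabled /var/log/Xorg.0.log 2>/dev/null; then
    echo "ModeDebug is active"
else
    echo "ModeDebug not found in the log"
fi
```

Once the option shows up, reproduce the freeze and run nvidia-bug-report.sh as root while the monitor is still frozen.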
I reproduced the glxgears freeze this way: placed a terminal window on the external monitor, ran

__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxgears

and kept resizing the glxgears window on the laptop display… this froze the external monitor, and I made this report while the external monitor was frozen
nvidia-bug-report.log.gz (339.3 KB)

Please keep us updated and let me know if I can do anything else, including a fresh OS reinstall or whatever you need, cuz this issue turns the laptop into a potato :/
Laptop is XOTIC-G70R aka SAGER NP6876 aka Clevo NH70RCQ

I had the same issue of external monitor freezing, but installing nvidia-prime, restarting, and setting the GPU to Performance Mode under the PRIME Profiles menu of nvidia-settings seems to have solved it. No freezing now for a solid day.

Using driver 535 on kernel Linux 6.6.0-060600rc5-generic

External monitor connected via HDMI.
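
For people without the GUI handy, the PRIME Profiles entries correspond to prime-select arguments on Ubuntu-based systems with the nvidia-prime package. The sketch below assumes that mapping; prime_profile_command is a hypothetical helper and the short profile names it accepts are my own labels, not anything the tool defines.

```shell
#!/bin/sh
# Sketch only: map the PRIME Profiles choices to prime-select arguments.
# prime_profile_command and its argument names are hypothetical.

prime_profile_command() {
    case $1 in
        performance)  profile=nvidia ;;     # "NVIDIA (Performance Mode)"
        on-demand)    profile=on-demand ;;  # "NVIDIA On-Demand"
        power-saving) profile=intel ;;      # "Intel (Power Saving Mode)"
        *)
            echo "usage: prime_profile_command performance|on-demand|power-saving" >&2
            return 1
            ;;
    esac
    printf 'prime-select %s\n' "$profile"
}

prime_profile_command performance
```

The printed command needs root, and a logout or reboot before it takes effect; prime-select query shows the currently selected profile.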

I’m using Acer Nitro 5 AN515-58 laptop with the following configuration:

$ inxi -Gxx
Graphics:
  Device-1: Intel Alder Lake-P Integrated Graphics
    vendor: Acer Incorporated ALI driver: i915 v: kernel arch: Gen-12.2 ports:
    active: eDP-1 empty: DP-1,DP-2 bus-ID: 0000:00:02.0 chip-ID: 8086:46a6
  Device-2: NVIDIA GA107M [GeForce RTX 3050 Ti Mobile]
    vendor: Acer Incorporated ALI driver: nvidia v: 545.29.02 arch: Ampere
    bus-ID: 0000:01:00.0 chip-ID: 10de:25a0
  Device-3: Chicony ACER HD User Facing type: USB driver: uvcvideo
    bus-ID: 1-6:4 chip-ID: 04f2:b76f
  Display: x11 server: X.Org v: 1.21.1.7 with: Xwayland v: 22.1.9
    compositor: kwin_x11 driver: X: loaded: modesetting,nvidia dri: iris
    gpu: i915 display-ID: :0 screens: 1
  Screen-1: 0 s-res: 3840x1080 s-dpi: 96
  Monitor-1: HDMI-1-0 pos: primary,left res: 1920x1080 dpi: 82
    diag: 686mm (27.01")
  Monitor-2: eDP-1 pos: right res: 1920x1080 dpi: 142 diag: 394mm (15.53")
  API: OpenGL v: 4.6 Mesa 22.3.6 renderer: Mesa Intel Graphics (ADL GT2)
    direct-render: Yes

OS Version:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 12 (bookworm)
Release:        12
Codename:       bookworm

Kernel version:

$ uname -a
Linux verstak 6.5.0-0.deb12.1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.5.3-1~bpo12+1 (2023-10-08) x86_64 GNU/Linux

External monitor is connected to the HDMI port.

I’m using Xorg 1.21.1.7 (12101007) with Plasma DE. NVIDIA Driver version is 545.29.02.

When I run

__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxgears

and begin resizing the glxgears window very intensively, the GUI becomes unresponsive (it freezes for short periods of about 1-2 s). Interestingly, intensively moving the glxgears window does not produce freezes. Resizing sometimes freezes the output completely, but the laptop display always keeps working. I attached gdb to the frozen glxgears process and got the following backtrace:

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fc65d37d020 in __GI___poll (fds=0x7ffc97fd5698, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29      ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) bt
#0  0x00007fc65d37d020 in __GI___poll (fds=0x7ffc97fd5698, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007fc65d174d12 in ?? () from /lib/x86_64-linux-gnu/libxcb.so.1
#2  0x00007fc65d17716a in xcb_wait_for_special_event () from /lib/x86_64-linux-gnu/libxcb.so.1
#3  0x00007fc65d0b132c in ?? () from /lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#4  0x00007fc65d0946ad in ?? () from /lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#5  0x00007fc65b8a139e in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.545.29.02
#6  0x00007fc65d0b4bbe in ?? () from /lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#7  0x00007fc65d0842d0 in ?? () from /lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#8  0x000055888cfffac8 in ?? ()
#9  0x00007fc65d2a81ca in __libc_start_call_main (main=main@entry=0x55888cfff430, argc=argc@entry=1, argv=argv@entry=0x7ffc97fd6488)
    at ../sysdeps/nptl/libc_start_call_main.h:58
#10 0x00007fc65d2a8285 in __libc_start_main_impl (main=0x55888cfff430, argc=1, argv=0x7ffc97fd6488, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7ffc97fd6478) at ../csu/libc-start.c:360
#11 0x000055888d00015a in ?? ()
(gdb)

__GI___poll is called for fd = 3, which is some kind of socket, possibly used by xcb.
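
To confirm what fd 3 actually is, the /proc filesystem can be queried while the process is stuck. This is a minimal sketch for Linux; fd_target is a hypothetical helper, and the PID you pass would be that of the frozen glxgears process.

```shell
#!/bin/sh
# Sketch only: show what a file descriptor of a process points to, using
# the /proc filesystem on Linux. fd_target is a hypothetical helper.

fd_target() {
    pid=$1
    fd=$2
    # /proc/PID/fd/N is a symlink; sockets show up as "socket:[inode]".
    readlink "/proc/$pid/fd/$fd"
}

# Example: inspect fd 0 of the current shell.
fd_target "$$" 0 || echo "fd 0 is not open"
```

For the case described here you would call fd_target with the glxgears PID and 3. A result of the form socket:[...] would be consistent with the X connection that xcb is polling; lsof -p with the same PID gives more detail.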

If I switch to a text console with Ctrl+Alt+F2 and return to the GUI, the HDMI output unfreezes and glxgears continues to work.

Freezes become much more frequent if I run vkcube along with glxgears:

__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia  vkcube

and start resizing the vkcube window.

I’ve tried:

  • different kernel versions (6.1.x, 6.4.x) and drivers 525.125.06, 525.147.05, 535.129.03 - nothing changes;
  • disabling PCIe runtime power management - nothing changes;
  • setting PowerMizer mode to the “Prefer Maximum Performance” value - nothing changes;
  • different acpi_osi kernel parameter values - again nothing changes;
  • different i915 kernel module options (e.g. i915.enable_fbc=0, i915.enable_psr=0, etc.) - nothing helps;
  • turning VSYNC on and off - nothing changes;
  • setting the “PRIME Synchronization” xrandr option on and off - again nothing.
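
One item in the list above, PCIe runtime power management, can be verified from sysfs. This is a rough sketch; pci_runtime_pm is a hypothetical helper, the bus ID is the dGPU address from the inxi output earlier in this post, and the optional second argument exists only so the function can be pointed at a different sysfs root.

```shell
#!/bin/sh
# Sketch only: read the runtime power-management setting of a PCI device
# from sysfs. pci_runtime_pm is a hypothetical helper name.

pci_runtime_pm() {
    bus_id=$1
    sysfs_root=${2:-/sys}
    control="$sysfs_root/bus/pci/devices/$bus_id/power/control"
    if [ -r "$control" ]; then
        # "on" keeps the device awake (runtime PM off); "auto" allows suspend.
        cat "$control"
    else
        echo "unknown"
    fi
}

pci_runtime_pm 0000:01:00.0
```

Writing on to the same file as root (for example with echo on | sudo tee followed by the path) disables runtime PM for that device until the next reboot.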

I’ve checked my system load in the frozen state with the sysprof tool, but there is nothing unusual: no high CPU load in userspace or in kernel space. So I think that the main cause of the display freezes is some kind of broken synchronization between rendering and displaying through the discrete GPU.

When the HDMI display is frozen, nvidia-smi shows zero GPU utilization:

Thu Nov 23 20:05:28 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02              Driver Version: 545.29.02    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    Off | 00000000:01:00.0  On |                  N/A |
| N/A   45C    P0              11W /  60W |     56MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1363      G   /usr/lib/xorg/Xorg                           47MiB |
|    0   N/A  N/A     17269      G   glxgears                                      2MiB |
+---------------------------------------------------------------------------------------+

By contrast, with glxgears working properly:

Thu Nov 23 20:07:02 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02              Driver Version: 545.29.02    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    Off | 00000000:01:00.0  On |                  N/A |
| N/A   45C    P0              12W /  60W |     56MiB /  4096MiB |      6%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1363      G   /usr/lib/xorg/Xorg                           47MiB |
|    0   N/A  N/A     17269      G   glxgears                                      2MiB |
+---------------------------------------------------------------------------------------+
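
The difference between the two tables (0% versus 6% GPU-Util) can be captured without reading the full output. This is a sketch under assumptions: all_idle is a hypothetical helper that reads one utilization percentage per line, and the sample values in the demo line are made up rather than measured.

```shell
#!/bin/sh
# Sketch only: decide whether a series of GPU utilization samples is all
# zero. all_idle is a hypothetical helper reading one integer per line on
# stdin. Real samples could come from, for example:
#   for i in 1 2 3 4 5; do
#       nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
#       sleep 1
#   done | all_idle

all_idle() {
    awk 'NF { seen = 1; if ($1 + 0 != 0) busy = 1 }
         END { if (seen && !busy) exit 0; exit 1 }'
}

# Demo with made-up samples: every reading is zero.
printf '0\n0\n0\n' | all_idle && echo "GPU idle in every sample"
```

An all-zero series while glxgears is supposedly rendering would match the frozen state shown above; it does not say why the GPU is idle.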

nvidia-bug-report.log.gz (1.9 MB)
