External monitor freezes when using dedicated GPU

Still freezes for me with 565.77 X11 KDE. @amrits what gives?

Do you folks have Prime installed? I was having issues where stuff on the dGPU was freezing under load if I had the iGPU enabled and didn’t have Prime installed. Once I installed Prime, the freezes stopped.

What is “Prime”? I can’t find such a package for Debian/Ubuntu.

It should be included in the drivers now. You can reach it through the nvidia-settings command: it starts a GUI where you can switch PRIME profiles.

Ok, but there are no default profiles:


Where could I get proper profiles and rules to avoid freezes?

No idea, this is what it looks like by default on my new Lenovo Legion

Now I see, you don’t even have a Prime Profiles menu option. To me it doesn’t look like it’s supported on your computer.

I think in Debian/Ubuntu the package is called nvidia-prime.

sudo apt-get install nvidia-prime

There is no such package for Debian, but I tried installing the Ubuntu one, and nothing changed.

According to the xrandr output, I’ve got a dGPU:

$ xrandr --listproviders 
Providers: number : 2
Provider 0: id: 0x45 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 4 outputs: 3 associated providers: 1 name:modesetting
Provider 1: id: 0x270 cap: 0x2, Sink Output crtcs: 4 outputs: 1 associated providers: 1 name:NVIDIA-G0

And according to glxinfo, I’ve got two different render devices:

$ __GL_SYNC_TO_VBLANK=0 __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia __VK_LAYER_NV_optimus=NVIDIA_only glxinfo |head -n5
name of display: :0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: NVIDIA Corporation
server glx version string: 1.4
$ glxinfo |head -n5
name of display: :0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.4

So I definitely have two different GPUs in my laptop and PRIME is supported.
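
For anyone who wants to repeat this check, here is a minimal sketch of a wrapper that sets the same offload variables as the glxinfo command above and then runs whatever command follows. The name prime_run is just my choice for illustration (Arch's nvidia-prime package ships a similar prime-run script), and __GL_SYNC_TO_VBLANK is left out because it only affects vsync, not which GPU renders.

```shell
#!/bin/sh
# prime_run is a hypothetical helper name, not part of the NVIDIA driver.
# It runs the given command with the PRIME render offload variables set,
# so only that one process is pointed at the NVIDIA dGPU.
prime_run() {
    env __NV_PRIME_RENDER_OFFLOAD=1 \
        __GLX_VENDOR_LIBRARY_NAME=nvidia \
        __VK_LAYER_NV_optimus=NVIDIA_only \
        "$@"
}

# Example (needs the NVIDIA driver and glxinfo installed):
#   prime_run glxinfo | grep "OpenGL renderer"
```

With this in a shell profile, something like prime_run glxgears should start just that program on the dGPU while the desktop stays on the iGPU, assuming the driver is in on-demand mode.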

I have Ubuntu as well, and it came when I selected proprietary drivers during the Ubuntu installation. Maybe that’s the difference, since that does the whole setup for you. However, you should be able to select that somewhere in the update center and enable proprietary drivers, and maybe it will update? Dunno. But how do you know PRIME is supported, do you have a mux switch? For example, I do. I’m pretty unsure that PRIME itself fixes it anyway, but you can try… Btw I use Wayland too.

PRIME doesn’t use a mux switch. It uses shared memory for rendering. And I can check that the dGPU is used for rendering, e.g. through nvidia-smi:

$ nvidia-smi 
Sat Dec 21 12:17:27 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.142                Driver Version: 550.142        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    Off |   00000000:01:00.0  On |                  N/A |
| N/A   47C    P5              5W /   60W |     302MiB /   4096MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1399      G   /usr/lib/xorg/Xorg                            115MiB |
|    0   N/A  N/A      6880      G   /usr/lib/vmware/bin/mksSandbox                177MiB |
+-----------------------------------------------------------------------------------------+

When I run VMware in PRIME offload mode, I get the line with /usr/lib/vmware/bin/mksSandbox, which tells us that VMware uses the NVIDIA dGPU. What could it be if not PRIME?

Ya, you don’t need a mux for Prime. I was having GPU crashes on my desktop with the iGPU on the CPU driving 2 monitors and an NV card driving the other 4. Putting the dGPU under stress caused the driver to crash. Installing Prime to mediate the shared resources between the two GPUs solved the crashes for me.

This has come up many times; it’s either not a solution in all cases (i.e. freezes have still been observed) or it’s not a solution because it locks in high power draw on a laptop.

I think nvidia-prime is just a tool for managing profiles; it’s not necessary for PRIME Render Offload on X (Chapter 35. PRIME Render Offload). Wayland is an entirely different story (I don’t think it even uses PRIME?).

I’m also affected by this bug. I’m not using the dedicated GPU; I’m using the On-Demand mode. Using the performance mode causes performance issues with GNOME Shell (under both Wayland and Xorg). Regarding this issue in particular:

  • Host: Lenovo Legion Pro 5 16ARX8
  • Kernel: Linux 6.11.0-13-generic
  • Resolution: 2560x1600, 240 Hz
  • Resolution external monitor (HDMI): 1920x1080, 74.97 Hz
  • OS: Ubuntu 24.10
  • DE: GNOME 47
  • Windowing system: X11
  • CPU: AMD Ryzen 9 7945HX with Radeon Graphics × 32
  • GPU: AMD Radeon 610M
  • GPU: NVIDIA GeForce RTX 4070 Laptop GPU
  • Memory: 32.0 GiB
  • Driver version: 560.35.03

The issue occurs almost instantly when resizing the glxgears window.
Lowering the refresh rate of the external monitor to 60 Hz works for me. So far, it hasn’t failed, even when trying to reproduce the error using vkcube and overlapping glxgears.

Update: No, the bug still occurs even with 60 Hz. It’s harder to reproduce, but I had YouTube on one screen while using Steam with my discrete card on the main screen, and it froze. Also, using Firefox + vkcube + glxgears causes the secondary screen to freeze as well.

I have been facing this issue for over a year now. Any fix, guys? Help a poor soul.

I’ve investigated the 2-second freezes while resizing glxgears using the 565.77 open kernel module drivers, and these freezes are directly related to the internal timeout in the IdleBaseChannelAll function (open-gpu-kernel-modules/src/nvidia-modeset/src/nvkms.c at 9d0b0414a5304c3679c5db9d44d2afba8e58cc1b · NVIDIA/open-gpu-kernel-modules · GitHub):

/*!
 * Idle all requested heads.
 *
 * First, wait for the heads to idle naturally.  If a timeout is exceeded, then
 * force the non-idle heads to idle, and record these in pReply.
 */
static NvBool IdleBaseChannelAll(
    NVDevEvoPtr pDevEvo,
    const struct NvKmsIdleBaseChannelRequest *pRequest,
    struct NvKmsIdleBaseChannelReply *pReply)
{
    NvU64 startTime = 0;

    /*
     * Each element in subDevicesPerHead[] must be large enough to hold one bit
     * per subdevice.
     */
    ct_assert(NVKMS_MAX_SUBDEVICES <=
              (sizeof(pRequest->subDevicesPerHead[0]) * 8));

    /* Loop until all head,sd pairs are idle, or we time out. */
    do {
        const NvU32 timeout = 2000000; /* 2 seconds */


        /*
         * Clear the pReply data,
         * IdleBaseChannelCheckIdle() will fill it afresh.
         */
        nvkms_memset(pReply, 0, sizeof(*pReply));

Matching Linux kernel function_graph trace:

# tracer: function_graph
#
# function_graph latency trace v1.1.5 on 6.9.12
# --------------------------------------------------------------------
# latency: 0 us, #3/3, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:20)
#    -----------------
#    | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
#    -----------------
#
#                       _-----=> irqs-off
#                      / _----=> need-resched
#                     | / _---=> hardirq/softirq
#                     || / _--=> preempt-depth
#                     ||| /
# CPU  TASK/PID       ||||     DURATION                  FUNCTION CALLS
# |     |    |        ||||      |   |                     |   |   |   |
  6)   Xorg-1464    |  ..... | $ 2000502 us  |  } /* nvkms_unlocked_ioctl [nvidia_modeset] */
  8)  InputTh-1866  |  ..... | $ 1998402 us  |  } /* nvkms_unlocked_ioctl [nvidia_modeset] */
  7)  nvidia--870   |  ..... | $ 1997807 us  |  } /* nvkms_kthread_q_callback [nvidia_modeset] */

As you can see, Xorg calls the nvidia_modeset ioctl through NVIDIA’s Xorg driver, and the call blocks for about 2 seconds.
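
In case someone wants to capture a similar trace, here is a rough sketch, not a verified recipe. Both function names are made up for this post. It assumes tracefs is mounted at /sys/kernel/tracing, that nvkms_unlocked_ioctl shows up as a traceable function once nvidia_modeset is loaded, and that function_graph honours tracing_thresh on your kernel. The second function only filters trace text like the lines above.

```shell
#!/bin/sh
# Both function names below are hypothetical helpers for illustration.

# trace_slow_calls TRACEFS FUNCTION THRESHOLD_US
# Points the function_graph tracer at one function and asks it to record
# only calls slower than THRESHOLD_US. On a real system TRACEFS is usually
# /sys/kernel/tracing and the writes need root.
trace_slow_calls() {
    tracefs=$1
    echo 0 > "$tracefs/tracing_on"               # pause while reconfiguring
    echo "$3" > "$tracefs/tracing_thresh"        # threshold in microseconds
    echo "$2" > "$tracefs/set_graph_function"    # trace only this call tree
    echo function_graph > "$tracefs/current_tracer"
    echo 1 > "$tracefs/tracing_on"
}

# slow_calls MIN_US
# Reads function_graph output on stdin and prints "task duration function"
# for every entry that took at least MIN_US microseconds.
slow_calls() {
    awk -F'|' -v min="$1" '
        /^#/ { next }                             # skip the tracer header
        NF >= 4 {
            dur = $3;  gsub(/[^0-9.]/, "", dur)   # keep only the number
            task = $1; sub(/^.*\) */, "", task); gsub(/ /, "", task)
            fn = $4;   sub(/^[^*]*\* */, "", fn); sub(/ *\*.*$/, "", fn)
            if (dur != "" && dur + 0 >= min) print task, dur, fn
        }'
}

# Example (as root, then reproduce the freeze):
#   trace_slow_calls /sys/kernel/tracing nvkms_unlocked_ioctl 1000000
#   slow_calls 1000000 < /sys/kernel/tracing/trace
```

Taking the tracefs directory as a parameter keeps the sketch easy to dry-run against an ordinary directory before touching the real tracer.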

I changed this value to 10 ms (10000), rebuilt the module, and glxgears began resizing flawlessly … until my external monitor froze completely. So I think there are two different issues in the NVIDIA driver: the first is caused by waiting for the rendering queue to become idle, and the second is undetectable at the moment. The Linux kernel function_graph tracer can’t help detect this issue, or maybe I don’t know how to set it up properly. :-(

PS: I can’t understand why NVIDIA’s developers use a 2-second timeout in such a case. To me it is almost an eternity compared to the monitor’s frame time.
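
To put that in numbers, here is a quick back-of-the-envelope helper. The name frames_during is made up for this post; it simply multiplies the timeout by the refresh rate to get how many refresh intervals pass while the ioctl is waiting.

```shell
#!/bin/sh
# frames_during TIMEOUT_US REFRESH_HZ  (hypothetical helper name)
# Prints how many refresh intervals fit into a timeout given in microseconds.
frames_during() {
    LC_ALL=C awk -v us="$1" -v hz="$2" 'BEGIN { printf "%.1f\n", us * hz / 1000000 }'
}

frames_during 2000000 240   # 2 s timeout on a 240 Hz panel  -> 480.0
frames_during 2000000 60    # 2 s timeout at 60 Hz           -> 120.0
frames_during 10000 240     # 10 ms timeout at 240 Hz        -> 2.4
```

So at 240 Hz the 2-second timeout spans 480 refresh intervals, while a 10 ms timeout spans only two or three.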

For what it’s worth, I’ve discovered a workaround. If I boot into BIOS and change the option to “Dedicated Graphics” from “Hybrid Graphics”, the problem no longer occurs. Obviously, this is far from ideal, as the dGPU never powers down while in this setting and chews up battery, but it’s better than nothing. This is on a Lenovo LOQ 15 with RTX 4050 and Ryzen 7 8845HS.

I have the same problem on Arch Linux. I tried many different driver version / kernel version combinations over two days. Nothing worked.

My workaround at the moment is to go back to proprietary driver version 525.147.05.
No problems at all with this version.

The old driver version is available in the AUR for Arch Linux:
https://aur.archlinux.org/packages?O=0&K=nvidia-525

Installed packages for a working setup:

pacman -Q | grep nvidia
lib32-nvidia-525xx-utils 525.147.05-1
lib32-nvidia-cg-toolkit 3.1-10
lib32-opencl-nvidia 565.77-1
nvidia-525xx-dkms 525.147.05-5
nvidia-525xx-settings 525.147.05-1
nvidia-525xx-utils 525.147.05-5
nvidia-cg-toolkit 3.1-8
nvidia-prime 1.0-5

Newer versions of the driver trigger a screen freeze for me in games. The game doesn’t crash. The mouse pointer is still there. I can hear the game’s sound. But the screen is no longer refreshed. This happens a few minutes into many games.

More info on my system, maybe it helps to debug:

Graphics:

inxi -G
Graphics:
  Device-1: NVIDIA AD107M [GeForce RTX 4060 Max-Q / Mobile] driver: nvidia
    v: 525.147.05
  Device-2: Advanced Micro Devices [AMD/ATI] Phoenix3 driver: amdgpu
    v: kernel
  Device-3: Kingcome FHD WebCam driver: uvcvideo type: USB
  Display: x11 server: X.Org v: 21.1.15 with: Xwayland v: 24.1.4 driver: X:
    loaded: modesetting unloaded: vesa dri: radeonsi gpu: amdgpu resolution:
    1: 3840x2160~60Hz 2: 2560x1600~240Hz
  API: EGL v: 1.5 drivers: kms_swrast,nvidia,radeonsi,swrast
    platforms: gbm,x11,surfaceless,device
  API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: amd mesa v: 24.3.3-arch1.2
    renderer: AMD Radeon Graphics (radeonsi gfx1103_r1 LLVM 19.1.6 DRM 3.59
    6.12.9-arch1-1)
  API: Vulkan v: 1.4.303 drivers: N/A surfaces: xcb,xlib
  Info: Tools: api: eglinfo, glxinfo, vulkaninfo
    de: kscreen-doctor,xfce4-display-settings gpu: corectrl, gputop,
    intel_gpu_top, lsgpu, nvidia-settings, nvidia-smi x11: xdriinfo,
    xdpyinfo, xprop, xrandr

Kernel:

uname -a
Linux ww 6.12.9-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 10 Jan 2025 00:39:41 +0000 x86_64 GNU/Linux

Nvidia SMI:

nvidia-smi 
Sat Jan 18 02:13:20 2025       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   39C    P0    N/A / 115W |      1MiB /  8188MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Would be nice to see a fix. Hard to believe this bug has been around for such a long time.

And I forgot in the last post: these are my screens (DP-1 is external, eDP-1 is the notebook screen):

xrandr | grep connected
eDP-1 connected (normal left inverted right x axis y axis)
DP-1 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 698mm x 393mm