External monitor freezes when using dedicated GPU

I have both on the same laptop: Windows doesn’t freeze, while Linux does.

I don’t think your issue is related to the one discussed in this topic. Your flickering looks like your monitor is losing synchronization because something went wrong in the DisplayPort-over-Thunderbolt forwarding, which could be caused either by the source or by the destination (the dock station).

Generally, a monitor’s LCD driver chip simply turns off its backlight and/or outputs when there is no valid data or signal on its inputs. I think this is exactly your case.

If we use DisplayPort Alternate Mode instead (no Thunderbolt forwarding, just a DisplayPort signal multiplexed over USB-C), there is neither flickering nor freezing, because the Intel GPU is responsible for rendering the video output and passes the data directly to the DisplayPort output.

That’s a good point.

However:

  • This was not an issue with 525.85 or earlier.
  • This happens regardless of whether the monitors are on DP or HDMI.
  • My machine is not set to hybrid mode in the BIOS; it’s forced to use the dedicated GPU.

Thanks for the insights, anyway. :-)

If you are forcing the dedicated GPU and the video goes out through the Thunderbolt connector (a proprietary Intel technology), then the dedicated GPU simply renders into the Intel GPU’s framebuffer (Intel GPU rendering is not used), and the framebuffer contents are passed on to the DP-over-TB engine. So if the dedicated GPU’s rendering into the Intel framebuffer locks up, the video output can get stuck, and the external monitor loses synchronization because no data is available.

What are your Linux distribution, Xorg and kernel versions? I’m using Debian GNU/Linux 12 (bookworm) with the 6.5.0-0.deb12.1-amd64 kernel and Xorg version 21.1.7-3+deb12u2.

I reiterate that the issue did not happen with any driver up to 525.85.12; it only appears with newer drivers.

I’ve been having this issue with multiple new driver versions across multiple kernel versions. (Keep in mind that across those kernel versions, dating back to 6.0, I’ve been patching 525.85.12 so that it builds on 6.1 and later.) So it’s moot to pinpoint a single package combination; however, right now I’m on trixie with kernel 6.6.4 and X11 21.1.9, using 545.29.06 (native installer, no Debian packages).

Additionally, if I install 525.85.12 on this kernel, there’s no flicker, no freezing, nothing.

I clearly understand the hardware explanation. But keep in mind that the Thunderbolt output (at least on this machine, a ThinkPad X1 Extreme Gen 2) is hardwired to the NVIDIA card. (And I have no Intel VGA controller in the machine’s hardware listing, whether from lspci -v or from the kernel messages themselves.)

Thanks again :-)

Is this your laptop model: https://laptopmedia.com/review/lenovo-thinkpad-x1-extreme-gen-2-review-an-industrial-yet-simple-mobile-workstation/ ?
Does it use an Intel Core i9-9880H processor?

I definitely can’t understand how a Thunderbolt interface could be connected directly to the dedicated NVIDIA GPU. Please look at this NVIDIA answer: Does NVIDIA's USB-C port support Thunderbolt? | NVIDIA

Sorry, I didn’t make myself clear.

The Thunderbolt “DisplayPort” output header is hardwired to the NVIDIA card. Meaning: you cannot plug another monitor into this port (using a TB→DP or even a TB→HDMI adapter) and have it driven by the Intel card. The machine has a separate HDMI output that can run another monitor from the Intel card.

Could you share your patches for the 525.85.12 kernel modules? I failed to build them for the 6.5.x kernel due to compilation errors. :-(

Sorry, I don’t have a diff tool at hand, but I manually patched it using these two:

nvidia-470xx-fix-linux-6.3

535.54 fix

Thank you, I’ll try to make the patches. Yesterday NVIDIA released a new driver version:

Did you try it?

No, as there are no flicker fixes or DP Alt Mode mentions in that changelog.

Yes, this release has a very poor changelog, but I hope that the entry “Fixed a bug that could cause the system to crash when an application is run with __NV_PRIME_RENDER_OFFLOAD=1” could make a difference for freezes while using render offload.

PS: I’ve tried 535.146.02, but nothing has changed for me: the freezes are still there, and there is no new information, warnings or notes in dmesg or the Xorg logs.


Have you been able to reproduce it? Or found anything that points to what could cause the issue?

@roliverio,
I’ve prepared a patch file for building the 525.85.12 driver on 6.5.3-1: NVIDIA-Linux-x86_64-525.85.12-kernel.patch.gz (2.4 KB)
Note: the 525.85.12 driver works with the 6.5.3-1 Linux kernel only with the ibt=off kernel command-line switch. Without this switch I got errors while loading the nvidia kernel module:

[   55.676053] traps: Missing ENDBR: _nv012303rm+0x0/0x10 [nvidia]
[   55.676277] ------------[ cut here ]------------
[   55.676278] kernel BUG at arch/x86/kernel/traps.c:255!
[   55.676282] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[   55.676296] CPU: 7 PID: 2448 Comm: nvidia-installe Tainted: P     U     OE      6.5.0-0.deb12.1-amd64 #1  Debian 6.5.3-1~bpo12+1
[   55.676310] Hardware name: Acer Nitro AN515-58/Jimny_ADH, BIOS V2.10 07/07/2023
[   55.676319] RIP: 0010:exc_control_protection+0xc2/0xd0

I tried to boot with the 525.85.12 driver installed, and my external monitor, connected to the HDMI output of the NVIDIA discrete GPU, froze before the KDE Plasma desktop appeared. I suppose this was due to the following error in the dmesg output:

[   21.946035] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c67e:0:0:1128

Next, I ran the glxgears/vkcube window-resizing tests described above. Nothing changed: this driver version gives exactly the same freezes/stuttering while I resize the vkcube window placed over the glxgears one.

So I’m moving back to the 535.146.02 driver and waiting for NVIDIA to fix the freezes in a future driver version.


Here’s my /usr/src/nvidia-current-525.85.12 directory, fully patched for inclusion in DKMS.

This worked OK up to 6.5.13 (the last kernel I patched to work with this driver version).

It seems different from your patch; you might as well try it (note that I didn’t need ibt=off).

patched-debian-current-dir.tar.gz (45.3 MB)

The only meaningful difference I’ve found so far is that your driver has the following block of code commented out:

//#if defined(NV_DRM_HAS_HDR_OUTPUT_METADATA)
//    struct nv_drm_plane_state *nv_drm_plane_state =
//       to_nv_drm_plane_state(state);
//    drm_property_blob_put(nv_drm_plane_state->hdr_output_metadata);
//#endif

Why? I don’t know, and I can’t find the source of this modification.

What CPU generation are you using? I have a 12th-gen CPU, and it supports IBT. But if you are using a 10th-gen or older CPU, the ibt switch means nothing, because IBT is not supported by your CPU.

Yep, mine is Gen9.

I can’t recall why I commented it out. That code is there so the HDMI outputs can attach the HDR metadata that restricted displays (like TVs) need to accept HDR; however, my monitors are not HDR, and I was having some sort of build issue that went away when I commented it out (again, on 6.5.13).

I have no such issues with my patches, and I think this makes no difference regarding the freezes/stuttering.

Did you try the kernel-open kernel modules from the NVIDIA driver package?
You can select them for installation with the following command:

$ sudo ./NVIDIA-Linux-x86_64-<driver-version>.run -m=kernel-open

Not really. I just want this issue gone for good, and NVIDIA is not providing any transparency about the progress of this investigation (not that I expect them to).

But a lot of people are experiencing similar issues across multiple stacks, kernels, compositors, display handlers and hardware, issues that were not the norm a year ago. Keep in mind that 525.85.12 was released in January, and after that, at least for me, everything went berserk.

I’m just hopeful that they find out what it is, but I know I’m not buying NVIDIA again (not that they care, really).


Thanks @dmakc for the troubleshooting efforts!

I wonder if the hardware-problem hypothesis could be proven one way or the other. I tried installing DCGM, but the PCIe and memory diagnostics just report “skip” for some reason.