[BUG] Linux driver fails to remove framebuffer device when HDMI cable is unplugged

On my laptop, an NVIDIA 4070 dGPU is wired directly to the HDMI port. (I understand this is a common setup.)

When the cable is plugged in, a new framebuffer device is created, as it should be. However, when the cable is unplugged, the device is NOT removed, even with no clients using it. This has several negative consequences:

  • If the virtual console is remapped to the new framebuffer, it is NOT remapped back to the integrated GPU after the cable is unplugged. (Passing fbcon=map:0 prevents the remapping, but it does not get the framebuffer removed.)
  • The dGPU fails to enter the D3cold state and keeps consuming power.
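
To check the second point independently of the NVIDIA procfs file, the PCI power state can be read from sysfs. This is a minimal sketch, assuming a kernel that exposes the standard power_state and power/runtime_status attributes for PCI devices. The function names and the sysfs_root parameter are hypothetical and exist only for illustration; the PCI address is the one from this report.

```python
from pathlib import Path


def pci_power_summary(bdf: str, sysfs_root: str = "/sys") -> dict:
    """Read the PCI power state and runtime PM attributes of one device."""
    dev = Path(sysfs_root) / "bus" / "pci" / "devices" / bdf

    def read(rel: str) -> str:
        # Attributes can be missing on older kernels, so do not fail hard.
        try:
            return (dev / rel).read_text().strip()
        except OSError:
            return "unavailable"

    return {
        "power_state": read("power_state"),              # e.g. D0, D3hot, D3cold
        "runtime_status": read("power/runtime_status"),  # e.g. active, suspended
        "control": read("power/control"),                # auto or on
    }


def is_d3cold(summary: dict) -> bool:
    """True only when the kernel reports the device in D3cold."""
    return summary["power_state"] == "D3cold"


if __name__ == "__main__":
    # 0000:c4:00.0 is the dGPU address used in this report.
    print(pci_power_summary("0000:c4:00.0"))
```

Run before plugging in the cable and again after unplugging it. If the behaviour described here reproduces, the second run would be expected to show something other than D3cold.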

Here is some state from the kernel's procfs, debugfs, and sysfs. Note that this is WITHOUT any graphical environment running, on a pure text console, which rules out the graphical environment as the culprit.

Before the cable is plugged in

+ cat /proc/driver/nvidia/gpus/0000:c4:00.0/power
Runtime D3 status:          Enabled (fine-grained)
Video Memory:               Off

GPU Hardware Support:
 Video Memory Self Refresh: Supported
 Video Memory Off:          Supported

S0ix Power Management:
 Platform Support:          Supported
 Status:                    Disabled
+ cat /sys/kernel/debug/dri/1/clients
             command  tgid dev master a   uid      magic
+ cat /sys/kernel/debug/dri/1/internal_clients
fbdev
+ ls /sys/class/graphics
fb0
fbcon
+ grep . /sys/class/graphics/fb0/name
amdgpudrmfb

After the cable is plugged in, the following line appears in the kernel log:

[  319.616356] nvidia 0000:c4:00.0: [drm] fb1: nvidia-drmdrmfb frame buffer device

+ cat /proc/driver/nvidia/gpus/0000:c4:00.0/power
Runtime D3 status:          Enabled (fine-grained)
Video Memory:               Active

GPU Hardware Support:
 Video Memory Self Refresh: Supported
 Video Memory Off:          Supported

S0ix Power Management:
 Platform Support:          Supported
 Status:                    Disabled
+ cat /sys/kernel/debug/dri/1/clients
             command  tgid dev master a   uid      magic
+ cat /sys/kernel/debug/dri/1/internal_clients
fbdev
+ ls /sys/class/graphics
fb0
fb1
fbcon
+ grep . /sys/class/graphics/fb0/name /sys/class/graphics/fb1/name
/sys/class/graphics/fb0/name:amdgpudrmfb
/sys/class/graphics/fb1/name:nvidia-drmdrmfb

After the cable is unplugged, no message appears in the kernel log and the framebuffer device is still active.

+ cat /proc/driver/nvidia/gpus/0000:c4:00.0/power
Runtime D3 status:          Enabled (fine-grained)
Video Memory:               Active

GPU Hardware Support:
 Video Memory Self Refresh: Supported
 Video Memory Off:          Supported

S0ix Power Management:
 Platform Support:          Supported
 Status:                    Disabled
+ cat /sys/kernel/debug/dri/1/clients
             command  tgid dev master a   uid      magic
+ cat /sys/kernel/debug/dri/1/internal_clients
fbdev
+ ls /sys/class/graphics
fb0
fb1
fbcon
+ grep . /sys/class/graphics/fb0/name /sys/class/graphics/fb1/name
/sys/class/graphics/fb0/name:amdgpudrmfb
/sys/class/graphics/fb1/name:nvidia-drmdrmfb
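
The three dumps above differ in only one field of the power file. The sketch below makes that comparison mechanical: it parses the text of /proc/driver/nvidia/gpus/<address>/power into a dictionary and reports which fields changed. The parsing rules are an assumption based solely on the layout shown in this post, and the function names are hypothetical.

```python
POWER_TEMPLATE = """\
Runtime D3 status:          Enabled (fine-grained)
Video Memory:               {vram}

GPU Hardware Support:
 Video Memory Self Refresh: Supported
 Video Memory Off:          Supported

S0ix Power Management:
 Platform Support:          Supported
 Status:                    Disabled
"""


def parse_nvidia_power(text: str) -> dict:
    """Parse 'Key: value' lines; indented keys get their section as prefix."""
    result = {}
    section = None
    for line in text.splitlines():
        if not line.strip():
            section = None              # a blank line ends the current section
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if not value:
            section = key.strip()       # header such as "GPU Hardware Support"
            continue
        name = key.strip()
        if line[0].isspace() and section:
            name = f"{section} / {name}"
        result[name] = value
    return result


def changed_fields(before: dict, after: dict) -> dict:
    """Map each field whose value differs to a (before, after) tuple."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


if __name__ == "__main__":
    before = parse_nvidia_power(POWER_TEMPLATE.format(vram="Off"))
    after_unplug = parse_nvidia_power(POWER_TEMPLATE.format(vram="Active"))
    print(changed_fields(before, after_unplug))
```

With the values from this report, the only difference between the initial state and the state after unplugging is Video Memory going from Off to Active, which matches the dGPU not powering down.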

I have the exact same issue with intel12 and an NVIDIA 3080 Ti on an Advanced Optimus laptop.

I first thought it was an issue with Aquamarine (AQ), the Wayland backend used by Hyprland, but I can reproduce the issue without a graphical environment as well.


  # enable finegrained power management
  boot.extraModprobeConfig = ''
    options nvidia NVreg_EnableGpuFirmware=0 NVreg_DynamicPowerManagementVideoMemoryThreshold=0 NVreg_DynamicPowerManagement=0x02 NVreg_UsePageAttributeTable=1 NVreg_InitializeSystemMemoryAllocations=0 NVreg_PreserveVideoMemoryAllocations=1
    options nvidia-drm modeset=1 fbdev=1
  '';

  boot.kernelParams = ["nvidia-drm.modeset=1" "nvidia.NVreg_EnableGpuFirmware=0" "nvidia.NVreg_DynamicPowerManagementVideoMemoryThreshold=0" "nvidia.NVreg_DynamicPowerManagement=0x02" "nvidia.NVreg_UsePageAttributeTable=1" "nvidia.NVreg_InitializeSystemMemoryAllocations=0" "nvidia.NVreg_PreserveVideoMemoryAllocations=1"];

Maybe it is possible to manipulate the NVIDIA driver from udev rules as a workaround.
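
As a starting point for such a workaround, a script run from a udev rule would first need to know that nothing is connected to the dGPU any more. The sketch below only does that detection step by reading the standard DRM connector status files under /sys/class/drm. It does not remove the framebuffer, and nothing in this thread confirms that a udev-based workaround exists. The function names and the card1 index are hypothetical.

```python
from pathlib import Path


def connector_states(drm_root: str = "/sys/class/drm") -> dict:
    """Map connector names (e.g. card1-HDMI-A-1) to their status string."""
    states = {}
    for status_file in sorted(Path(drm_root).glob("card*-*/status")):
        try:
            states[status_file.parent.name] = status_file.read_text().strip()
        except OSError:
            continue  # connector vanished or is unreadable; skip it
    return states


def all_disconnected(states: dict, card_prefix: str) -> bool:
    """True if the card has connectors and every one reports 'disconnected'."""
    mine = [s for name, s in states.items()
            if name.startswith(card_prefix + "-")]
    return bool(mine) and all(s == "disconnected" for s in mine)


if __name__ == "__main__":
    states = connector_states()
    print(states)
    # "card1" is an assumption; check which card belongs to the dGPU.
    print(all_disconnected(states, "card1"))
```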

The issue also happens with the open driver.

After a clean boot, the output of sudo cat /sys/kernel/debug/dri/129/framebuffer is empty.
After plugging in the external monitor, it contains:

framebuffer[115]:
	allocated by = [fbcon]
	refcount=1
	format=XR24 little-endian (0x34325258)
	modifier=0x0
	size=1920x1080
	layers:
		size[0]=1920x1080
		pitch[0]=7680
		offset[0]=0
		obj[0]:
			name=0
			refcount=2
			start=00100000
			size=8294400
			imported=no

along with my WM processes' framebuffers.

After unplugging, the other framebuffers are cleaned up, but the fbcon allocation remains.
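
To spot the leftover allocation quickly, the debugfs dump can be filtered for entries owned by fbcon. This is a rough sketch that assumes the text layout shown in the dump above; the function names are hypothetical, and reading debugfs normally requires root.

```python
import re

HEADER = re.compile(r"^framebuffer\[(\d+)\]:")

SAMPLE = (
    "framebuffer[115]:\n"
    "\tallocated by = [fbcon]\n"
    "\trefcount=1\n"
    "\tformat=XR24 little-endian (0x34325258)\n"
    "\tmodifier=0x0\n"
    "\tsize=1920x1080\n"
    "\tlayers:\n"
    "\t\tsize[0]=1920x1080\n"
    "\t\tpitch[0]=7680\n"
    "\t\toffset[0]=0\n"
    "\t\tobj[0]:\n"
    "\t\t\tname=0\n"
    "\t\t\trefcount=2\n"
)


def parse_framebuffers(text: str) -> list:
    """Split a debugfs framebuffer dump into one dict per framebuffer."""
    entries = []
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = {"id": int(match.group(1))}
            entries.append(current)
            continue
        if current is None:
            continue
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # lines such as "layers:" carry no value
        # Keep the first occurrence only, so nested obj[] fields (for example
        # the object's own refcount) do not overwrite framebuffer-level ones.
        current.setdefault(key.strip(), value.strip())
    return entries


def fbcon_framebuffers(entries: list) -> list:
    """Return only the framebuffers whose owner is reported as [fbcon]."""
    return [e for e in entries if e.get("allocated by") == "[fbcon]"]


if __name__ == "__main__":
    for fb in fbcon_framebuffers(parse_framebuffers(SAMPLE)):
        print(fb["id"], fb["size"], "refcount", fb["refcount"])
```

On a live system, the text would come from the file named above (dri/129 in my case) instead of SAMPLE.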

Maybe it is related to

EDIT: issue created at RTD3 dont allow gpu to sleep after a monitor has been plugged and unplugged on prime reverse sync · Issue #759 · NVIDIA/open-gpu-kernel-modules · GitHub


Hi @gm151, @Aetherall ,

Thank you for reporting the issue. I have filed NVBug # 5034343 internally.

  1. Could you please capture an NVIDIA bug report after you run into this issue? We will try to match your distro, kernel, and driver versions for testing.
  2. Please share your steps to reproduce the issue.

Hello,

I’ve tried to reproduce the issue with these steps: boot to the desktop, check that the GPU goes into D3cold, plug in a display (HDMI routed to the NVIDIA GPU), unplug it, and observe the unwanted consequences.
For me, the GPU does not enter PCI D3cold. I also see that the system still thinks the display is plugged in: it is still listed in the GNOME display settings, and my cursor can move into the void where the display pane was.

This is Fedora (Silverblue) 41, NVIDIA 565.77, Linux 6.12.7, Wayland, on a ThinkPad P1 Gen 6 laptop.

Attached is the nvidia-bug-report from the repro state.
nvidia-bug-report.log.gz (341.7 KB)


Hi @ievgenp ,

Thank you for the logs. GPU not entering D3cold after unplugging the external display is a known issue. We are tracking it internally on NVBug #4822713.

The issue remains under investigation and we do not have a fix at this time.