if laptop lid is closed during suspend, "failed to set mode: No space left on device" upon resume

References to the issue:

Problem summary:
If the laptop lid is closed after suspend is initiated (via keyboard
shortcut) but before the laptop has actually suspended, then several things
happen upon resume:

  • (seems to be 100% reproducible): gdm crashes, with Xorg.1.log containing “(EE) modeset(G0): failed to set mode: No space left on device”.

  • (50%): some fonts get screwed up (sorry - no screenshot)

The laptop suspends/resumes (and even hibernates to an encrypted swap partition) just fine otherwise. I just need to let it fully suspend before closing the lid.

Details:
I have a ThinkPad P1 with Intel onboard graphics and an NVIDIA Quadro T2000, running Debian testing/sid with the NVIDIA drivers (440.31) from Debian experimental.

$> lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile) (rev 02)
01:00.0 VGA compatible controller: NVIDIA Corporation TU117GLM [Quadro T2000 Mobile / Max-Q] (rev a1)

From what I have heard, the external output (HDMI) is wired to the NVIDIA GPU and the NVIDIA drivers do not support screen offloading, so I just followed the instructions to make the NVIDIA GPU do all the work:

$> cat /etc/X11/xorg.conf.d/nvidia.conf 
# try to configure display output source provider via nvidia since
# otherwise cannot display to external HDMI
#
# https://download.nvidia.com/XFree86/Linux-x86/375.82/README/randr14.html
#
# ....
Section "ServerLayout"
    Identifier "layout"
    Screen 0 "nvidia"
    Inactive "intel"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:1:0:0"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration"
EndSection

Section "Device"
    Identifier "intel"
    Driver "modesetting"
EndSection

Section "Screen"
    Identifier "intel"
    Device "intel"
EndSection

with these ad-hoc invocations for gdm and x-session to use it as the provider:

$> cat /usr/local/bin/nvidia-hdmi
#!/bin/sh

set -eu

# Nvidia on lena madness
touch /tmp/log
echo "running" >> /tmp/log
xrandr --setprovideroutputsource modesetting NVIDIA-0 >> /tmp/log 2>&1
xrandr --auto >> /tmp/log 2>&1

So far so good, but I noticed that the laptop heats up even when suspended, so I also followed https://download.nvidia.com/XFree86/Linux-x86_64/435.17/README/powermanagement.html and configured power management.

http://www.onerussian.com/tmp/x-crash.20191120/ contains log dumps from journalctl. The most notable excerpt of the diff between a normal suspend (lid not closed, marked with -) and a problematic one (lid closed before suspend finished, marked with +) is probably:

$> sedme() { sed -e 's,Nov 20 ..:..:.. ,<DATE> ,g' -e 's,\[[0-9].........\.....],[<TIME>],g' -e 's,\[[0-9]*\],[<PID>],g' -e 's,^> ,,g' "$1"; }
$> diff -Naur <(sedme 0-1-journalctl.diff) <(sedme 1-2-journalctl.diff) | less
	...
	 <DATE> lena acpid[<PID>]: 1 client rule loaded
	+<DATE> lena /usr/lib/gdm3/gdm-x-session[<PID>]: (EE) modeset(G0): failed to set mode: No space left on device
	+<DATE> lena /usr/lib/gdm3/gdm-x-session[<PID>]: (EE)
	+<DATE> lena /usr/lib/gdm3/gdm-x-session[<PID>]: Fatal server error:
	+<DATE> lena /usr/lib/gdm3/gdm-x-session[<PID>]: (EE) EnterVT failed for gpu screen 0
	+<DATE> lena /usr/lib/gdm3/gdm-x-session[<PID>]: (EE)
	+<DATE> lena /usr/lib/gdm3/gdm-x-session[<PID>]: Please consult the The X.Org Foundation support
	+<DATE> lena /usr/lib/gdm3/gdm-x-session[<PID>]:          at http://wiki.x.org
	+<DATE> lena /usr/lib/gdm3/gdm-x-session[<PID>]:  for help.
	+<DATE> lena /usr/lib/gdm3/gdm-x-session[<PID>]: (EE) Please also check the log file at "/var/log/Xorg.1.log" for additional information.
	+<DATE> lena /usr/lib/gdm3/gdm-x-session[<PID>]: (EE)
	+<DATE> lena systemd[<PID>]: Starting Refresh fwupd metadata and update motd...
	+<DATE> lena fwupdmgr[<PID>]: Fetching metadata https://cdn.fwupd.org/downloads/firmware.xml.gz
	 <DATE> lena systemd-logind[<PID>]: Operation 'sleep' finished.
	...
	 <DATE> lena avahi-daemon[<PID>]: Leaving mDNS multicast group on interface wlp82s0.IPv4 with address 10.31.123.112.
	-<DATE> lena /usr/lib/gdm3/gdm-x-session[<PID>]: (II) event1  - Sleep Button: is tagged by udev as: Keyboard
	-<DATE> lena /usr/lib/gdm3/gdm-x-session[<PID>]: (II) event1  - Sleep Button: device is a keyboard
	+<DATE> lena /usr/lib/gdm3/gdm-x-session[<PID>]: (EE) Server terminated with error (1). Closing log file.

I am not sure how a crash message about running out of space is relevant, since

$> grep Path /proc/driver/nvidia/params
TemporaryFilePath: "/var/tmp"

$> df -h /var/tmp
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p5  366G   30G  318G   9% /

and the only smallish partition is

$> df -h /run
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           3.2G  2.2M  3.2G   1% /run

but I increased it to 8GB with no positive effect on the issue.

Please assist in digging out what could be causing this.

Please post the output of
cat /sys/power/mem_sleep

NB laptop was rebooted but afaik nothing was changed/upgraded.

$> cat /sys/power/mem_sleep 
s2idle [deep]

Looks correct. The notebook heating up while suspended doesn’t sound good; maybe the NVIDIA GPU isn’t powered down correctly. Please check for a BIOS upgrade.

Heating up: I wonder whether that is just a sensor glitch, but I guess it shouldn’t be relevant for this particular issue.

BIOS/firmware was indeed a few minor revisions outdated, updated now.

$> cat 0-journalctl-boot.notes
* Firmware was updated and laptop rebooted:

$> sudo fwupdmgr get-updates
No upgrades for System Firmware, current is 0.1.27: 0.1.27=same, 0.1.27=same, 0.1.23=older, 0.1.23=older
No upgrades for UEFI Device Firmware, current is 0.1.19: 0.1.19=same
________________________________________________

Devices that have been updated successfully:

 • System Firmware (0.1.23 → 0.1.27)
 • UEFI Device Firmware (0.1.18 → 0.1.19)

That didn’t change anything regarding the suspend/lid issue. Here is the updated list of logs and diffs: http://onerussian.com/tmp/x-crash.20191124-1

Any clues on what “space”/device the message " modeset(G0): failed to set mode: No space left on device" talks about, or how to

I would accept any brave new ideas to try – this issue is really annoying since I keep killing my environments while forgetting about it and closing lid “too early”.

OK, regarding the “out of cheese” error: I strace’d the running Xorg and got

211693 connect(53, {sa_family=AF_UNIX, sun_path="/var/run/acpid.socket"}, 23) = 0
211693 epoll_ctl(4, EPOLL_CTL_ADD, 53, {0, {u32=2577581184, u64=94513332929664}}) = 0
211693 epoll_ctl(4, EPOLL_CTL_MOD, 53, {EPOLLIN, {u32=2577581184, u64=94513332929664}}) = 0
211693 openat(AT_FDCWD, "/proc/acpi/video", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)
211693 openat(AT_FDCWD, "/sys/devices/platform/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 54
211693 fstat(54, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
211693 getdents64(54, /* 32 entries */, 32768) = 1048
211693 getdents64(54, /* 0 entries */, 32768) = 0
211693 close(54)                        = 0
211693 ioctl(14, DRM_IOCTL_MODE_SETGAMMA, 0x7ffcc00cd5d0) = 0
211693 ioctl(14, DRM_IOCTL_MODE_ADDFB2, 0x7ffcc00cd3d0) = 0
211693 ioctl(14, DRM_IOCTL_MODE_SETCRTC, 0x7ffcc00cd4c0) = -1 ENOSPC (No space left on device)
211693 write(2, "(EE) modeset(G0): failed to set "..., 62) = 62
211693 write(5, "[ 35100.854] ", 13)    = 13
211693 write(5, "(EE) modeset(G0): failed to set "..., 62) = 62
211693 write(2, "(EE)", 4)              = 4
211693 write(5, "[ 35100.856] ", 13)    = 13
211693 write(5, "(EE)", 4)              = 4
211693 write(2, " ", 1)                 = 1
211693 write(5, " ", 1)                 = 1

and fd 14 of that process (checked before starting the trace) was pointing to /dev/dri/card0.
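
To double-check which GPU a DRM node such as /dev/dri/card0 belongs to, the kernel driver bound to it can be read from sysfs. A minimal sketch, assuming the usual /sys/class/drm/<card>/device/driver symlink layout; the function name drm_driver_for and its optional sysfs-root argument are made up for illustration:

```shell
#!/bin/sh
# drm_driver_for CARD [SYSFS_ROOT]
# Print the name of the kernel driver bound to a DRM card (e.g. card0).
# SYSFS_ROOT defaults to /sys; it is only a parameter so the function
# can be pointed at a different tree (hypothetical convenience).
drm_driver_for() {
    card=$1
    sysfs=${2:-/sys}
    link="$sysfs/class/drm/$card/device/driver"
    # No driver symlink: unknown card, or no driver bound to the device.
    [ -e "$link" ] || { echo "unknown"; return 1; }
    # The symlink points at the driver directory; its last path
    # component is the driver name.
    basename "$(readlink -f "$link")"
}

# Usage (not executed here):
#   readlink /proc/<Xorg pid>/fd/14   # which device node the fd refers to
#   drm_driver_for card0              # which kernel driver backs that node
```

On a hybrid laptop like this one, the Intel iGPU would be expected to report i915 and the NVIDIA card the proprietary nvidia module, but the card numbering is not guaranteed and should be checked rather than assumed.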

The modesetting driver is trying to set a mode on the iGPU (the framebuffer is added successfully via ADDFB2, but the SETCRTC that follows fails), and the i915 driver returns ENOSPC. That doesn’t really help though, since the reason is unknown. It might even be a bug in GNOME’s monitor manager.
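
As for what “space” the errno could refer to: as far as I know, the kernel's DRM core rejects a SETCRTC request with ENOSPC when the requested scanout viewport (x/y offset plus the mode's visible size) does not fit inside the framebuffer, so it need not have anything to do with disk space. Whether that is the check being hit here is not established. A rough sketch of that kind of bounds check, with the hypothetical helper name fb_fits_viewport and example numbers that are not taken from the logs:

```shell
#!/bin/sh
# fb_fits_viewport FB_W FB_H X Y HDISPLAY VDISPLAY
# Return 0 if a viewport of HDISPLAY x VDISPLAY at offset (X, Y) lies
# entirely inside a framebuffer of FB_W x FB_H pixels, non-zero otherwise.
# This only illustrates the bounds check; it is not the kernel code.
fb_fits_viewport() {
    fb_w=$1 fb_h=$2 x=$3 y=$4 hdisplay=$5 vdisplay=$6
    # The visible size must not exceed the framebuffer...
    [ "$hdisplay" -le "$fb_w" ] && [ "$vdisplay" -le "$fb_h" ] &&
    # ...and the offset must leave room for the whole viewport.
    [ "$x" -le $((fb_w - hdisplay)) ] && [ "$y" -le $((fb_h - vdisplay)) ]
}

# Usage (illustrative numbers only):
#   fb_fits_viewport 1920 1080 0 0 1920 1080    && echo fits
#   fb_fits_viewport 1920 1080 1920 0 1920 1080 || echo "would be rejected"
```

If this is the path involved, the interesting question is which framebuffer size and CRTC offset the X server ends up with after a resume with the lid closed.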

d’oh – right! that card0 is the intel one. I believe there was an alternative driver to modesetting I could use for it. I will try to dig something there later on and report back. Thanks!

I have a Lenovo P1 Gen 2 and had this exact same issue. However, with the NVIDIA 440.48 drivers (from https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa) and Linux kernel 5.3.15 it seems to be fixed.

I have still seen some strange behaviors where the fonts go wonky but that has happened maybe 1/50 sleeps.

I recently updated again to kernel 5.4.15 and it still seems to be working fine.

yarikoptic you should see if the new drivers and kernel solve the issue for you!

Thank you Ben for sharing! I guess I will need to wait a bit, since Debian experimental still has only 440.44-2. I am already on 5.5.0-rc5.

Hello again Yarikoptic!

Something updated recently (again) and now I am back to seeing X crash every time I close my lid :(

I wonder if nvidia devs know about this issue

Also seeing the same thing as Ben, same laptop, etc.

Seems like a regression.

It seems the bug disappears if the screen refresh rate is set to 60 Hz instead of 59.xx (which probably gets set when using an external monitor). So there is something related to the screen parameters that I don’t fully get, but being careful to set the refresh rate back to 60 Hz when using external screens is at least a reasonable workaround.
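
For anyone wanting to script that workaround, here is a minimal sketch using xrandr's --output, --mode and --rate options. The output name eDP-1-1 and the mode 1920x1080 are placeholders (check `xrandr --query` for the real ones), and the helper name set_rate_60 and the DRY_RUN switch are made up for illustration:

```shell
#!/bin/sh
# set_rate_60 OUTPUT MODE
# Ask xrandr to drive OUTPUT at MODE with a 60 Hz refresh rate.
# With DRY_RUN=1 the command is only printed, not executed.
set_rate_60() {
    output=$1 mode=$2
    # Build the command once so the dry run prints exactly what would run.
    set -- xrandr --output "$output" --mode "$mode" --rate 60
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage (placeholder output and mode names):
#   DRY_RUN=1 set_rate_60 eDP-1-1 1920x1080
#   set_rate_60 eDP-1-1 1920x1080
```

This could be run by hand after unplugging the external monitor, before the next suspend; it has not been tested as a fix for the crash itself.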