FD leak with explicit sync and KDE Plasma

Every notification, and every opening or closing of a plasmoid, leaks a lot of sync_file descriptors in plasmashell:

❯ lsof -p $(pidof plasmashell)

396r a_inode 0,16 0 1062 sync_file
397r a_inode 0,16 0 1062 sync_file
399r a_inode 0,16 0 1062 sync_file
400r a_inode 0,16 0 1062 sync_file

And plasmashell eventually crashes with:
plasmashell[2053]: error marshalling arguments for get_icon: dup failed: Too many open files
plasmashell[2053]: Error marshalling request: Too many open files
plasmashell[2053]: The Wayland connection experienced a fatal error: Too many open files
plasma-plasmashell.service: Main process exited, code=exited, status=255/EXCEPTION

If I set __NV_DISABLE_EXPLICIT_SYNC=1 in /etc/environment this doesn’t happen.
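
Note that /etc/environment is typically only read at login, so the variable should only take effect after logging out and back in. A minimal sketch for confirming that the running plasmashell actually picked it up, by reading its environment from /proc (the helper name proc_env_value is made up for illustration, and it assumes a single plasmashell process):

```shell
#!/bin/bash
# proc_env_value FILE NAME
# Print the value of variable NAME from a NUL-separated environ file,
# such as /proc/<pid>/environ. Prints nothing if the variable is not set.
proc_env_value() {
	tr '\0' '\n' < "$1" | sed -n "s/^$2=//p"
}
```

Running proc_env_value "/proc/$(pidof plasmashell)/environ" __NV_DISABLE_EXPLICIT_SYNC should print 1 when the workaround is active for that process, and nothing otherwise.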

https://bugs.kde.org/show_bug.cgi?id=497424

If there are specific logs that would be obtainable or troubleshooting actions that would be helpful for folks to dig into this one, I’d be happy to provide. I’m attaching a bug report script output here as well.

Thanks,

nvidia-bug-report.log.gz (1.4 MB)

Still happens in 570.86.16.

I now also wonder whose bug it actually is: KDE’s or NVIDIA’s?

Mmm… it seems like they didn’t even take a look and just shoved it over to Nvidia. But who knows…

Semi-OT: There’s also a bug report about plasmashell’s RAM usage that’s been active for years, and no one has really done anything about it.

Possibly related to the Hyprland + Nvidia Vram leak?

I’d say it’s possible the fd leak happens on Hyprland but I don’t know, I don’t run it so I can’t check.

Should be fairly easy to check tho’
lsof -p $(pidof Hyprland)

If you see a lot of sync_file entries when VRAM is high, it’s probably the same issue as this one. But you have to compare it to when Hyprland was just freshly started.

Update:
Or try to set __NV_DISABLE_EXPLICIT_SYNC=1 and see what happens to the VRAM.
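
For the before/after comparison, here is a small sketch that counts sync_file descriptors straight from /proc instead of parsing lsof output. It assumes the kernel exposes them as symlinks whose target contains sync_file (for example anon_inode:sync_file); count_sync_files is a hypothetical helper name, not an existing tool:

```shell
#!/bin/bash
# count_sync_files FD_DIR
# Count the entries in FD_DIR (e.g. /proc/<pid>/fd) whose symlink target
# mentions sync_file. Prints 0 for an empty or unreadable directory.
count_sync_files() {
	local fd_dir=$1 fd count=0
	for fd in "$fd_dir"/*; do
		# Assumption: a sync_file descriptor resolves to something
		# like "anon_inode:sync_file".
		case "$(readlink "$fd" 2>/dev/null)" in
			*sync_file*) count=$((count + 1)) ;;
		esac
	done
	echo "$count"
}
```

Run count_sync_files "/proc/$(pidof Hyprland)/fd" once right after the compositor starts and again when VRAM usage is high; a number that only ever grows would point to the same leak.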

I see 5 sync_file entries. Is this a normal amount?

Probably, yes, compare it to this ;) :

❯ lsof -p $(pidof plasmashell)|grep sync_file|count
88

But this time plasmashell’s VRAM usage wasn’t that high: "only" 126 MB, which is quite normal.

Btw, an easy way to reproduce this: run notify-send test a few times and watch the number of sync_file entries increase.

❯ lsof -p $(pidof plasmashell)|grep sync_file|count
351

I don’t think it affects VRAM usage that much tho’.

So, any idea what the problem could be? Whose fault is it? (KDE or the NVIDIA driver)

I can trivially reproduce this as well by sending notifications with notify-send. I am on 570.86.16 and KDE Plasma 6.3. The upstream KDE issue, 497424 – fd leak with explicit sync (nvidia), which was linked in the OP, claims this is an NVIDIA driver bug.

Please let me know if any other information would be useful.

[Cross posted from 497424 – fd leak with explicit sync (nvidia)]:

I also experience regular plasmashell crashes due to my notification-heavy workflow and leaking file descriptors.

Here’s a script that easily triggers this crash:

#!/bin/bash

NOTIFICATIONS=0
PREV_PID=""

if [[ "$__NV_DISABLE_EXPLICIT_SYNC" == "1" ]]; then
	echo "Explicit sync is disabled. Descriptors shouldn't leak."
else
	echo "Explicit sync is enabled. Descriptors should leak."
fi


while true; do
	PID=$(pidof plasmashell)

	# Check if PID has changed since the last execution. This is an indication that plasmashell has crashed.
	if [[ $PREV_PID != "" && "$PID" != "$PREV_PID" ]]; then
		echo "plasmashell crashed after $NOTIFICATIONS notifications"
		echo ""
		journalctl --no-pager --lines=100 | grep -C 10 "Too many open files"
		exit 1
	fi
	PREV_PID="$PID"

	if (( NOTIFICATIONS % 40 == 0 )); then
		echo ""
		echo "Notification    PID  Limit  Open descriptors  Until limit"
		echo "------------  -----  -----  ----------------  -----------"
	fi
	((NOTIFICATIONS++))

	kdialog --title "FD Leak" --passivepopup "Notification number $NOTIFICATIONS" 1 &

	# notify-send also leaks descriptors; either method will work.
	# notify-send "FD Leak" "Notification number $NOTIFICATIONS" --expire-time 1000

	# Get the open descriptor count and the process's soft open-files limit.
	# ls -A omits ".", ".." and the "total" line, so only descriptors are counted.
	FD_COUNT=$(ls -A "/proc/$PID/fd" 2>/dev/null | wc -l)
	LIMIT=$(awk '/Max open files/ {print $4}' "/proc/$PID/limits")
	REMAINING=$((LIMIT - FD_COUNT))

	printf "%12d %6d %6d %17d %12d\n" $NOTIFICATIONS $PID $LIMIT $FD_COUNT $REMAINING

	# The notification rate isn't related to triggering a crash.
	sleep 0.25
done

This is a partial output from the script:

[12:43:20 user@fedroa-pc:[~]> ./leak.sh
Explicit sync is enabled. Descriptors should leak.

Notification    PID  Limit  Open descriptors  Until limit
------------  -----  -----  ----------------  -----------
           1   2563   1024               157          867
           2   2563   1024               157          867
[snip]
         218   2563   1024              1017            7
         219   2563   1024              1024            0
plasmashell crashed after 219 notifications

Mar 01 12:45:13 fedroa-pc plasmashell[2563]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x55db036bdf90
Mar 01 12:45:13 fedroa-pc plasmashell[2563]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x55db036bdf90
Mar 01 12:45:13 fedroa-pc plasmashell[2563]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x55db03438840
Mar 01 12:45:13 fedroa-pc plasmashell[2563]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x55db03438840
Mar 01 12:45:14 fedroa-pc plasmashell[2563]: error marshalling arguments for import_timeline: dup failed: Too many open files
Mar 01 12:45:14 fedroa-pc plasmashell[2563]: Error marshalling request: Too many open files
Mar 01 12:45:14 fedroa-pc plasmashell[2563]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x55db03f04800
Mar 01 12:45:14 fedroa-pc plasmashell[2563]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x55db03f04800
Mar 01 12:45:14 fedroa-pc plasmashell[2563]: The Wayland connection experienced a fatal error: Too many open files
Mar 01 12:45:14 fedroa-pc systemd[1983]: Starting grub-boot-success.service - Mark boot as successful...
Mar 01 12:45:14 fedroa-pc systemd[1983]: Finished grub-boot-success.service - Mark boot as successful.
Mar 01 12:45:14 fedroa-pc systemd[1983]: plasma-plasmashell.service: Main process exited, code=exited, status=255/EXCEPTION
Mar 01 12:45:14 fedroa-pc systemd[1983]: plasma-plasmashell.service: Failed with result 'exit-code'.
Mar 01 12:45:14 fedroa-pc systemd[1983]: plasma-plasmashell.service: Consumed 23.288s CPU time, 331.5M memory peak.

System info:

Operating System: Fedora Linux 41
KDE Plasma Version: 6.3.2
KDE Frameworks Version: 6.11.0
Qt Version: 6.8.2
Kernel Version: 6.13.5-200.fc41.x86_64 (64-bit)
Graphics Platform: Wayland
Processors: 16 × 11th Gen Intel® Core™ i7-11850H @ 2.50GHz
Memory: 62.6 GiB of RAM
Graphics Processor: NVIDIA RTX A3000 Laptop GPU/PCIe/SSE2
NVIDIA Driver Version: 570.124.04

I got an honest plasmashell crash now after ~2 days of normal system usage. It’s easy to miss because plasmashell just restarts.

mar 13 12:03:38 auros plasmashell[1189]: kpipewire_logging: PipeWire remote error:  -71 connection error
mar 13 12:03:39 auros plasmashell[1189]: error marshalling arguments for import_timeline: dup failed: Too many open files
mar 13 12:03:39 auros plasmashell[1189]: Error marshalling request: Too many open files
mar 13 12:03:39 auros plasmashell[1189]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x5599cfc0b6c0
mar 13 12:03:39 auros plasmashell[1189]: The Wayland connection experienced a fatal error: Too many open files
mar 13 12:03:39 auros xdg-document-portal[1071]: removing transfer 15462174045790544431 for dead peer :1.26
mar 13 12:03:39 auros systemd[969]: plasma-plasmashell.service: Main process exited, code=exited, status=255/EXCEPTION
mar 13 12:03:39 auros systemd[969]: plasma-plasmashell.service: Failed with result 'exit-code'.
mar 13 12:03:39 auros systemd[969]: plasma-plasmashell.service: Consumed 4min 58.147s CPU time, 423.4M memory peak, 283.5M memory swa>
mar 13 12:03:39 auros systemd[969]: plasma-plasmashell.service: Scheduled restart job, restart counter is at 1.
mar 13 12:03:39 auros systemd[969]: Starting KDE Plasma Workspace...
mar 13 12:03:39 auros systemd[969]: Started KDE Plasma Workspace.

NVIDIA Driver Version: 570.124.04

Yeah, those crashes actually happen more often. I will put my 5 cents here: a report with a crash trace is at https://bugs.kde.org/show_bug.cgi?id=500351 . If NVIDIA is somehow involved with D-Bus, then maybe they can fix it. It’s actually hard to pin down the crash point because the driver is a black-box blob.

FYI, there are no lingering sync_file entries with amdgpu/RADV. I’m not sure whether this whole sync_file issue is purely an NVIDIA thing.

❯ lsof -p $(pidof plasmashell)|grep sync_file|count
681

Just created an account to report here as well.

570.144 proprietary driver on a 4070 (non-Super, non-Ti), with two displays with VRR enabled, if it matters. I’ve been having Plasma crashes due to sync_file spam ever since I switched to Wayland. On Fedora 41 the crashes were a bit more graceful, with Plasma just reloading itself, but on F42 it became messy, tending more towards freezes and outright crashes. I have to manually restart Plasma when it inevitably happens.

Increasing the open file limit only delays the issue. No open bugs on KDE or Fedora side as far as I can tell.
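
To put a number on "only delays": assuming the leak stays at roughly 4 descriptors per notification (the rate visible in the script output earlier in this thread), here is a back-of-the-envelope sketch of how many notifications fit under a given limit. The function name and the 524288 limit are illustrative assumptions, not values taken from my system:

```shell
#!/bin/bash
# notifications_left LIMIT OPEN_FDS FDS_PER_NOTIFICATION
# Whole number of notifications before OPEN_FDS would reach LIMIT,
# assuming a constant leak per notification.
notifications_left() {
	echo $(( ($1 - $2) / $3 ))
}

notifications_left 1024 157 4     # limit and starting count from the script output
notifications_left 524288 157 4   # hypothetical raised limit
```

This prints 216 and 131032. A higher limit buys more time, but with a steady leak the count still gets there eventually.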