Fd leak with explicit sync and kde plasma

daron439 · December 18, 2024, 9:30am

Every notification and every plasmoid open/close leaks a lot of sync_file descriptors in plasmashell:

❯ lsof -p $(pidof plasmashell)
…
396r a_inode 0,16 0 1062 sync_file
397r a_inode 0,16 0 1062 sync_file
399r a_inode 0,16 0 1062 sync_file
400r a_inode 0,16 0 1062 sync_file

And plasmashell eventually crashes with:
plasmashell[2053]: error marshalling arguments for get_icon: dup failed: Too many open files
plasmashell[2053]: Error marshalling request: Too many open files
plasmashell[2053]: The Wayland connection experienced a fatal error: Too many open files
plasma-plasmashell.service: Main process exited, code=exited, status=255/EXCEPTION

If I set __NV_DISABLE_EXPLICIT_SYNC=1 in /etc/environment this doesn’t happen.
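For anyone trying this workaround: /etc/environment is normally only read at login, so it may be worth confirming that the variable actually reached the running plasmashell. Below is a minimal sketch, assuming a Linux /proc filesystem and assuming the variable needs to be in plasmashell's own environment to take effect. explicit_sync_disabled is a hypothetical helper name, not part of any NVIDIA or KDE tool.

```bash
# Hypothetical helper: succeeds (exit 0) if the process with the given PID
# was started with __NV_DISABLE_EXPLICIT_SYNC=1 in its environment.
# Returns 2 if the environment of that process cannot be read.
explicit_sync_disabled() {
	local environ_file="/proc/$1/environ"
	# The file is only readable for your own processes (or as root).
	[ -r "$environ_file" ] || return 2
	# Entries in environ are NUL-separated; turn them into lines for grep.
	tr '\0' '\n' < "$environ_file" | grep -qx '__NV_DISABLE_EXPLICIT_SYNC=1'
}

# Usage (pidof may print several PIDs if more than one plasmashell runs):
# explicit_sync_disabled "$(pidof plasmashell)" && echo "explicit sync is disabled"
```

Note that /proc/PID/environ reflects the environment the process was started with, so plasmashell has to be restarted (or the session re-logged) after editing /etc/environment.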

https://bugs.kde.org/show_bug.cgi?id=497424

john.kizer · January 23, 2025, 3:59am

If there are specific logs or troubleshooting steps that would help folks dig into this one, I’d be happy to provide them. I’m attaching the output of the bug report script here as well.

Thanks,

nvidia-bug-report.log.gz (1.4 MB)

daron439 · January 31, 2025, 8:18am

Still happens in 570.86.16.

phoenix91140 · January 31, 2025, 8:31am

I now also wonder whose bug it actually is: KDE’s or NVIDIA’s?

shelter · January 31, 2025, 8:54am

Mmm… it seems like they didn’t even take a look and just shoved it over to Nvidia. But who knows…

SemiOT: There’s also a bug report about plasmashell’s RAM usage that’s been active for years, and no one has really done anything about it.

faz · February 4, 2025, 10:00am

Possibly related to the Hyprland + Nvidia VRAM leak?

shelter · February 4, 2025, 11:40am

I’d say it’s possible the fd leak happens on Hyprland but I don’t know, I don’t run it so I can’t check.

Should be fairly easy to check tho’
lsof -p $(pidof Hyprland)

If you see a lot of sync_file entries when VRAM is high, it’s probably the same issue as this one. But you have to compare it to when Hyprland was just freshly started.
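The before/after comparison can also be done without lsof by reading /proc directly. Below is a minimal sketch, assuming Linux and that these descriptors show up as symlinks whose target contains "sync_file" (for example anon_inode:sync_file). count_sync_files is a hypothetical helper name.

```bash
# Hypothetical helper: print how many open descriptors of the given PID
# point at a sync_file. Prints 0 if the process is gone or unreadable.
count_sync_files() {
	local pid=$1 n=0 fd target
	for fd in /proc/"$pid"/fd/*; do
		# Each entry is a symlink; unreadable or vanished entries are skipped.
		target=$(readlink "$fd" 2>/dev/null) || continue
		case "$target" in
			*sync_file*) n=$((n + 1)) ;;
		esac
	done
	echo "$n"
}

# Usage: run once right after the compositor starts and again later, then compare.
# count_sync_files "$(pidof Hyprland)"
```

If pidof prints more than one PID, pass them to the function one at a time.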

Update:
Or try to set __NV_DISABLE_EXPLICIT_SYNC=1 and see what happens to the VRAM.

faz · February 4, 2025, 8:03pm

I see 5 sync_files. Is this a normal amount?

shelter · February 4, 2025, 8:59pm

Probably, yes, compare it to this ;) :

❯ lsof -p $(pidof plasmashell)|grep sync_file|count
88

But this time plasmashell’s VRAM usage wasn’t that high: “only” 126 MB, which is quite normal.

shelter · February 5, 2025, 1:52pm

Btw, an easy way to reproduce this, run notify-send test a few times and see the number of sync files increase.

❯ lsof -p $(pidof plasmashell)|grep sync_file|count
351

I don’t think it affects VRAM usage that much tho’.

phoenix91140 · February 18, 2025, 9:31pm

So, any idea what the problem could be? Whose fault is it (KDE or the NVIDIA driver)?

slothrop21 · February 18, 2025, 10:15pm

I can trivially reproduce this as well by sending notifications with notify-send. I am on 570.86.16 and KDE Plasma 6.3. The upstream KDE issue, 497424 – fd leak with explicit sync (nvidia), which was linked in the OP, claims this is an Nvidia driver bug.

Please let me know if any other information would be useful

nvsteve25 · March 1, 2025, 8:13pm

[Cross posted from 497424 – fd leak with explicit sync (nvidia)]:

I also experience regular plasmashell crashes due to my notification-heavy workflow and leaking file descriptors.

Here’s a script that easily triggers this crash:

#!/bin/bash

NOTIFICATIONS=0
PREV_PID=""

if [[ "$__NV_DISABLE_EXPLICIT_SYNC" == "1" ]]; then
	echo "Explicit sync is disabled. Descriptors shouldn't leak."
else
	echo "Explicit sync is enabled. Descriptors should leak."
fi


while true; do
	PID=$(pidof plasmashell)

	# Check if PID has changed since the last execution. This is an indication that plasmashell has crashed.
	if [[ $PREV_PID != "" && "$PID" != "$PREV_PID" ]]; then
		echo "plasmashell crashed after $NOTIFICATIONS notifications"
		echo ""
		journalctl --no-pager --lines=100 | grep -C 10 "Too many open files"
		exit 1
	fi
	PREV_PID="$PID"

	if (( NOTIFICATIONS % 40 == 0 )); then
		echo ""
		echo "Notification    PID  Limit  Open descriptors  Until limit"
		echo "------------  -----  -----  ----------------  -----------"
	fi
	((NOTIFICATIONS++))

	kdialog --title "FD Leak" --passivepopup "Notification number $NOTIFICATIONS" 1 &

	# notify-send also leaks descriptors; either method will work.
	# notify-send "FD Leak" "Notification number $NOTIFICATIONS" --expire-time 1000

	# Get the descriptor count and the process's open files limit.
	# Note: "ls -la" also prints a "total" line plus "." and "..", so the count runs 3 high.
	FD_COUNT=$(ls -la /proc/$PID/fd 2>/dev/null | wc -l)
	LIMIT=$(cat /proc/$PID/limits | grep "Max open files" | awk '{print $4}')
	REMAINING=$(($LIMIT - $FD_COUNT))

	printf "%12d %6d %6d %17d %12d\n" $NOTIFICATIONS $PID $LIMIT $FD_COUNT $REMAINING

	# The notification rate isn't related to triggering a crash.
	sleep 0.25
done

This is a partial output from the script:

[12:43:20 user@fedroa-pc:[~]> ./leak.sh
Explicit sync is enabled. Descriptors should leak.

Notification    PID  Limit  Open descriptors  Until limit
------------  -----  -----  ----------------  -----------
           1   2563   1024               157          867
           2   2563   1024               157          867
[snip]
         218   2563   1024              1017            7
         219   2563   1024              1024            0
plasmashell crashed after 219 notifications

Mar 01 12:45:13 fedroa-pc plasmashell[2563]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x55db036bdf90
Mar 01 12:45:13 fedroa-pc plasmashell[2563]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x55db036bdf90
Mar 01 12:45:13 fedroa-pc plasmashell[2563]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x55db03438840
Mar 01 12:45:13 fedroa-pc plasmashell[2563]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x55db03438840
Mar 01 12:45:14 fedroa-pc plasmashell[2563]: error marshalling arguments for import_timeline: dup failed: Too many open files
Mar 01 12:45:14 fedroa-pc plasmashell[2563]: Error marshalling request: Too many open files
Mar 01 12:45:14 fedroa-pc plasmashell[2563]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x55db03f04800
Mar 01 12:45:14 fedroa-pc plasmashell[2563]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x55db03f04800
Mar 01 12:45:14 fedroa-pc plasmashell[2563]: The Wayland connection experienced a fatal error: Too many open files
Mar 01 12:45:14 fedroa-pc systemd[1983]: Starting grub-boot-success.service - Mark boot as successful...
Mar 01 12:45:14 fedroa-pc systemd[1983]: Finished grub-boot-success.service - Mark boot as successful.
Mar 01 12:45:14 fedroa-pc systemd[1983]: plasma-plasmashell.service: Main process exited, code=exited, status=255/EXCEPTION
Mar 01 12:45:14 fedroa-pc systemd[1983]: plasma-plasmashell.service: Failed with result 'exit-code'.
Mar 01 12:45:14 fedroa-pc systemd[1983]: plasma-plasmashell.service: Consumed 23.288s CPU time, 331.5M memory peak.

System info:

Operating System: Fedora Linux 41
KDE Plasma Version: 6.3.2
KDE Frameworks Version: 6.11.0
Qt Version: 6.8.2
Kernel Version: 6.13.5-200.fc41.x86_64 (64-bit)
Graphics Platform: Wayland
Processors: 16 × 11th Gen Intel® Core™ i7-11850H @ 2.50GHz
Memory: 62.6 GiB of RAM
Graphics Processor: NVIDIA RTX A3000 Laptop GPU/PCIe/SSE2
NVIDIA Driver Version: 570.124.04

shelter · March 13, 2025, 11:08am

I got an honest plasmashell crash now after ~2 days of normal system usage. It’s easy to miss because plasmashell just restarts.

mar 13 12:03:38 auros plasmashell[1189]: kpipewire_logging: PipeWire remote error:  -71 connection error
mar 13 12:03:39 auros plasmashell[1189]: error marshalling arguments for import_timeline: dup failed: Too many open files
mar 13 12:03:39 auros plasmashell[1189]: Error marshalling request: Too many open files
mar 13 12:03:39 auros plasmashell[1189]: qt.qpa.wayland: eglSwapBuffers failed with 0x3000, surface: 0x5599cfc0b6c0
mar 13 12:03:39 auros plasmashell[1189]: The Wayland connection experienced a fatal error: Too many open files
mar 13 12:03:39 auros xdg-document-portal[1071]: removing transfer 15462174045790544431 for dead peer :1.26
mar 13 12:03:39 auros systemd[969]: plasma-plasmashell.service: Main process exited, code=exited, status=255/EXCEPTION
mar 13 12:03:39 auros systemd[969]: plasma-plasmashell.service: Failed with result 'exit-code'.
mar 13 12:03:39 auros systemd[969]: plasma-plasmashell.service: Consumed 4min 58.147s CPU time, 423.4M memory peak, 283.5M memory swa>
mar 13 12:03:39 auros systemd[969]: plasma-plasmashell.service: Scheduled restart job, restart counter is at 1.
mar 13 12:03:39 auros systemd[969]: Starting KDE Plasma Workspace...
mar 13 12:03:39 auros systemd[969]: Started KDE Plasma Workspace.

NVIDIA Driver Version: 570.124.04

phoenix91140 · March 18, 2025, 7:06am

Yeah, those crashes in fact happen more often. I’ll put my 5 cents in here: the report and trace for the crash are at https://bugs.kde.org/show_bug.cgi?id=500351 . If NVidia is somehow related to dbus then maybe they can fix it. It’s actually hard to pin down the crash point because the driver is a black-box blob.

shelter · April 1, 2025, 2:48am

FYI, there are no lingering sync_file entries with amdgpu/RADV, though I’m not sure whether this whole sync_file thing is purely an Nvidia thing.

renari · April 24, 2025, 7:14am

❯ lsof -p $(pidof plasmashell)|grep sync_file|count
681

dombsky · May 8, 2025, 2:06pm

Just created an account to report here as well.

570.144 proprietary driver on a 4070 (non-Super, non-Ti), if it matters, with two displays with VRR enabled. I’ve been having Plasma crashes due to sync_file spam ever since I switched to Wayland. On Fedora 41 the crashes were a bit more graceful, with Plasma just reloading itself, but on F42 it became messier, tending more toward freezes and outright crashes. I have to manually restart Plasma when it inevitably happens.

Increasing the open file limit only delays the issue. No open bugs on KDE or Fedora side as far as I can tell.
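The script output earlier in the thread gives a feel for why: the count went from 157 to 1024 over 218 notifications, roughly 4 descriptors per notification in that one run, so a higher limit buys proportionally more notifications rather than a fix. Below is a minimal sketch for checking how much headroom a process has left, assuming Linux and the usual /proc/PID/limits layout (soft limit in the 4th field, hard limit in the 5th). fd_headroom is a hypothetical helper name.

```bash
# Hypothetical helper: print soft/hard "Max open files" limits, the number
# of open descriptors, and how many remain before the soft limit is hit.
fd_headroom() {
	local pid=$1 soft hard used
	[ -r "/proc/$pid/limits" ] || return 1
	soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
	hard=$(awk '/^Max open files/ {print $5}' "/proc/$pid/limits")
	# ls -A lists one entry per descriptor, without "." and "..".
	used=$(( $(ls -A "/proc/$pid/fd" 2>/dev/null | wc -l) ))
	echo "soft=$soft hard=$hard used=$used remaining=$((soft - used))"
}

# Usage:
# fd_headroom "$(pidof plasmashell)"
```

Dividing the remaining count by the observed growth per notification gives a rough estimate of how long the process will last on a given limit.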

amrits · June 19, 2025, 10:23am

We have filed bug 5352012 internally for tracking purposes.
We will keep you updated on the progress.

amrits · July 3, 2025, 5:01pm

We have a local repro in house for further debugging.
I also wanted to check whether any previous driver version did not have this issue.
