vk_present_wait_sdl_repro.cpp.zip (7.7 KB)
Hi,
I have an SDL + Wayland + Vulkan reproducer for a VK_KHR_present_wait
CPU spin on NVIDIA.
This is in the same general area as the known VK_KHR_present_wait
issues, but I do not think it is the X11/DRI3 fence race described by
nvglxfix: this reproducer uses SDL’s Wayland backend and a Vulkan Wayland
surface, not X11/DRI3/GLX.
Summary
Calling:
vkWaitForPresentKHR(device, swapchain, present_id, 1000000000);
from a helper thread causes NVIDIA’s Wayland Vulkan WSI path to busy-poll a
Wayland fd with a zero timeout.
The app passes a 1 second timeout to vkWaitForPresentKHR, but inside the
NVIDIA driver the thread repeatedly makes this call:
ppoll([{fd=5, events=POLLIN}], 1, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
This keeps the waiting thread at roughly 90-100% of one CPU core.
Setup
I reproduced this with:
- GPU: NVIDIA GeForce RTX 4070 Laptop GPU
- Driver: 595.58.03
- Session: Wayland
- Compositor: Hyprland
- Vulkan loader: 1.4.341
- SDL: SDL2-compatible headers/libs backed by SDL 3 / sdl2-compat
- The reproducer uses SDL’s Wayland video driver internally
Reproducer
I attached vk_present_wait_sdl_repro.cpp.
It does the following:
- forces SDL’s Wayland video driver
- creates an SDL Vulkan window
- selects an NVIDIA Vulkan device
- creates a Vulkan swapchain
- enables VK_KHR_present_id and VK_KHR_present_wait
- presents frames with monotonically increasing present IDs
- starts a dedicated vk-presentwait thread
- that thread calls vkWaitForPresentKHR(..., timeout=1000000000)
No arguments or environment variables are required.
Build
c++ -std=c++17 -O0 -g vk_present_wait_sdl_repro.cpp \
-o vk-present-wait-sdl-repro \
$(pkg-config --cflags --libs sdl2 vulkan) \
-pthread
Run
./vk-present-wait-sdl-repro
The program prints something like:
SDL video driver: wayland
selected device: vendor=0x10de device=0x2860 name='NVIDIA GeForce RTX 4070 Laptop GPU' driver='NVIDIA' info='595.58.03'
swapchain: extent=640x360 images=3 format=44 present_mode=fifo
pid=783451 main_tid=783451 present_wait=1
present-wait thread: tid=783480 timeout_ns=1000000000
Observed Behavior
Checking per-thread CPU usage with:
ps -L -p $PID -o pid,tid,comm,pcpu,stat,wchan:32
shows the vk-presentwait thread at 90-100% CPU.
Example from my machine:
783451 783480 vk-presentwait 93.9 Rsl -
The repro’s own timing output shows vkWaitForPresentKHR returning
successfully roughly every 8 ms, i.e. once per refresh on my 120 Hz display (1/120 s ≈ 8.3 ms):
wait_last ~= 8ms
wait_avg ~= 8ms
timeouts=0
errors=0
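So this is not an application-side loop spinning around vkWaitForPresentKHR.
Each call takes about one frame to return and never times out or fails, so the app
only calls it about 120 times per second. The CPU time is being spent inside the
NVIDIA implementation.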
strace Evidence
Attaching to the vk-presentwait thread:
sudo timeout 6s strace -tt -T -s 128 -e trace=ppoll -p $WAIT_TID
shows a tight zero-timeout poll loop:
ppoll([{fd=5, events=POLLIN}], 1, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout) <0.000006>
ppoll([{fd=5, events=POLLIN}], 1, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout) <0.000006>
ppoll([{fd=5, events=POLLIN}], 1, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout) <0.000006>
Why This Looks Like an NVIDIA Wayland WSI Bug
The application calls vkWaitForPresentKHR with a 1 second timeout:
timeout_ns=1000000000
but the NVIDIA implementation appears to repeatedly dispatch/poll the Wayland
queue with an immediate timeout instead of blocking until either:
- the relevant presentation feedback arrives, or
- the user-provided timeout expires.
This is the same symptom I originally saw in Gamescope’s SDL backend: one
gamescope-sdl thread burning CPU, with stacks through:
vkWaitForPresentKHR
libnvidia-glcore
wl_display_dispatch_queue_timeout
ppoll(... timeout={0,0})
The attached reproducer removes Gamescope from the equation and still
reproduces the same behavior. I originally reported it here, but it has not received any responses.
Expected Behavior
vkWaitForPresentKHR(..., timeout=1000000000) should not busy-spin a CPU
thread while waiting for Wayland presentation feedback.
It should block efficiently, or at least sleep/poll with a meaningful timeout
derived from the timeout passed to vkWaitForPresentKHR.
Actual Behavior
The NVIDIA Wayland WSI path repeatedly calls into Wayland polling with
{tv_sec=0, tv_nsec=0}, causing a dedicated thread to consume nearly a full
CPU core.
Thanks.