NVIDIA Wayland Vulkan WSI busy-spins in vkWaitForPresentKHR with VK_KHR_present_wait

vk_present_wait_sdl_repro.cpp.zip (7.7 KB)

Hi,

I have an SDL + Wayland + Vulkan reproducer for a VK_KHR_present_wait
CPU spin on NVIDIA.

This is related to the same general area as the known VK_KHR_present_wait
issues, but I do not think this is the X11/DRI3 fence race described by
nvglxfix. This reproducer uses SDL’s Wayland backend and a Vulkan Wayland
surface, not X11/DRI3/GLX.

Summary

Calling:

vkWaitForPresentKHR(device, swapchain, present_id, 1000000000);

from a helper thread causes NVIDIA’s Wayland Vulkan WSI path to busy-poll a
Wayland fd with a zero timeout.

The app passes a 1 second timeout to vkWaitForPresentKHR, but inside the
NVIDIA driver the thread repeatedly reaches:

ppoll([{fd=5, events=POLLIN}], 1, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)

This burns one CPU thread at around 90-100%.

Setup

I reproduced this with:

  • GPU: NVIDIA GeForce RTX 4070 Laptop GPU
  • Driver: 595.58.03
  • Session: Wayland
  • Compositor: Hyprland
  • Vulkan loader: 1.4.341
  • SDL: SDL2-compatible headers/libs backed by SDL 3 / sdl2-compat
  • Reproducer uses SDL Wayland video driver internally

Reproducer

I attached vk_present_wait_sdl_repro.cpp.

It does the following:

  • forces SDL’s Wayland video driver
  • creates an SDL Vulkan window
  • selects an NVIDIA Vulkan device
  • creates a Vulkan swapchain
  • enables:
    • VK_KHR_present_id
    • VK_KHR_present_wait
  • presents frames with monotonically increasing present IDs
  • starts a dedicated vk-presentwait thread
  • that thread calls vkWaitForPresentKHR(..., timeout=1000000000)

No arguments or environment variables are required.

Build

c++ -std=c++17 -O0 -g vk_present_wait_sdl_repro.cpp \
  -o vk-present-wait-sdl-repro \
  $(pkg-config --cflags --libs sdl2 vulkan) \
  -pthread

Run

./vk-present-wait-sdl-repro

The program prints something like:

SDL video driver: wayland
selected device: vendor=0x10de device=0x2860 name='NVIDIA GeForce RTX 4070 Laptop GPU' driver='NVIDIA' info='595.58.03'
swapchain: extent=640x360 images=3 format=44 present_mode=fifo
pid=783451 main_tid=783451 present_wait=1
present-wait thread: tid=783480 timeout_ns=1000000000

Observed Behavior

Checking threads:

ps -L -p $PID -o pid,tid,comm,pcpu,stat,wchan:32

shows:

vk-presentwait  90-100% CPU

Example from my machine:

783451  783480 vk-presentwait  93.9 Rsl  -

The repro’s own timing output shows vkWaitForPresentKHR returning
successfully, usually around every 8 ms on my 120 Hz display:

wait_last ~= 8ms
wait_avg  ~= 8ms
timeouts=0
errors=0

So this is not an application-side infinite loop around vkWaitForPresentKHR;
the CPU burn is happening while inside the NVIDIA implementation.

strace Evidence

Attaching to the vk-presentwait thread:

sudo timeout 6s strace -tt -T -s 128 -e trace=ppoll -p $WAIT_TID

shows a tight zero-timeout poll loop:

ppoll([{fd=5, events=POLLIN}], 1, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout) <0.000006>
ppoll([{fd=5, events=POLLIN}], 1, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout) <0.000006>
ppoll([{fd=5, events=POLLIN}], 1, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout) <0.000006>

Why This Looks Like an NVIDIA Wayland WSI Bug

The application calls vkWaitForPresentKHR with a 1 second timeout:

timeout_ns=1000000000

but the NVIDIA implementation appears to repeatedly dispatch/poll the Wayland
queue with an immediate timeout instead of blocking until either:

  • the relevant presentation feedback arrives, or
  • the user-provided timeout expires.

This is the same symptom I originally saw in Gamescope’s SDL backend: one
gamescope-sdl thread burning CPU, with stacks through:

vkWaitForPresentKHR
libnvidia-glcore
wl_display_dispatch_queue_timeout
ppoll(... timeout={0,0})

The attached reproducer removes Gamescope from the equation and still
reproduces the same behavior. I originally reported it here, but the report
received no response.

Expected Behavior

vkWaitForPresentKHR(..., timeout=1000000000) should not busy-spin a CPU
thread while waiting for Wayland presentation feedback.

It should block efficiently, or at least sleep/poll with a meaningful timeout
derived from the timeout passed to vkWaitForPresentKHR.
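The difference between the two behaviors can be illustrated outside the driver. The sketch below is not NVIDIA's code and makes no claim about how the driver is structured; it only contrasts a zero-timeout poll loop (the pattern the strace output is consistent with) with a loop that sleeps in the kernel for the remaining time. A pipe stands in for the Wayland display fd, plain poll() is used instead of ppoll(), and all names (`WaitResult`, `wait_spinning`, `wait_blocking`, `run_against_pipe`) are hypothetical.

```cpp
#include <poll.h>
#include <unistd.h>
#include <cerrno>
#include <chrono>
#include <climits>
#include <cstdint>
#include <thread>

using Clock = std::chrono::steady_clock;

struct WaitResult {
    bool ready;           // fd became readable before the deadline
    std::uint64_t polls;  // number of poll() calls issued
};

// Zero-timeout polling until the deadline: every iteration is a syscall
// that returns immediately, so the thread never sleeps.
WaitResult wait_spinning(int fd, std::chrono::nanoseconds timeout) {
    const auto deadline = Clock::now() + timeout;
    WaitResult r{false, 0};
    pollfd pfd{fd, POLLIN, 0};
    while (Clock::now() < deadline) {
        ++r.polls;
        if (poll(&pfd, 1, 0) > 0) { r.ready = true; break; }
    }
    return r;
}

// Blocking wait: hand the remaining time to poll() so the kernel can put
// the thread to sleep until the fd is readable or the deadline passes.
WaitResult wait_blocking(int fd, std::chrono::nanoseconds timeout) {
    const auto deadline = Clock::now() + timeout;
    WaitResult r{false, 0};
    pollfd pfd{fd, POLLIN, 0};
    for (;;) {
        const auto left = deadline - Clock::now();
        if (left <= Clock::duration::zero()) break;
        // poll() takes milliseconds. Round up so a sub-millisecond remainder
        // does not turn into a zero-timeout poll, and clamp to int range.
        const long long ms =
            std::chrono::ceil<std::chrono::milliseconds>(left).count();
        const int poll_ms = ms > INT_MAX ? INT_MAX : static_cast<int>(ms);
        ++r.polls;
        const int rc = poll(&pfd, 1, poll_ms);
        if (rc > 0) { r.ready = true; break; }
        if (rc < 0 && errno != EINTR) break;  // real error: give up
    }
    return r;
}

// Test helper: runs a waiter with a 1 s timeout against a pipe that
// becomes readable after `delay`.
template <typename Waiter>
WaitResult run_against_pipe(Waiter waiter, std::chrono::milliseconds delay) {
    int fds[2];
    if (pipe(fds) != 0) return {false, 0};
    std::thread writer([&] {
        std::this_thread::sleep_for(delay);
        const char byte = 1;
        const ssize_t n = write(fds[1], &byte, 1);
        (void)n;
    });
    const WaitResult r = waiter(fds[0], std::chrono::seconds(1));
    writer.join();
    close(fds[0]);
    close(fds[1]);
    return r;
}
```

Calling `run_against_pipe(wait_blocking, std::chrono::milliseconds(20))` should normally report a single poll() call, while the same call with `wait_spinning` reports many, each returning immediately. That is the difference between an idle thread and one pinned near 100% CPU.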

Actual Behavior

The NVIDIA Wayland WSI path repeatedly calls into Wayland polling with
{tv_sec=0, tv_nsec=0}, causing a dedicated thread to consume nearly a full
CPU core.

Thanks.

Bump, still reproducible on 595.71.05.

Bump. What else do I need to do to get the devs' attention?
You’ve got all the repro steps spoon-fed. I just retried with NVK/nouveau (mesa-git) and radeon-vulkan (AMD APU), and neither shows the problem. The issue occurs only with the NVIDIA drivers (both nvidia-open and nvidia).