We have a Vulkan application that uses a timeline semaphore to ensure resources are no longer in use by the GPU before we reuse them for a subsequent frame, which requires calling vkWaitSemaphores early in each frame.
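For reference, the frame-start wait looks roughly like the following minimal sketch (the semaphore and value names are placeholders, not our exact code):

```c
// Minimal sketch of the frame-start wait; names are illustrative only.
// frameTimeline is a VkSemaphore created with VK_SEMAPHORE_TYPE_TIMELINE;
// waitValue is the timeline value signaled by the submit that last used
// the resources we are about to reuse.
VkSemaphoreWaitInfo waitInfo = {
    .sType          = VK_STRUCTURE_TYPE_SEMAPHORE_WAIT_INFO,
    .semaphoreCount = 1,
    .pSemaphores    = &frameTimeline,
    .pValues        = &waitValue,
};
// UINT64_MAX timeout: block until the value is reached. This is the call
// that ends up busy-waiting on the Orin driver.
VkResult res = vkWaitSemaphores(device, &waitInfo, UINT64_MAX);
```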
I was very surprised to see that vkWaitSemaphores does not relinquish the thread; instead it appears to call sched_yield in a tight loop, pegging the core for the entire duration of the vkWaitSemaphores call. This was observed with Perfetto, which provides scheduler insights and uses ftrace to show which syscalls happen when.
For this simple test app, almost the entire frame time is taken up by vkWaitSemaphores, yet the thread is actually scheduled on a CPU for the whole time (see the “Running” bar above it). Zooming in, all the tiny blue bars are sched_yield syscalls, at roughly one per microsecond:
This is very surprising to me; waiting on a semaphore should not consume much CPU time, just wall time.
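As a sanity check (not part of the original trace, just one way to corroborate the CPU-vs-wall-time claim independently of Perfetto on a Linux target), the wait can be wrapped with thread CPU-time measurements:

```c
#include <stdio.h>
#include <time.h>
#include <vulkan/vulkan.h>

static double ts_ms(struct timespec t) { return t.tv_sec * 1e3 + t.tv_nsec * 1e-6; }

// Wraps vkWaitSemaphores and prints wall time vs. this thread's CPU time.
// A blocking wait should show CPU time far below wall time; a sched_yield
// busy-wait makes the two nearly equal.
static VkResult timed_wait(VkDevice device, const VkSemaphoreWaitInfo *waitInfo)
{
    struct timespec wall0, cpu0, wall1, cpu1;
    clock_gettime(CLOCK_MONOTONIC, &wall0);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu0);

    VkResult res = vkWaitSemaphores(device, waitInfo, UINT64_MAX);

    clock_gettime(CLOCK_MONOTONIC, &wall1);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu1);
    printf("vkWaitSemaphores: wall %.3f ms, cpu %.3f ms\n",
           ts_ms(wall1) - ts_ms(wall0), ts_ms(cpu1) - ts_ms(cpu0));
    return res;
}
```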
I’ve recorded a Vulkan API capture with GFXReconstruct v0.9.11:
gfxrecon.zip (186.2 KB)
It can be replayed with gfxrecon-replay and seems to show the same behavior (high CPU usage with bursts of sched_yield calls at roughly 1 MHz). We’re happy to share a test application privately if that’s helpful.
I’ve tested the same app on a discrete NVIDIA GPU under Linux (RTX 3050 Ti Mobile), which does not show this issue; in fact, it doesn’t even spend any significant wall time in vkWaitSemaphores, and the vsync wait ends up in vkQueuePresentKHR instead. The same holds on Intel + Linux.
I can’t think of anything we’re doing wrong, and since the same code works fine on other platforms (including other NVIDIA GPUs), it seems possible this is a driver bug.
Environment: L4T 36.4.3, Jetson Orin NX 8GB module on a Jetson Orin Nano devkit carrier board, GNOME desktop (X11).
Thanks for the help!