We have a Vulkan application that uses a timeline semaphore to ensure resources are no longer in use by the GPU before we reuse them for a subsequent frame, which requires calling vkWaitSemaphores early in each frame.
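For reference, the frame-start wait looks roughly like the following minimal sketch (the semaphore and value names are placeholders, not our exact code):

```c
// Minimal sketch of the frame-start wait; names are illustrative only.
// frameTimeline is a VkSemaphore created with VK_SEMAPHORE_TYPE_TIMELINE;
// waitValue is the timeline value signaled by the submit that last used
// the resources we are about to reuse.
VkSemaphoreWaitInfo waitInfo = {
    .sType          = VK_STRUCTURE_TYPE_SEMAPHORE_WAIT_INFO,
    .semaphoreCount = 1,
    .pSemaphores    = &frameTimeline,
    .pValues        = &waitValue,
};
// UINT64_MAX timeout: block until the value is reached. This is the call
// that ends up busy-waiting on the Orin driver.
VkResult res = vkWaitSemaphores(device, &waitInfo, UINT64_MAX);
```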
I was very surprised to see that vkWaitSemaphores does not relinquish the thread; instead it appears to call sched_yield in a tight loop, pegging the core for the entire duration of the vkWaitSemaphores call. This was observed with Perfetto, which provides scheduler insights and uses ftrace to show which syscalls happen when.
For this simple test app, almost the entire frame time is taken up by vkWaitSemaphores, yet the thread is actually scheduled on a CPU for the whole time (see the “Running” bar above it). Zooming in, all the tiny blue bars are sched_yield syscalls, at roughly one per microsecond:
This is very surprising to me; waiting on a semaphore should not consume much CPU time, just wall time.
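As a sanity check (not part of the original trace, just one way to corroborate the CPU-vs-wall-time claim independently of Perfetto on a Linux target), the wait can be wrapped with thread CPU-time measurements:

```c
#include <stdio.h>
#include <time.h>
#include <vulkan/vulkan.h>

static double ts_ms(struct timespec t) { return t.tv_sec * 1e3 + t.tv_nsec * 1e-6; }

// Wraps vkWaitSemaphores and prints wall time vs. this thread's CPU time.
// A blocking wait should show CPU time far below wall time; a sched_yield
// busy-wait makes the two nearly equal.
static VkResult timed_wait(VkDevice device, const VkSemaphoreWaitInfo *waitInfo)
{
    struct timespec wall0, cpu0, wall1, cpu1;
    clock_gettime(CLOCK_MONOTONIC, &wall0);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu0);

    VkResult res = vkWaitSemaphores(device, waitInfo, UINT64_MAX);

    clock_gettime(CLOCK_MONOTONIC, &wall1);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu1);
    printf("vkWaitSemaphores: wall %.3f ms, cpu %.3f ms\n",
           ts_ms(wall1) - ts_ms(wall0), ts_ms(cpu1) - ts_ms(cpu0));
    return res;
}
```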
I’ve recorded a Vulkan API capture with GFXReconstruct v0.9.11:
gfxrecon.zip (186.2 KB)
It can be replayed with gfxrecon-replay and seems to show the same behavior (high CPU usage with bursts of sched_yield calls at roughly 1 MHz). We’re happy to share a test application privately if that’s helpful.
I’ve tested the same app on a discrete NVIDIA GPU under Linux (RTX 3050 Ti Mobile), which does not show this issue; in fact, it doesn’t even spend any significant wall time in vkWaitSemaphores, and the vsync wait ends up in vkQueuePresentKHR instead. The same holds on Intel + Linux.
I can’t think of anything we’re doing wrong, and since the same code works fine on other platforms (including other NVIDIA GPUs), it seems possible this is a driver bug.
Environment: L4T 36.4.3, Jetson Orin NX 8GB module on a Jetson Orin Nano devkit carrier board, GNOME desktop (X11).
Thanks for the help!