cudaLaunchHostFunc blocking work on Linux

brian.budge · September 16, 2022, 8:58pm

I’m trying to do some runtime instrumentation, and have played a bit with events and cudaLaunchHostFunc. One issue with events is that if I run a bunch of very fast things, I seem to bump into the granularity of the event timers: if I have 1000 things that take 400 ns each, I get a report of essentially 0 ns, whereas I would hope for 400 µs.
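A minimal sketch of one way to sidestep the granularity issue, not taken from the thread: record a single pair of events around the whole batch rather than around each item. The documentation for `cudaEventElapsedTime` gives a resolution of around 0.5 µs, so intervals of a few hundred nanoseconds cannot be resolved individually. The kernel `tinyKernel`, the `CHECK` macro, and the count `kLaunches` are hypothetical names chosen for illustration. The measured interval also includes any launch gaps between kernels, so the per-launch average is an upper bound on kernel execution time.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-check helper (illustration only).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical stand-in for a "very fast thing".
__global__ void tinyKernel(int *x) {
    if (blockIdx.x == 0 && threadIdx.x == 0) *x += 1;
}

int main() {
    const int kLaunches = 1000;  // assumed batch size, as in the post

    int *d = nullptr;
    CHECK(cudaMalloc(&d, sizeof(int)));
    CHECK(cudaMemset(d, 0, sizeof(int)));

    cudaStream_t stream;
    cudaEvent_t start, stop;
    CHECK(cudaStreamCreate(&stream));
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // One event pair brackets the whole batch instead of each launch.
    CHECK(cudaEventRecord(start, stream));
    for (int i = 0; i < kLaunches; ++i) {
        tinyKernel<<<1, 1, 0, stream>>>(d);
    }
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop, stream));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    printf("batch: %.3f us total, %.3f us per launch (incl. launch gaps)\n",
           ms * 1000.0f, ms * 1000.0f / kLaunches);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d));
    return 0;
}
```

This gives up per-item timings in exchange for a total that is well above the timer resolution.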

Due to that, I’m trying the less performant cudaLaunchHostFunc, but I’m finding that it seems to synchronize streams, which is really not ideal (perhaps callbacks are made from a single CPU thread?). I found this post:

It seems to indicate that hardware-accelerated GPU scheduling could fix this. Unfortunately, I’m unable to find how to turn on this feature on Linux. Suggestions?

Robert_Crovella · September 16, 2022, 9:21pm

Hardware-accelerated GPU scheduling pertains to Windows only. As already indicated in that thread, the issue reported there was observed on Windows; when tested on Linux, it did not manifest (see the 2nd post in that thread). There is no corresponding switch or control on Linux.

brian.budge · September 22, 2022, 5:46pm

Interesting. I’m definitely observing serialization. Bummer that this can’t be controlled from Linux. Thanks for the response.
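For anyone wanting to check the serialization on their own system, here is a minimal sketch, not taken from the thread. It enqueues one deliberately slow host function on each of two streams and compares host-side timestamps to see whether the two callbacks overlapped. The names `Stamp`, `slowHostFn`, and `CHECK`, and the 50 ms sleep, are assumptions made for illustration. Note that a host function passed to `cudaLaunchHostFunc` must not make CUDA API calls, so the callback only touches host state.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical error-check helper (illustration only).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

using Clock = std::chrono::steady_clock;

// Host-side record of when a callback started and finished.
struct Stamp {
    Clock::time_point begin;
    Clock::time_point end;
};

// Slow host function; makes no CUDA API calls.
void CUDART_CB slowHostFn(void *userData) {
    Stamp *s = static_cast<Stamp *>(userData);
    s->begin = Clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(50));  // assumed delay
    s->end = Clock::now();
}

int main() {
    cudaStream_t a, b;
    CHECK(cudaStreamCreateWithFlags(&a, cudaStreamNonBlocking));
    CHECK(cudaStreamCreateWithFlags(&b, cudaStreamNonBlocking));

    Stamp sa{}, sb{};
    const Clock::time_point t0 = Clock::now();
    CHECK(cudaLaunchHostFunc(a, slowHostFn, &sa));
    CHECK(cudaLaunchHostFunc(b, slowHostFn, &sb));

    // Wait for both callbacks before reading the stamps.
    CHECK(cudaStreamSynchronize(a));
    CHECK(cudaStreamSynchronize(b));

    auto msSince = [&](Clock::time_point t) {
        return std::chrono::duration<double, std::milli>(t - t0).count();
    };
    printf("stream a: %.1f ms .. %.1f ms\n", msSince(sa.begin), msSince(sa.end));
    printf("stream b: %.1f ms .. %.1f ms\n", msSince(sb.begin), msSince(sb.end));

    // Two intervals overlap if each begins before the other ends.
    const bool overlapped = sa.begin < sb.end && sb.begin < sa.end;
    printf("callbacks %s\n", overlapped ? "overlapped" : "ran one after another");

    CHECK(cudaStreamDestroy(a));
    CHECK(cudaStreamDestroy(b));
    return 0;
}
```

If the two intervals never overlap, the host functions were executed one after another on that system, which is consistent with the serialization described above. The sketch does not show whether GPU work queued behind the callbacks is also delayed.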
