I wouldn’t expect that; the very notion of “serialization” contradicts it. I know of no specification or documentation that provides details on this point.
I will offer up a thought experiment. Suppose, hypothetically, that there were some “guarantee” of concurrent execution of host funcs. That would require one thread per callback/host func. Given that the number of host funcs launched can be arbitrarily large, how could any system provide that guarantee? It doesn’t make sense.
Furthermore, the CUDA developers have warned that attempting to perform synchronization using cudaLaunchHostFunc is unwise and may lead to trouble. You can find this warning in several places. If you like, start with the documentation of cudaLaunchHostFunc, already linked.
Based on all the information presented here, I conclude that using cudaLaunchHostFunc for an activity that has external dependencies is both unwise and unintended by the CUDA designers. It may lead to trouble.
Given that CUDA provides other methods to declare dependencies between two streams, such as cudaStreamWaitEvent, CUDA graphs, and perhaps others, I would encourage people to consider those.
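To illustrate the cudaStreamWaitEvent approach, here is a minimal sketch of making work in one stream depend on work previously issued to another, with no host func involved. The kernel names (producer/consumer) are hypothetical placeholders; only the event/stream API calls are the point.

```cuda
#include <cuda_runtime.h>

// Hypothetical trivial kernels, for illustration only
__global__ void producer(int *d) { *d = 42; }
__global__ void consumer(int *d) { *d += 1; }

int main() {
  cudaStream_t sA, sB;
  cudaEvent_t ev;
  int *d;
  cudaStreamCreate(&sA);
  cudaStreamCreate(&sB);
  cudaEventCreate(&ev);
  cudaMalloc(&d, sizeof(int));

  producer<<<1, 1, 0, sA>>>(d);     // work issued to stream A
  cudaEventRecord(ev, sA);          // mark a point of progress in stream A
  cudaStreamWaitEvent(sB, ev, 0);   // stream B will not proceed past here
                                    // until that point in stream A is reached
  consumer<<<1, 1, 0, sB>>>(d);     // ordered after producer, across streams

  cudaDeviceSynchronize();
  cudaFree(d);
  cudaEventDestroy(ev);
  cudaStreamDestroy(sA);
  cudaStreamDestroy(sB);
  return 0;
}
```

The dependency here is declared to the CUDA runtime, so the scheduler handles the ordering; there is no reliance on host-func execution behavior.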
I won’t be able to answer further questions about undocumented characteristics of the handling of cudaLaunchHostFunc. I also don’t wish to argue the thought experiment. It’s OK if you disagree with any of my points. We don’t have to agree. The behavior is what it is, regardless of my opinion.
Anyone interested in seeing a change to either CUDA behavior or documentation is encouraged to file a bug.