cudaLaunchHostFunc API example

There is no cudaLaunchHostFunc example on Google.

Does anybody know how to use cudaLaunchHostFunc?

__host__ cudaError_t cudaLaunchHostFunc ( cudaStream_t stream, cudaHostFn_t fn, void* userData )

In that API, I don’t understand ‘cudaHostFn_t fn’ and ‘void* userData’.

fn is the host function name, but where are the arguments passed…? Is userData the argument area?

The function is described here:

It is effectively what underpins the CUDA stream callback functionality. Therefore, rather than trying to use this launch mechanism directly, I would recommend that you use stream callbacks as defined in the programming guide:
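For reference, here is a minimal sketch of how cudaLaunchHostFunc itself might be used (myHostFn and payload are illustrative names, not from the documentation):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Host function matching cudaHostFn_t: void (CUDART_CB *)(void* userData).
// The runtime invokes it from an internal thread once all prior work
// enqueued in the stream has completed.
void CUDART_CB myHostFn(void* userData) {
    int* value = static_cast<int*>(userData);
    printf("host function sees value = %d\n", *value);
}

int main() {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    static int payload = 42;  // must outlive the enqueued call
    // fn is the function pointer; userData is handed to it verbatim.
    cudaLaunchHostFunc(stream, myHostFn, &payload);

    cudaStreamSynchronize(stream);  // wait until the host function has run
    cudaStreamDestroy(stream);
    return 0;
}
```

So userData is indeed the "argument area": the runtime simply forwards that one pointer to fn when it calls it.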

API manual:

and there is also a CUDA sample that gives an example:

Thank you so much Robert.

I will check the API!

Hello, Mr. Robert.

I read the API documentation at your link,

but I am confused about the parameters.

__host__ cudaError_t cudaStreamAddCallback ( cudaStream_t stream, cudaStreamCallback_t callback, void* userData, unsigned int flags )

In that API, what data types are ‘cudaStreamCallback_t callback’ and ‘void* userData’?

That is, is ‘cudaStreamCallback_t callback’ a host function?
Is ‘void* userData’ the host function’s parameter?

If ‘void* userData’ is the host function’s parameter, how do I pass multiple parameters?

For example, myCallback(int a, float b, …)

Thank you so much Mr. Robert!

Why not look at the sample code I pointed you to?

Sorry… I forgot to look at the sample code.

But something is still difficult for me:

it doesn’t show how to send multiple parameters.

Thank you so much, Mr. Robert.

Pass a pointer to a struct:
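A minimal sketch of that suggestion (CallbackArgs and myCallback are illustrative names): bundle the "multiple parameters" into one struct and pass its address as userData.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// All the parameters you want, bundled into a single struct.
struct CallbackArgs {
    int   a;
    float b;
};

// cudaStreamCallback_t signature: the runtime passes back the stream,
// the error status of the preceding stream work, and your userData pointer.
void CUDART_CB myCallback(cudaStream_t stream, cudaError_t status,
                          void* userData) {
    CallbackArgs* args = static_cast<CallbackArgs*>(userData);
    printf("a = %d, b = %f\n", args->a, args->b);
}

int main() {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    static CallbackArgs args = { 3, 1.5f };  // must outlive the callback
    // flags is reserved and must be 0.
    cudaStreamAddCallback(stream, myCallback, &args, 0);

    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    return 0;
}
```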

Oh! userData can be a pointer to a structure! Thank you!

Just a question about cudaLaunchHostFunc that your links didn’t answer. It’s quite clear that the CUDA runtime uses its own CPU thread to execute the host function. But, using NVTX to tag the inside of the host function, I can see that the CPU executions are serialized even when they are enqueued on different CUDA streams:

  • Are Host functions enqueued on different CUDA streams, asynchronous with respect to each other?
  • Does the CUDA runtime have a CPU threadpool to do so? Or it uses something like std::async?
  • If it does have its own threadpool, is this threadpool always initialized? Or only when you call cudaStreamAddCallback or cudaLaunchHostFunc?
  • How many threads does the threadpool spawn and depending on what?
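For context, the NVTX tagging described above might look roughly like this (a sketch; taggedHostFn is an illustrative name, and the NVTX header path may differ depending on the toolkit version):

```cuda
#include <cuda_runtime.h>
#include <nvtx3/nvToolsExt.h>

// Host function body wrapped in an NVTX range so the CPU-side execution
// shows up as a named interval on the profiler timeline.
void CUDART_CB taggedHostFn(void* userData) {
    nvtxRangePushA(static_cast<const char*>(userData));
    // ... CPU work here ...
    nvtxRangePop();
}

int main() {
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Enqueue on two different streams; on the Nsight Systems timeline
    // the two NVTX ranges appear serialized rather than overlapping.
    cudaLaunchHostFunc(s1, taggedHostFn, (void*)"stream 1 host fn");
    cudaLaunchHostFunc(s2, taggedHostFn, (void*)"stream 2 host fn");

    cudaDeviceSynchronize();
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    return 0;
}
```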