Thanks, it works.
__host__ cudaLaunchHostFunc is launched from the CPU, so why doesn't the CPU wait for the callback function? Or is it the same as creating a new thread?
cudaLaunchHostFunc only schedules the callback. It does not wait for completion.
The callback function is executed in a CUDA stream, just like a kernel launch. If you need to wait for the callback to complete, you need to synchronize the stream (or the device).
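A minimal sketch of that behavior, assuming a single GPU. The names busy_kernel, mark_done and run_schedule_demo are hypothetical and not part of the earlier code; busy_kernel only spins so that the callback has something to wait behind in the stream. The value printed right after cudaLaunchHostFunc is typically 0 (this is not guaranteed), while after cudaStreamSynchronize it is always 1.

```cuda
#include <atomic>
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical workload: spins for roughly `cycles` GPU clock ticks.
__global__ void busy_kernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

static std::atomic<int> g_done{0};

// Host callback: runs on a CPU thread, in stream order.
// It must not call any CUDA API functions.
void CUDART_CB mark_done(void* /*userData*/) {
    g_done.store(1);
}

// Error checking omitted for brevity.
int run_schedule_demo() {
    g_done.store(0);
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    busy_kernel<<<1, 1, 0, stream>>>(100000000LL);
    // Only enqueues the callback; returns without waiting for it.
    cudaLaunchHostFunc(stream, mark_done, nullptr);
    std::printf("right after cudaLaunchHostFunc: done = %d\n", g_done.load());

    // Blocks until the kernel and the callback have both finished.
    cudaStreamSynchronize(stream);
    std::printf("after cudaStreamSynchronize: done = %d\n", g_done.load());

    cudaStreamDestroy(stream);
    return g_done.load();
}

int main() {
    int done = run_schedule_demo();
    assert(done == 1);
    return done == 1 ? 0 : 1;
}
```

Synchronizing the device with cudaDeviceSynchronize instead of the stream would also wait for the callback.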
Sorry, I still haven't understood it clearly.
Who runs the MyCallback function, the CPU or the GPU?
The description of fn in cudaLaunchHostFunc says it is a host function call. If the CPU runs MyCallback, why doesn't the CPU wait for this callback function? (In the previous code, the two ret values are printed first and then the a value.) Are MyCallback and the main function run by two threads?
Or does the GPU run MyCallback? That would make sense, because the CPU does not wait for the GPU.
The callback is executed on the CPU, and its execution is stream-ordered. How this is implemented internally is not specified, but a simple experiment shows that the callback runs in a separate thread.
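One way to run that experiment, as a sketch: the callback records std::this_thread::get_id() and main compares it with its own thread ID after synchronizing. The names record_thread and which_thread_ran_callback are hypothetical. Since the threading model is unspecified, the code only reports what it observes rather than relying on a particular result.

```cuda
#include <atomic>
#include <cassert>
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Written by the callback before g_callback_ran is set; read by main only
// after it observes g_callback_ran == true.
static std::thread::id g_callback_thread;
static std::atomic<bool> g_callback_ran{false};

// Host function: records which CPU thread the runtime used to run it.
void CUDART_CB record_thread(void* /*userData*/) {
    g_callback_thread = std::this_thread::get_id();
    g_callback_ran.store(true);
}

// Returns 0 if the callback did not run, 1 if it ran on the calling
// thread, 2 if it ran on a different thread. Error checking omitted.
int which_thread_ran_callback() {
    g_callback_ran.store(false);
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaLaunchHostFunc(stream, record_thread, nullptr);
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);

    if (!g_callback_ran.load()) return 0;
    return g_callback_thread == std::this_thread::get_id() ? 1 : 2;
}

int main() {
    int r = which_thread_ran_callback();
    assert(r != 0);
    std::printf("%s\n", r == 2 ? "callback ran on a separate thread"
                      : r == 1 ? "callback ran on the main thread"
                               : "callback did not run");
    return 0;
}
```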
Thanks a lot.
I have added cudaLaunchHostFunc after every CUDA kernel to measure the running time of each kernel. In nsys, cudaLaunchHostFunc seems to affect the time interval between two adjacent kernel launches. The following two pictures show the same program without cudaLaunchHostFunc and with cudaLaunchHostFunc. Judging from the sparser SM Active row and the total run time, the program with cudaLaunchHostFunc takes longer than the one without it.
If you launch kernel1, host func1, and kernel2 in the same stream (for example, the implicit default stream), kernel2 will not begin until host func1 has finished.
To measure the runtime of a kernel, one would typically use cudaEvents to record a timestamp before and after the kernel.
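A minimal sketch of event-based timing, assuming a hypothetical busy_kernel as the workload and a single non-default stream. The events are recorded by the GPU in stream order, so no CPU callback is inserted between kernels.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical workload: spins for roughly `cycles` GPU clock ticks.
__global__ void busy_kernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Times one kernel launch with a pair of CUDA events; returns milliseconds.
// Error checking omitted for brevity.
float time_kernel_ms() {
    cudaStream_t stream;
    cudaEvent_t start, stop;
    cudaStreamCreate(&stream);
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, stream);               // timestamp before the kernel
    busy_kernel<<<1, 1, 0, stream>>>(10000000LL);
    cudaEventRecord(stop, stream);                // timestamp after the kernel
    cudaEventSynchronize(stop);                   // wait only for the stop event

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    return ms;
}

int main() {
    float ms = time_kernel_ms();
    assert(ms > 0.0f);
    std::printf("busy_kernel took %.3f ms\n", ms);
    return 0;
}
```

The first launch of a kernel can include one-time setup costs, so a warm-up launch before the timed one usually gives more representative numbers.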
I did not add any synchronize functions, so kernel2 should not wait for host func1.
Operations in the same stream are executed sequentially. Kernel2 will implicitly wait for host func1.
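A sketch that makes the implicit wait visible, assuming a hypothetical host_func1 that sleeps for 50 ms and two empty kernels. There is no explicit synchronization between the launches; events recorded on either side of the host function measure the GPU-side gap between the end of kernel1 and the start of kernel2.

```cuda
#include <cassert>
#include <chrono>
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

__global__ void kernel1() { }
__global__ void kernel2() { }

// Hypothetical stand-in for "host func1": it only sleeps.
void CUDART_CB host_func1(void* /*userData*/) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
}

// Returns the time between the point after kernel1 and the point just
// before kernel2, in milliseconds. Error checking omitted for brevity.
float gap_ms() {
    cudaStream_t stream;
    cudaEvent_t after_k1, before_k2;
    cudaStreamCreate(&stream);
    cudaEventCreate(&after_k1);
    cudaEventCreate(&before_k2);

    kernel1<<<1, 1, 0, stream>>>();
    cudaEventRecord(after_k1, stream);
    cudaLaunchHostFunc(stream, host_func1, nullptr);  // no explicit sync
    cudaEventRecord(before_k2, stream);               // cannot complete before host_func1 returns
    kernel2<<<1, 1, 0, stream>>>();
    cudaStreamSynchronize(stream);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, after_k1, before_k2);

    cudaEventDestroy(after_k1);
    cudaEventDestroy(before_k2);
    cudaStreamDestroy(stream);
    return ms;
}

int main() {
    float ms = gap_ms();
    assert(ms >= 40.0f);
    std::printf("gap between kernel1 and kernel2: %.1f ms\n", ms);
    return 0;
}
```

With an empty body in host_func1, kernel2 is still ordered behind it; the gap then typically shrinks to the runtime's per-callback overhead rather than disappearing.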
Is this also true for an empty host function?
Yes. You are right. I confirmed this conclusion with an example.