Is there any way to make CUDA run host functions launched in different streams on multiple threads (such as a thread pool)?

It seems that all host functions launched with cudaLaunchHostFunc in different streams are executed sequentially on a single thread.
I can't find any runtime API to configure CUDA to use a thread pool.
Is there any way to do this?

Hello @SparkHu, any luck with this?

A related thread: cudaLaunchHostFunc API example