I have a program in which multiple CPU threads launch kernels, and the main thread then tries to synchronize with them by calling cudaDeviceSynchronize(), but the synchronization doesn't seem to take effect.
So my guess is that each CPU thread has its own CUDA context. If that's the case, how do I make all the CPU threads share the same CUDA context?
I'm using the runtime API, compiling with "--default-stream=per-thread", on CUDA 10.2 on Windows.
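
To make the setup concrete, here is a minimal sketch of the pattern described above. It is not the actual program: the kernel `work`, the helper `run_workers`, and the buffer sizes are hypothetical names and values chosen for illustration, and it assumes std::thread workers. It joins the workers before calling cudaDeviceSynchronize(), since that call can only wait for work that has already been issued.

```cuda
// Build (assumed): nvcc --default-stream per-thread sketch.cu -o sketch
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to every element.
__global__ void work(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: launches one kernel per CPU thread and returns
// how many buffers hold the expected value afterwards.
int run_workers(int num_threads, int n) {
    std::vector<int *> bufs(num_threads, nullptr);
    for (auto &b : bufs) {
        cudaMalloc(&b, n * sizeof(int));
        cudaMemset(b, 0, n * sizeof(int));
    }
    // Make sure the memsets are done before other threads touch the buffers.
    cudaDeviceSynchronize();

    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        // Each CPU thread launches into its own per-thread default stream.
        workers.emplace_back([&bufs, t, n] {
            work<<<(n + 255) / 256, 256>>>(bufs[t], n);
        });
    }
    // Wait until every thread has issued its launch.
    for (auto &w : workers) w.join();

    // Main thread waits for all previously issued work on the device.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("sync failed: %s\n", cudaGetErrorString(err));
    }

    int ok = 0;
    for (auto &b : bufs) {
        int first = 0;
        cudaMemcpy(&first, b, sizeof(int), cudaMemcpyDeviceToHost);
        if (first == 1) ++ok;
        cudaFree(b);
    }
    return ok;
}

int main() {
    int ok = run_workers(4, 1 << 20);
    printf("%d of 4 buffers updated\n", ok);
    return 0;
}
```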
Thanks for the help!