When multiple CPU threads launch their own kernels, do they share the same CUDA context?

Hi,
I have a program in which multiple CPU threads launch kernels. The main thread then tries to synchronize with cudaDeviceSynchronize(), but it does not seem to wait for the kernels launched by the other threads.

So I guess each CPU thread has its own CUDA context? If that’s the case, how do I make the CPU threads share the same CUDA context?
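
One way to test this guess is to have each thread record the context that is current after it launches a kernel, then compare those handles in the main thread. The following is only a minimal sketch, not the actual program: the kernel `touch`, the functions `worker` and `threads_share_context`, and the use of device 0 are all hypothetical placeholders. It uses cuCtxGetCurrent() from the driver API, so it assumes the program is also linked against the driver library (for example with -lcuda). As far as I know, the runtime API uses one primary context per device per process, so the handles would be expected to match.

```cuda
// Sketch only. Assumed build line: nvcc --default-stream per-thread check_ctx.cu -lcuda
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda.h>          // driver API, only for cuCtxGetCurrent
#include <cuda_runtime.h>

// Placeholder kernel: marks one flag so the host can tell it ran.
__global__ void touch(int *flag) { *flag = 1; }

// Hypothetical worker: launches a kernel on this thread's default stream,
// then records the context that is current on this thread.
void worker(int *d_flag, CUcontext *ctx_out) {
    cudaSetDevice(0);
    touch<<<1, 1>>>(d_flag);
    cuCtxGetCurrent(ctx_out);
}

// Returns true if every worker saw the same context as the main thread
// and every kernel had finished once cudaDeviceSynchronize() returned.
bool threads_share_context(int num_threads) {
    int *d_flags = nullptr;
    if (cudaMalloc(&d_flags, num_threads * sizeof(int)) != cudaSuccess) return false;
    cudaMemset(d_flags, 0, num_threads * sizeof(int));
    cudaDeviceSynchronize();  // make sure the memset is done before workers start

    std::vector<CUcontext> ctxs(num_threads, nullptr);
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i)
        threads.emplace_back(worker, d_flags + i, &ctxs[i]);

    // Join first, so every launch has been issued before synchronizing.
    for (auto &t : threads) t.join();
    cudaDeviceSynchronize();

    CUcontext main_ctx = nullptr;
    cuCtxGetCurrent(&main_ctx);

    std::vector<int> h_flags(num_threads, 0);
    cudaMemcpy(h_flags.data(), d_flags, num_threads * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_flags);

    bool ok = (main_ctx != nullptr);
    for (int i = 0; i < num_threads; ++i)
        ok = ok && (ctxs[i] == main_ctx) && (h_flags[i] == 1);
    return ok;
}

int main() {
    std::printf("threads share context and sync waited: %s\n",
                threads_share_context(4) ? "yes" : "no");
    return 0;
}
```

Note that the sketch joins the worker threads before calling cudaDeviceSynchronize(). Without some host-side ordering like that, the main thread could synchronize before a worker has even issued its launch, which would look like the synchronization not waiting.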

I am using the runtime API, compiling with “--default-stream=per-thread”, and running CUDA 10.2 on Windows.

Thanks for the help!

Anybody?

See “cudaDeviceSynchronize() doesn't wait for kernels launched by other CPU threads, why?”