Referencing device memory from multiple threads

I’ve seen questions on this topic in a few different places, but none of them ever got a firm answer.

Here’s what I’m trying to do:

  1. Allocate device memory x using cudaMalloc in thread A.
  2. Access device memory x (e.g. to zero it) from thread A.
  3. Thread A creates thread B (using CreateThread; this is Windows).
  4. Thread A blocks waiting for thread B to complete (using WaitForMultipleObjects).
  5. Access device memory x (e.g. with cudaMemcpy, cufft, etc.) from thread B.

Step 5 always fails with cudaErrorInvalidValue.

The memory pointer value is still correct, and the calling thread hasn’t yet exited (which would kill the context), so why can’t I access device memory in a thread other than the one that created it?

Thanks!

The context is valid only in the thread that created it, so thread B cannot use x even though the pointer value is unchanged. You cannot arbitrarily share a context among threads. There is a context migration API for moving a context from one thread to another, but the operation has non-trivial overhead. You might want to think about a different multithreading model: for example, a single thread that holds the context and acts as a consumer, with multiple producer threads feeding it work asynchronously.
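To make the single-consumer idea concrete, here is a minimal sketch in standard C++ (std::thread rather than CreateThread, to keep it portable). The names ContextOwner, submit and owner_id are made up for illustration and are not part of CUDA. The sketch contains no CUDA calls at all: the assumption is that each submitted task would wrap your cudaMemcpy or cufft calls, and that the context gets created on the worker thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper: one worker thread is the only thread that ever touches
// the device. Producers hand it tasks instead of calling the device themselves.
class ContextOwner {
public:
    ContextOwner() : worker_([this] { run(); }) {}

    ~ContextOwner() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_one();
        worker_.join();  // remaining tasks are drained before the thread exits
    }

    // Callable from any producer thread. The returned future becomes ready
    // once the task has finished running on the worker thread.
    std::future<void> submit(std::function<void()> task) {
        assert(task && "task must be callable");
        std::packaged_task<void()> job(std::move(task));
        std::future<void> done = job.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        ready_.notify_one();
        return done;
    }

    std::thread::id owner_id() const { return worker_.get_id(); }

private:
    void run() {
        // Assumption: in a real program the device allocation (and so the
        // context creation) would happen here, on this thread.
        for (;;) {
            std::packaged_task<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to run
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // runs outside the lock, always on the worker thread
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::packaged_task<void()>> jobs_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist first
};
```

A producer would call something like owner.submit([&] { /* device work on x */ }).get() and block until it finishes, or keep the future and carry on if it wants the work done asynchronously. Either way every device call happens on the one thread that owns the context, so nothing has to be migrated.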
