I’ve seen a couple of questions related to this topic in a couple of different places, but nothing ever had a firm answer.
Here’s what I’m trying to do:
1. Allocate device memory x using cudaMalloc in thread A.
2. Access device memory x (e.g. to zero it) from thread A.
3. Thread A creates thread B (using CreateThread; this is Windows).
4. Thread A blocks waiting for thread B to complete (using WaitForMultipleObjects).
5. Access device memory x (e.g. for cudaMemcpy, cufft, etc.) in thread B.
Step 5 always fails with cudaErrorInvalidValue.
The memory pointer value is still correct, and the thread that allocated it hasn’t yet exited (which would destroy the context), so why can’t I access device memory from a thread other than the one that allocated it?
The context is valid only in the thread that created it; you cannot arbitrarily share a context among threads. There is a context migration API (cuCtxPopCurrent/cuCtxPushCurrent in the driver API) for transferring a context from thread to thread, but the operation carries non-trivial overhead. You might want to consider a different multithreading model instead: have a single thread hold the context and act as a consumer, with multiple producer threads feeding it work asynchronously.