Questions about memory allocation and CUDA contexts


I have several questions related to memory allocation on device and CUDA contexts.

I have an application using two CPU threads. One of them allocates several arrays on the GPU before the other needs to access any of those arrays. I’m getting an invalid device pointer error. I’m guessing it is related to the one-to-one correspondence between CPU threads and CUDA contexts.

  1. Is this correct?
  2. Is there a way to avoid this problem without changing the architecture of my solution?

If my application has several CPU threads executing the same CUDA code:

  1. Will it automatically create the same number of CUDA contexts as CPU threads?
  2. If this doesn’t happen, how can it be done? Should I use cuCtxCreate()? Is there an example of Context Management included in the CUDA SDK?

Thanks a lot,


  1. Yes

  2. Not familiar with your solution, but you have to make sure that the same CPU thread that uses GPU resources is also the thread that allocates/frees them.

  3. By default, in the run-time API, a CUDA context is associated with a CPU thread at the first CUDA call. So, it depends on how many of your CPU threads issue CUDA calls. But the maximum is one CUDA context per CPU thread at a time.

  4. I don’t think there’s a context management sample in the SDK. Context management is pretty straightforward, though. You can definitely explicitly create and destroy CUDA contexts using the driver API. You can also disassociate a CUDA context from a CPU thread in the run-time API.
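To illustrate point 4, here is a minimal sketch (driver API, error checking omitted) of explicitly creating a context, allocating memory in it, and then handing that context from one CPU thread to another with cuCtxPopCurrent/cuCtxPushCurrent. The function names thread_a_setup/thread_b_use are just placeholders; the idea is that a pointer allocated in a context is valid in whichever thread that context is currently attached to:

```cuda
#include <cuda.h>

CUcontext  ctx;    // shared between the two CPU threads
CUdeviceptr d_arr; // device pointer allocated inside ctx

/* Runs on CPU thread A: create the context and allocate in it. */
void thread_a_setup(void)
{
    CUdevice dev;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);                 // ctx is now current to thread A

    cuMemAlloc(&d_arr, 1024 * sizeof(float)); // allocation belongs to ctx

    cuCtxPopCurrent(&ctx);                    // detach ctx from thread A
}

/* Runs later on CPU thread B: attach the same context and use the pointer. */
void thread_b_use(void)
{
    cuCtxPushCurrent(ctx);   // ctx is now current to thread B
    /* ... d_arr is a valid device pointer here ... */
    cuCtxPopCurrent(&ctx);   // detach again when done
}
```

Without the pop/push handoff, thread B would get its own separate context on its first CUDA call, and d_arr would be an invalid device pointer there, which matches the error described above.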


Hi Paulius,

Thanks for your answer. I would like to ask you something else:

Is it correct that only one CUDA context executes at any time? In other words, is it impossible for two CPU threads to access the GPU at the same time? If the answer is “yes”, how is this carried out?


If you mean “kernel execution” then yes, only one kernel can be executed on the GPU at a given time; if kernels are invoked simultaneously, they are serialized by the driver.

Can you please clarify what kind of access you’re talking about?

For example:

  • I have two CPU threads, which are going to execute the same CUDA code. We know that each CPU thread will create a CUDA context (I’m using the runtime API). The question is: will both CPU threads have access to the GPU at the same time?


So, if the CUDA code executed by each CPU thread is different, the CPU threads won’t have access to the GPU at the same time, is that correct? How is this carried out?

You have different threads, you have independent contexts. It doesn’t matter whether the code you’re executing in these contexts is the same or different.
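The scenario discussed in this thread can be sketched as follows (a minimal example, assuming the pre-CUDA-4.0 runtime API where each CPU thread gets its own context on its first CUDA call; the kernel and sizes are made up for illustration). Both threads run the same code, each in its own context, so each must allocate, use, and free its own device memory; kernels launched from the two threads are serialized by the driver:

```cuda
#include <pthread.h>
#include <cuda_runtime.h>

__global__ void scale(float *d, float s)
{
    d[threadIdx.x] *= s;   // trivial kernel, one block of 256 threads
}

/* Each CPU thread runs this: its first CUDA call creates its own context. */
void *worker(void *arg)
{
    float *d_data;   // valid only in THIS thread's context
    cudaMalloc((void **)&d_data, 256 * sizeof(float));
    cudaMemset(d_data, 0, 256 * sizeof(float));

    scale<<<1, 256>>>(d_data, 2.0f);  // launches from the two threads
                                      // are serialized by the driver
    cudaThreadSynchronize();

    cudaFree(d_data);   // allocate and free in the same thread
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

Passing d_data from one worker to the other would reproduce the original “invalid device pointer” error, because the pointer only means something inside the context that allocated it.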