Bind multiple host threads to the same context?

My main question is in the title, and the details are below:

I have an application where we need to compute interaction lists.
For the CPU version, OpenMP threads assemble their own interaction lists in parallel (each list comes from walking a tree), and then each thread calls the interaction kernel on its own data. A rough sketch of that pattern is below.
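In rough pseudocode (buildInteractionList, evaluateList, tree, and numTargets are placeholder names, not our real identifiers), the CPU version looks like:

#pragma omp parallel for
for (int i = 0; i < numTargets; i++) {
    // each thread walks the tree and assembles its own list
    InteractionList list = buildInteractionList(tree, i);
    // ...then immediately evaluates it on the CPU with the interaction kernel
    evaluateList(list);
}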

For our GPU version, I want the OpenMP threads to again assemble the interaction lists in parallel, but instead of evaluating a list on the CPU, each thread would asynchronously copy it to the GPU, launch a kernel to evaluate it, and move on to assembling its next list.

Now, although I can assume I will run only one device per MPI task (meaning all of my threads will use the same device), it would be hard to designate one host thread to do all of the GPU tasks/calls: I want all of the OpenMP threads to continuously collect/execute/return interaction lists in parallel, so each of them needs to make kernel calls individually.

Now, I don't want to create a bunch of contexts (naively, one per thread), since I am using only one device and don't want the large penalty. So I thought the most natural way to run would be to create one context, where thread 0 initially allocates large arrays (say of length SIZE), and at run time each host thread copies/uses/returns its data on a designated portion of each array (of length SIZE/numCPUThreads).

So essentially I want to bind the thread 0 context to all of my host threads, so they can access the same device memory pointers (I am handling all of the collision logic myself by restricting each thread to its designated portion of those arrays). In fact, I am creating the same number of streams as I have threads, to give each thread its own stream in the context, in hopes of making all of the calls in parallel.

I thought the following code would work:

//Find which device I want and store the device ID in m_devID

cuInit(0);                            // driver API must be initialized first
CUdevice cuDevice;
CUcontext cuContext;
cuDeviceGet(&cuDevice, m_devID);      // get a device handle from the ordinal
cuCtxCreate(&cuContext, 0, cuDevice);

//Malloc all of my arrays with the primary thread
//Create numCPUThreads streams, one per host thread

#pragma omp parallel for
for (int i = 0; i < numLists; i++) {  // over all interaction lists I need to build
    cuCtxSetCurrent(cuContext);       // bind the shared context in this thread
    // host-copy this thread's data to its designated slice of the allocated
    // arrays (offset tid * SIZE/numCPUThreads) and run kernels, using this
    // thread's designated stream
}

What I found is that if I use device 0, everything works (even without specifying any contexts!). I think that's just a coincidence, since device 0 is the default for everything, so for some reason all my threads can see the context no problem. When I tell it to use another one of the devices on a node (my nodes have 4 GPUs), it crashes with ‘invalid resource handle’ errors on some of my kernels, and I assume it has to do with context issues.

I've tried many things: calling cudaSetDevice in the for loop, calling cuCtxPushCurrent, etc.

So my question is simple. Is there an easy way to just make ONE context on ONE device that is shared by ALL host threads? Do I need one context per thread? Is what I am attempting to do reasonable?

Yes, you can use a single context (per device) across multiple host threads. It’s reasonable. In fact, it is what the CUDA runtime API does, by default.
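For example, here is a minimal sketch of that pattern using only the runtime API (untested; evalKernel, runAllLists, and h_buf are placeholder names, while m_devID, SIZE, and numCPUThreads are taken from your post). Every thread that calls cudaSetDevice with the same device ordinal ends up in the same primary context, so a pointer returned by cudaMalloc on one thread is usable from all of them:

#include <cuda_runtime.h>
#include <omp.h>
#include <vector>

__global__ void evalKernel(float *data, int n)   // placeholder for the real evaluation kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;                  // stand-in for the real work
}

void runAllLists(float *h_buf, int SIZE, int numCPUThreads, int m_devID)
{
    cudaSetDevice(m_devID);                      // main thread picks the device
    float *d_buf;
    cudaMalloc(&d_buf, SIZE * sizeof(float));    // one allocation, shared by all threads

    std::vector<cudaStream_t> streams(numCPUThreads);
    for (int t = 0; t < numCPUThreads; t++)
        cudaStreamCreate(&streams[t]);

    int chunk = SIZE / numCPUThreads;            // each thread's designated slice

    #pragma omp parallel num_threads(numCPUThreads)
    {
        int tid = omp_get_thread_num();
        cudaSetDevice(m_devID);                  // same ordinal -> same primary context
        float *d_slice = d_buf + tid * chunk;
        cudaMemcpyAsync(d_slice, h_buf + tid * chunk, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[tid]);
        evalKernel<<<(chunk + 255) / 256, 256, 0, streams[tid]>>>(d_slice, chunk);
        cudaMemcpyAsync(h_buf + tid * chunk, d_slice, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[tid]);
        cudaStreamSynchronize(streams[tid]);     // wait only on this thread's stream
    }

    for (int t = 0; t < numCPUThreads; t++)
        cudaStreamDestroy(streams[t]);
    cudaFree(d_buf);
}

One caveat: for the cudaMemcpyAsync calls to actually overlap with kernel execution, h_buf needs to be pinned (allocated with cudaMallocHost); with pageable memory the copies fall back to being effectively synchronous.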

Thanks for the reply, txbob.

Then what is the correct syntax? If that is true by default, one would think that just calling cudaSetDevice (to the same device) in the pragma for loop would make it work for all of the threads, right? I assumed each thread made its own context by default, but then again, the fact that it works for device 0 does suggest that by default the threads share the same context on device 0.

Do I explicitly set the context with cuCtxSetCurrent, as I tried above? Is there another way to bind all the host threads to the same context explicitly?

Use the runtime API.
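That said, if you did want to stay with the driver API, the explicit equivalent of what the runtime does is to retain the device's primary context and make it current in every thread. A rough, untested sketch (m_devID is from your post; the primary-context calls require a reasonably recent CUDA version):

cuInit(0);
CUdevice dev;
CUcontext ctx;
cuDeviceGet(&dev, m_devID);
cuDevicePrimaryCtxRetain(&ctx, dev);   // the same context the runtime API uses

#pragma omp parallel
{
    cuCtxSetCurrent(ctx);              // bind the shared context in this thread
    // ...per-thread async copies and kernel launches on this thread's stream...
}

cuDevicePrimaryCtxRelease(dev);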

Problem solved.