Launching kernels simultaneously on two GPUs

If I have two independent kernels, how is it possible to launch one on each of two independent GPUs?

Can I just create two independent threads and launch one kernel on device 0 and the other on device 1?

Does anyone know whether a “CUDA stream” can serve this purpose?

At the moment, streams only control asynchronous operations within a single GPU context. If you are using two GPUs, you currently need two host threads, because each GPU context requires its own thread.
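A minimal sketch of that thread-per-GPU pattern (not anyone's actual code from this thread): each host thread calls cudaSetDevice before any other CUDA call, so its context is bound to the right GPU, then launches its kernel independently. The kernel, buffer size, and use of pthreads are all placeholder assumptions here.

```cuda
#include <pthread.h>
#include <cuda_runtime.h>

// Trivial stand-in kernel; replace with the real independent kernels.
__global__ void dummyKernel(int *out) {
    out[threadIdx.x] = threadIdx.x;
}

static void *worker(void *arg) {
    int dev = *(int *)arg;
    cudaSetDevice(dev);               // must precede any allocation or launch in this thread
    int *d_out;
    cudaMalloc(&d_out, 32 * sizeof(int));
    dummyKernel<<<1, 32>>>(d_out);
    cudaThreadSynchronize();          // wait for this device's kernel
                                      // (cudaDeviceSynchronize() on newer toolkits)
    cudaFree(d_out);
    return NULL;
}

int main() {
    pthread_t t[2];
    int ids[2] = {0, 1};
    for (int i = 0; i < 2; ++i)
        pthread_create(&t[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; ++i)
        pthread_join(t[i], NULL);
    return 0;
}
```

Because the two threads hold separate contexts, the two launches can proceed concurrently on the two devices.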

I am trying to do something like that:
two CPU threads, setting a different device on each thread and calling the kernel.

But at run time only one of the kernels executes: sometimes it is on device 0, other times on device 1.

My program is based on the cudaOpenMP sample, but I cannot find what’s wrong.

Do I need a compiler flag or something?

These are probably context, thread, and GPU affinity issues, which are very hard to manage correctly with CUDA and OpenMP as things stand today. You might want to consider something different for threading (say, Boost threads or a native thread library).

EDIT: Of course, there is also the possibility that both contexts are winding up on the same GPU. How are you assigning GPUs in the code?

I started from the beginning with OpenMP again because that’s the project, and it works up to this point.
The assignment is exactly as you’d expect, based on omp_get_thread_num, and I set 2 OpenMP threads, as many as there are GPUs.
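For reference, a hedged sketch of the assignment pattern described above (the actual project code is not shown in this thread, and myKernel is a placeholder name). The cudaGetDevice check at the end is one way to confirm that each thread’s context really landed on a different GPU, which is the failure mode suspected in the earlier reply:

```cuda
#include <stdio.h>
#include <omp.h>
#include <cuda_runtime.h>

__global__ void myKernel() { }

int main() {
    omp_set_num_threads(2);           // one host thread per GPU
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        cudaSetDevice(tid);           // bind this thread's context to GPU tid
        myKernel<<<1, 1>>>();
        cudaThreadSynchronize();      // pre-4.0 name; cudaDeviceSynchronize() on newer toolkits

        int dev = -1;
        cudaGetDevice(&dev);          // verify the binding actually stuck
        printf("thread %d ran on device %d\n", tid, dev);
    }
    return 0;
}
```

If both threads print the same device number, the contexts have collapsed onto one GPU and the kernels will serialize there instead of running in parallel.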

I really haven’t managed to find the problem.

But I can say that things are very fragile: I had to compile and run after every line of code, because a small mistake could produce very strange results.


I’m using Windows 7, so I’ll post my findings when I’m done coding. I have used pthreads on Linux, but I’m new to threading on Windows.

Give me a few days.

I’m also aiming to launch two kernels on two separate devices. Make sure that you aren’t launching kernels to the same device from different threads.

maybe this helps: