Multiple host threads on a single GPU


1- On a single-GPU system, how do I launch six different (or identical) kernels from six different host threads? (Assume the device is capable of executing multiple kernels concurrently.)

2- I understand that one of the features of CUDA 4.0 is sharing GPUs across multiple threads. Is there any sample code demonstrating this feature?

Thanks and regards,


Hi Heshsham,

You can use cuCtxCreate to create a new context on each of the threads. I think you may need to use cuCtxSynchronize for synchronization rather than cudaDeviceSynchronize(). These should be the only requirements when using multiple threads.

However, if you want to use multiple threads just to launch multiple kernels at the same time, you can do this with CUDA streams instead.

You can find examples for both cases in the CUDA SDK.

Note that concurrent kernel execution requires a card with compute capability >= 2.0.
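For the multi-threaded case, here is a minimal sketch using the runtime API rather than explicit driver contexts. It assumes CUDA 4.0 or later, where (as far as I understand) host threads that call cudaSetDevice on the same device share that device's context. The kernel `addId`, the helper `worker`, the buffer size, and the use of std::thread are illustrative assumptions, not taken from the SDK samples.

```cuda
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: adds the launching thread's id to every element.
__global__ void addId(int *data, int n, int id) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += id;
}

// Work done by one host thread: its own stream, its own buffer, one launch.
void worker(int id, int n) {
    cudaSetDevice(0);                       // every host thread targets GPU 0
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    int *d = nullptr;
    cudaMalloc(&d, n * sizeof(int));
    cudaMemsetAsync(d, 0, n * sizeof(int), stream);

    // The stream is the fourth launch parameter.
    addId<<<(n + 255) / 256, 256, 0, stream>>>(d, n, id);

    // Wait only for the work queued in this thread's stream.
    cudaError_t err = cudaStreamSynchronize(stream);
    printf("host thread %d: %s\n", id, cudaGetErrorString(err));

    cudaFree(d);
    cudaStreamDestroy(stream);
}

int main() {
    const int kThreads = 6;                 // six host threads, one GPU
    std::vector<std::thread> pool;
    for (int t = 0; t < kThreads; ++t) pool.emplace_back(worker, t, 1 << 20);
    for (auto &th : pool) th.join();
    return 0;
}
```

Each thread synchronizes on its own stream, so one thread does not wait for another thread's kernel. Whether the six kernels actually overlap on the device still depends on the hardware and on how many resources each launch uses.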

Thanks for replying. Regarding multiple kernels running concurrently, I have another question:

Suppose I have a situation like this:

Kernel1<<<grid, block, 0, stream0>>>(...);

Kernel1<<<grid, block, 0, stream1>>>(...);

Kernel1<<<grid, block, 0, stream2>>>(...);

Kernel1<<<grid, block, 0, stream3>>>(...);


Assuming there is no data dependency among the kernels, what is the guarantee that the kernels will run concurrently?
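To make the question concrete, here is a small sketch of that situation. As far as I know, putting kernels in different streams only tells the runtime that they are independent; it permits overlap but does not guarantee it, and the device has to report support through the concurrentKernels property. The kernel `spin`, the launch sizes, and the iteration count are assumptions chosen so that each launch is small enough to leave room on the GPU for the others.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that just burns some time on a tiny grid.
__global__ void spin(float *out, int iters) {
    float x = threadIdx.x;
    for (int i = 0; i < iters; ++i) x = x * 1.0001f + 0.5f;
    out[blockIdx.x * blockDim.x + threadIdx.x] = x;
}

int main() {
    // Ask the device whether it can run kernels concurrently at all.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("concurrentKernels = %d\n", prop.concurrentKernels);

    const int kStreams = 4;
    cudaStream_t streams[kStreams];
    float *buf[kStreams];
    for (int s = 0; s < kStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&buf[s], 64 * sizeof(float));
    }

    // One block of 64 threads per launch, each in its own stream.
    for (int s = 0; s < kStreams; ++s)
        spin<<<1, 64, 0, streams[s]>>>(buf[s], 1 << 20);

    // Wait for all streams and report any error.
    cudaError_t err = cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(err));

    for (int s = 0; s < kStreams; ++s) {
        cudaFree(buf[s]);
        cudaStreamDestroy(streams[s]);
    }
    return 0;
}
```

Running this under a profiler timeline is one way to see whether the four launches actually overlapped; the program itself only confirms that they completed.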
