Multi-threaded concurrent kernel execution on a single GPU

mianlu · May 4, 2010, 4:23am

Hi All:

I have read about multi-threaded programming on multiple GPUs, and I wonder whether a similar style can be used on a single GPU. Specifically, I first allocate a block of device memory, then pass its address to two different host threads. Each thread launches its own kernel and does some computation on the same device memory independently. However, I could not get correct results. It looks like the two threads cannot share the device memory pointer and access it correctly. I tested this on a GPU with compute capability 1.3. Is it because that GPU does not support concurrent kernel execution? Does anyone have other suggestions?

Thanks.
Mian

ONeill · May 4, 2010, 9:43am

Pointers to device memory are only valid within the context in which they were created. If you use a pointer to device memory in a host thread other than the one that created it, without using portable memory, you will get strange “results”.
Concurrent kernel execution requires CUDA 3.0 or later and hardware of compute capability 2.0 (Fermi) or higher.
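
To illustrate the point about concurrent kernels, here is a minimal sketch (not taken from the posts above) of two kernels working on the same device allocation from a single host thread, using two CUDA streams. The kernel name addValue, the CHECK macro, and the buffer size are hypothetical choices made for this example. Streams only allow the hardware to overlap the two kernels; whether they actually run concurrently depends on the GPU (compute capability 2.0 or higher) and on how many resources each kernel uses. Each kernel writes to its own half of the buffer, so the result does not depend on execution order.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel: adds `value` to data[offset .. offset + n).
__global__ void addValue(float *data, int offset, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[offset + i] += value;
    }
}

int main()
{
    const int n = 1 << 20;      // total number of elements (arbitrary)
    const int half = n / 2;     // each kernel handles one half

    // One device allocation, shared by both kernels.
    float *d_data = nullptr;
    CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    // Two streams: work in different streams may overlap on the device.
    cudaStream_t s0, s1;
    CHECK(cudaStreamCreate(&s0));
    CHECK(cudaStreamCreate(&s1));

    const int threads = 256;
    const int blocks = (half + threads - 1) / threads;

    // Each launch touches a disjoint half of the same buffer.
    addValue<<<blocks, threads, 0, s0>>>(d_data, 0, half, 1.0f);
    addValue<<<blocks, threads, 0, s1>>>(d_data, half, half, 2.0f);
    CHECK(cudaGetLastError());

    // Wait for both streams to finish before reading results.
    CHECK(cudaDeviceSynchronize());

    float first = 0.0f, last = 0.0f;
    CHECK(cudaMemcpy(&first, d_data, sizeof(float),
                     cudaMemcpyDeviceToHost));
    CHECK(cudaMemcpy(&last, d_data + (n - 1), sizeof(float),
                     cudaMemcpyDeviceToHost));

    // The first half was incremented by 1, the second half by 2.
    printf("first = %.1f, last = %.1f\n", first, last);

    CHECK(cudaStreamDestroy(s0));
    CHECK(cudaStreamDestroy(s1));
    CHECK(cudaFree(d_data));
    return 0;
}
```

This should compile with nvcc as a single .cu file. Note that from CUDA 4.0 onward, host threads in the same process share one context per device by default when using the runtime API, so a device pointer allocated in one host thread can be passed to another. On the CUDA 3.x setup described above, that sharing was not available.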

348230684 · January 14, 2021, 2:47pm

Hi Mian,

I am facing the same problem. To speed up inference, I would like to use one GPU card to run prediction on multiple images simultaneously, in other words, to predict multiple images in parallel.

Could you give me some suggestions? Many thanks.

Kind Regards

Wei

Robert_Crovella · January 14, 2021, 2:49pm

Use Triton Inference Server.