multi-threaded kernel concurrent execution on a single GPU

Hi All:

I have read about multi-threaded programming on multiple GPUs. I wonder whether it is possible to use a similar style to program a single GPU. Specifically, I first allocate a device memory space, then pass this address to two different host threads. These two threads each start their own kernel and do some computation on the same device memory space independently. However, I could not get correct results. It looks like the two threads cannot share this device memory pointer and access it correctly. I have tested this on a GPU with compute capability 1.3. Is it because it does not support concurrent kernel execution? Or does anyone have other suggestions?


Pointers to device memory are only valid within the context where they were created. If you use a pointer to device memory in a thread other than the one that created it, without using portable memory, you will get awkward "results".
Concurrent kernel execution is only supported from CUDA 3.0 onwards, and only on hardware of compute capability 2.0 (Fermi) or later.
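For what it's worth, on more recent toolkits this pattern does work: since CUDA 4.0 all host threads in a process share the same primary context per device, so a device pointer allocated in one thread is valid in the others. A minimal sketch of the idea, assuming CUDA 4.0+ and compute capability 2.0+ for actual kernel concurrency (the kernel and names are illustrative, not from the original post):

```cuda
#include <thread>
#include <cuda_runtime.h>

__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Each host thread launches its own kernel on its own stream,
// operating on a disjoint half of the same device allocation.
void worker(float *d_part, int n, float factor) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_part, n, factor);
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
}

int main() {
    const int n = 1 << 20;
    float *d_buf;
    cudaMalloc(&d_buf, n * sizeof(float));
    // Since CUDA 4.0 both host threads share the device's primary
    // context, so d_buf is valid in both.
    std::thread t1(worker, d_buf,         n / 2, 2.0f);
    std::thread t2(worker, d_buf + n / 2, n / 2, 3.0f);
    t1.join();
    t2.join();
    cudaFree(d_buf);
    return 0;
}
```

Note that whether the two kernels actually run concurrently (rather than merely correctly) still depends on the hardware and on each kernel leaving resources free; on compute capability 1.3 they will serialize at best.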

Hi Mian,

I am suffering from the same problem. In order to accelerate inference, I would like to use one GPU card to predict multiple images simultaneously, in other words, to run inference on multiple images in parallel.

Could you give me some suggestions? Many thanks.

Kind Regards


Use Triton Inference Server. It supports concurrent model execution and dynamic batching on a single GPU, which covers exactly this use case.
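To make that concrete, a rough sketch of a client request using Triton's Python HTTP client, assuming a server is already running on `localhost:8000` and serving a model; the model name and tensor names (`my_model`, `input`, `output`) are placeholders that must match your model's configuration:

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a running Triton server (assumed at the default HTTP port).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Send a whole batch of images in one request; Triton's dynamic
# batcher can also merge requests from many concurrent clients.
batch = np.random.rand(8, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("input", batch.shape, "FP32")
inp.set_data_from_numpy(batch)

result = client.infer(model_name="my_model", inputs=[inp])
predictions = result.as_numpy("output")
```

With dynamic batching enabled in the model's `config.pbtxt`, even single-image requests from separate threads or processes get grouped into larger batches on the GPU, which is usually the simplest way to get the parallelism you are after.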