GPU dispatch and parallel kernels

Two questions:

  1. We have an application that needs to process many images in real time. However, the images are not huge, so a single image is unlikely to saturate all streaming multiprocessors. Can multiple kernels run in parallel on a single GPU? We are targeting a GTX 1080 Ti. Is there a technical name for running parallel kernels? I also want to know whether OpenCL supports this.

  2. Related to the above. Each image will be run through a chain of image processing algorithms, one after the other. I have noticed that dispatching a kernel has some overhead on Windows. Is there a way for the GPU to dispatch a new kernel automatically when one finishes and to change buffer/image bindings? Or is there some command-list-recording type of feature? If CUDA supports this, I would also like to know whether OpenCL does, if anyone knows.


Yes, in CUDA this is called concurrent kernel execution. There is a concurrency section in the CUDA programming guide, and the CUDA samples include a concurrentKernels example. I'm reasonably sure OpenCL supports it as well.
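To make that concrete, here is a minimal sketch of launching work for several images into separate CUDA streams; kernels in different streams may run concurrently when one launch alone leaves SMs idle. The `invert` kernel and the image sizes are placeholders for illustration:

```cuda
#include <cuda_runtime.h>

// Hypothetical per-image kernel: invert each pixel value.
__global__ void invert(unsigned char *img, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) img[i] = 255 - img[i];
}

int main() {
    const int numImages = 4, pixels = 512 * 512;
    cudaStream_t streams[numImages];
    unsigned char *imgs[numImages];

    for (int s = 0; s < numImages; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&imgs[s], pixels);
    }

    // Kernels launched into different streams are allowed to overlap
    // on the GPU; whether they actually do depends on free SM capacity.
    for (int s = 0; s < numImages; ++s)
        invert<<<(pixels + 255) / 256, 256, 0, streams[s]>>>(imgs[s], pixels);

    cudaDeviceSynchronize();
    for (int s = 0; s < numImages; ++s) {
        cudaFree(imgs[s]);
        cudaStreamDestroy(streams[s]);
    }
    return 0;
}
```

Note that concurrency is an opportunity, not a guarantee: if each kernel already fills the device, the streams will effectively serialize.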

Yes, this can be done in both CUDA and OpenCL. Kernels can be launched in rapid succession; you pay the launch overhead on the first kernel launch, but subsequent launch latencies can be masked by previous kernels if they are still executing. A kernel launch is asynchronous, allowing the CPU to continue dispatching work while earlier kernels execute. Something similar happens in OpenCL when you "enqueue" kernels.
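As a sketch of that pipeline pattern: launches into a single CUDA stream return to the CPU immediately, and the GPU executes them in order, starting each stage as soon as the previous one finishes, with no CPU round-trip in between. The stage kernel names below are placeholders, not a real API:

```cuda
#include <cuda_runtime.h>

// Hypothetical pipeline stages for one image (bodies omitted).
__global__ void denoise(float *img, int n)   { /* ... */ }
__global__ void sharpen(float *img, int n)   { /* ... */ }
__global__ void threshold(float *img, int n) { /* ... */ }

int main() {
    const int n = 512 * 512;
    float *img;
    cudaMalloc(&img, n * sizeof(float));

    cudaStream_t s;
    cudaStreamCreate(&s);

    // All three launches return immediately on the CPU. Within a stream
    // the GPU runs them in issue order, so the chain proceeds
    // kernel-after-kernel without the CPU intervening between stages.
    denoise<<<(n + 255) / 256, 256, 0, s>>>(img, n);
    sharpen<<<(n + 255) / 256, 256, 0, s>>>(img, n);
    threshold<<<(n + 255) / 256, 256, 0, s>>>(img, n);

    cudaStreamSynchronize(s);  // block only when the result is needed
    cudaStreamDestroy(s);
    cudaFree(img);
    return 0;
}
```

Because the launches are enqueued up front, the launch overhead of later stages overlaps with the execution of earlier ones, which addresses the Windows dispatch-overhead concern for all but the first kernel.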