We have an application that needs to process many images in real time. However, the images are not large, so a single image is unlikely to saturate all streaming multiprocessors. Can multiple kernels run in parallel on a single GPU? We are targeting the GTX 1080 Ti. Is there a technical name for running kernels in parallel? I would also like to know whether OpenCL supports this.
A related question: each image will be run through a chain of image processing algorithms, one after the other. I have noticed that dispatching a kernel has some overhead on Windows. Is there a way for the GPU to dispatch the next kernel automatically when one finishes, and to change the buffer/image bindings between kernels? Or is there some kind of command-list recording feature? If CUDA supports this, I would also like to know whether OpenCL does.