I was wondering how kernels are actually scheduled, and whether it is possible for a GPU to have multiple kernels running on it at the same time.
Also, in the case of dynamic parallelism (threads individually launching kernels), how are those launches scheduled, and wouldn't it blow up? (For example, if a kernel with 1024 threads each called a kernel in its code.)
Thanks.
Yes, it's possible. Keeping things fairly simple, a GPU kernel gets scheduled (block by block) when there are sufficient resources for it at the SM level, and assuming the block scheduler doesn't have other work that it chooses to do first. Nothing in that statement precludes blocks from different kernels being scheduled on the same GPU, even on the same SM.
Beyond that, there are numerous writeups of various questions and aspects of GPU block scheduling on various forums; here is an example. Furthermore, CUDA provides a concurrentKernels sample code that allows basic inspection.
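If you want a quick experiment along those lines, here is a minimal sketch (not the actual sample, just the idea it demonstrates): two deliberately small kernels launched into separate non-default streams, so the block scheduler is free to interleave their blocks if SM resources allow. The kernel names and sizes here are just illustrative.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Two tiny kernels; neither fills the GPU by itself, leaving room for
// blocks of the other kernel to be resident at the same time.
__global__ void kernelA(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

__global__ void kernelB(float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += 1.0f;
}

int main() {
    const int n = 1 << 16;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));

    // Launches in different non-default streams may run concurrently;
    // launches in the same stream are serialized.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    kernelA<<<(n + 255) / 256, 256, 0, s1>>>(x, n);
    kernelB<<<(n + 255) / 256, 256, 0, s2>>>(y, n);

    cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Whether the two kernels actually overlap depends on the resources each one needs; a profiler such as Nsight Systems will show you the timeline.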
Regarding what happens in CDP, to a first-order approximation, launching 1024 kernels on the device, one from each of 1024 threads, is no different than attempting to launch 1024 kernels on the host (perhaps one from each of 1024 host threads, if you wish). The GPU maintains work queues, and kernels go into those queues for processing.
There is one difference. On the host side, when the queues are "full", the kernel launch process switches from asynchronous to synchronous (each launch effectively waits for a queue slot), so the process becomes self-throttling. AFAIK there is no equivalent self-throttling process on the device side. It is up to the device code programmer to make sure they don't exceed the available queue depth, and CUDA provides both an indication of this as well as an explicit error type. For additional background, I suggest reading the section on CDP in the programming guide. You may specifically wish to pay attention to pending launch limits.
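To make that last point concrete, here is a rough sketch of the pattern described in the CDP section of the programming guide (kernel names and the limit value are just illustrative): the host raises the pending launch count before running the parent kernel, and the device code checks for an error after each child launch, since an over-full launch buffer is reported as a launch error (typically cudaErrorLaunchPendingCountExceeded) rather than by blocking.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

__global__ void childKernel(int parentThread) {
    // trivial child work for illustration
    (void)parentThread;
}

// Each thread of the parent launches one child kernel. With 1024 parent
// threads this can enqueue up to 1024 pending launches, so the pending
// launch limit must be large enough, or the launch fails with an error.
__global__ void parentKernel() {
    childKernel<<<1, 32>>>(threadIdx.x);
    cudaError_t err = cudaGetLastError();   // device-side error check
    if (err != cudaSuccess) {
        printf("child launch failed (block %d, thread %d): %d\n",
               blockIdx.x, threadIdx.x, (int)err);
    }
}

int main() {
    // The default pending launch count is 2048 per the programming guide;
    // raise it if the parent grid may enqueue more than that at once.
    cudaDeviceSetLimit(cudaLimitDevRuntimePendingLaunchCount, 4096);

    parentKernel<<<1, 1024>>>();
    cudaDeviceSynchronize();
    printf("host-side status: %s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}
```

Note that CDP code has to be compiled with relocatable device code enabled (nvcc -rdc=true) and linked against the device runtime.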