Kernel Execution Sequence

baitNswitch · May 25, 2012, 7:16pm

What is the order of kernel execution?
The literature talks about asynchronous execution, but asynchronous in what respect?
It appears that it is the threads within a kernel that execute asynchronously (in an unknown sequence).
The kernels themselves don't appear to execute asynchronously, or do they?

I have 3 kernels, none of which contain any blocking/synchronous operations.
If I call kernelA, then kernelB, then a Thrust function, and then kernelC, how are they executed on the GPU?

Will kernelB start only after kernelA has finished (i.e. one after the other)?
Can kernelB start before kernelA has finished (i.e. can two kernels run at the same time)?
In that case, kernelC could complete execution before kernelA does.

Does the Thrust call appear as a single kernel?
Could a single Thrust call include more than one kernel launch?
Do Thrust calls block with respect to other Thrust calls or kernel launches?

Even though the kernels are launched in a certain order, can they execute in a different order? i.e. Could kernelB start execution on the GPU before kernelA?

Thanks

seibert · May 25, 2012, 9:58pm

Kernels in the same CUDA stream (and if you don’t specify a stream, it is stream 0) will always run in the order you submit them. The asynchronous aspect of CUDA is that once you launch a kernel (or several), execution continues almost immediately on the CPU while the kernels run on the GPU in the background, until the CPU hits some kind of synchronization point, such as cudaDeviceSynchronize() or cudaMemcpy().
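
A minimal sketch of that behaviour, assuming a CUDA-capable device and nvcc. The kernel names mirror the question but are hypothetical, and each kernel deliberately depends on the previous one's result. That is only safe because launches in the same (default) stream run in submission order. The launches return to the host almost immediately, and the device-to-host cudaMemcpy() is where the host actually waits.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernels: each reads the value written by the previous one.
__global__ void kernelA(int *x) { *x = 1; }
__global__ void kernelB(int *x) { *x *= 10; }
__global__ void kernelC(int *x) { *x += 2; }

// Launches A, B, C in the default stream and returns the final value.
int run_in_order() {
    int *d_x = nullptr;
    int h_x = 0;
    cudaMalloc(&d_x, sizeof(int));

    // These launches are asynchronous: control returns to the host right
    // away while the GPU runs them in the background, one after another.
    kernelA<<<1, 1>>>(d_x);
    kernelB<<<1, 1>>>(d_x);
    kernelC<<<1, 1>>>(d_x);

    // The host could do unrelated work here while the kernels run.

    // A device-to-host cudaMemcpy in the default stream waits for the
    // preceding kernels to finish, so it acts as a synchronization point.
    cudaMemcpy(&h_x, d_x, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    return h_x;  // (1 * 10) + 2 = 12 when A, B, C run in order
}

int main() {
    int result = run_in_order();
    printf("result = %d\n", result);
    return result == 12 ? 0 : 1;
}
```

Because kernelB multiplies whatever kernelA wrote, any reordering within the stream would change the result. In-order execution is what makes 12 the only possible answer here.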

If you create multiple CUDA streams, then kernels in different streams can run in an arbitrary order relative to each other, or possibly even simultaneously.
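
A minimal sketch with two streams, again assuming a CUDA-capable device and nvcc. The kernels setValue/addValue are hypothetical. Within stream s1 the two launches still run in order. The launch in s2 has no ordering guarantee relative to s1, so it may start before, after, or alongside them. Whether kernels actually overlap depends on the GPU (concurrent kernel execution needs compute capability 2.0 or later) and on available resources. To keep the result deterministic regardless of that ordering, each stream writes to its own buffer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernels used only for illustration.
__global__ void setValue(int *x, int v) { *x = v; }
__global__ void addValue(int *x, int v) { *x += v; }

// Runs independent work in two streams; returns true if no CUDA error occurred.
bool run_two_streams(int *out1, int *out2) {
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    int *d_a = nullptr, *d_b = nullptr;
    cudaMalloc(&d_a, sizeof(int));
    cudaMalloc(&d_b, sizeof(int));

    // Stream s1: these two launches are ordered relative to each other.
    setValue<<<1, 1, 0, s1>>>(d_a, 5);
    addValue<<<1, 1, 0, s1>>>(d_a, 1);

    // Stream s2: no ordering guarantee relative to the s1 kernels.
    setValue<<<1, 1, 0, s2>>>(d_b, 7);

    // Block the host until all work in each stream has completed.
    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);

    cudaMemcpy(out1, d_a, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(out2, d_b, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    return cudaGetLastError() == cudaSuccess;
}

int main() {
    int a = 0, b = 0;
    bool ok = run_two_streams(&a, &b);
    printf("stream 1: %d, stream 2: %d\n", a, b);
    return (ok && a == 6 && b == 7) ? 0 : 1;
}
```

If the s2 kernel instead read d_a, its result would depend on timing. Data shared across streams needs explicit synchronization, for example cudaStreamSynchronize() or an event recorded in one stream and waited on in the other.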

I don’t use Thrust, so I can’t address those questions.
