Async start kernel in different stream after another completes?

csp256 · April 4, 2016, 10:17pm

I am currently launching kernel A asynchronously, then several grids of kernel B (each with a different number of blocks), then doing a large amount of CPU work in parallel, and then transferring the results back.

The kernel B grids are independent of one another, but each depends on kernel A. However, I am not currently using streams. I know I can wait until kernel A completes, then explicitly give each kernel B its own stream.

However, if I am understanding this correctly, this requires me to wait until after kernel A completes and then launch the B kernels, because if I do not they will erroneously begin executing in parallel with kernel A. Yet, I do not want to interrupt my CPU work to do this.

How do I asynchronously queue kernel launches in different streams to begin after the completion of a single kernel?

Robert_Crovella · April 4, 2016, 10:33pm

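One possible method would be to use cudaStreamWaitEvent:

Launch kernelA in its stream (streamA)
Issue cudaEventRecord(eventA, streamA)

then in each of your dependent streams,

cudaStreamWaitEvent(streamB, eventA, 0)
kernelB<<<…>>>(my_chunk)

The wait is enqueued on the device side, so the host thread does not block and all of these calls can be issued up front, before your CPU work starts.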

http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html#group__CUDART__STREAM_1gc301fd024e6fd4a17074d229d4504077
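
For anyone landing here later, the steps above might be sketched roughly as follows. This is only an illustration under assumptions, not the original poster's code: the kernel bodies, the helper name run_dependent_launches, the equal chunk sizes, and the CUDA_CHECK macro are all hypothetical stand-ins. The runtime calls (cudaEventRecord, cudaStreamWaitEvent, and so on) are the real API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                             \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,  \
                         cudaGetErrorString(err_));                  \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

// Hypothetical stand-in for kernel A: initializes the whole buffer.
__global__ void kernelA(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 1.0f;
}

// Hypothetical stand-in for kernel B: consumes kernel A's output for one chunk.
__global__ void kernelB(float *chunk, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) chunk[i] += 1.0f;
}

// Queues kernel A, then numB kernel B grids that each wait on A's event.
// Returns the number of elements that do not hold the expected value.
int run_dependent_launches(int numB) {
    const int chunk = 1 << 16;   // assumption: equal chunk sizes for brevity
    const int n = chunk * numB;
    const int threads = 256;

    float *d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));

    cudaStream_t streamA;
    CUDA_CHECK(cudaStreamCreate(&streamA));
    std::vector<cudaStream_t> streamsB(numB);
    for (auto &s : streamsB) CUDA_CHECK(cudaStreamCreate(&s));

    // Timing is not needed, so the event is created with timing disabled.
    cudaEvent_t eventA;
    CUDA_CHECK(cudaEventCreateWithFlags(&eventA, cudaEventDisableTiming));

    // Launch kernel A and record an event right behind it in the same stream.
    kernelA<<<(n + threads - 1) / threads, threads, 0, streamA>>>(d_data, n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaEventRecord(eventA, streamA));

    // Each dependent stream waits on the event, then runs its own kernel B.
    // These calls only enqueue work; the host thread is not blocked.
    for (int b = 0; b < numB; ++b) {
        CUDA_CHECK(cudaStreamWaitEvent(streamsB[b], eventA, 0));
        kernelB<<<(chunk + threads - 1) / threads, threads, 0, streamsB[b]>>>(
            d_data + b * chunk, chunk);
        CUDA_CHECK(cudaGetLastError());
    }

    // ... the large amount of CPU work would go here ...

    // Wait for all streams, then copy the results back.
    CUDA_CHECK(cudaDeviceSynchronize());
    std::vector<float> h(n);
    CUDA_CHECK(cudaMemcpy(h.data(), d_data, n * sizeof(float),
                          cudaMemcpyDeviceToHost));

    int mismatches = 0;
    for (float v : h) if (v != 2.0f) ++mismatches;

    CUDA_CHECK(cudaEventDestroy(eventA));
    for (auto &s : streamsB) CUDA_CHECK(cudaStreamDestroy(s));
    CUDA_CHECK(cudaStreamDestroy(streamA));
    CUDA_CHECK(cudaFree(d_data));
    return mismatches;
}

int main() {
    std::printf("mismatches: %d\n", run_dependent_launches(4));
    return 0;
}
```

The key point is that the event is recorded in kernel A's stream immediately after the launch, so every stream that waits on it will not start its kernel B until kernel A has finished, while the B grids remain free to overlap with one another.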

csp256 · April 4, 2016, 10:38pm

That looks like what I need! Thanks txbob!
