Running several streams asynchronously


I have N matrices that should be processed.
The results in each matrix do not depend on the results from the other N-1 matrices.
They are all independent.
To use streams, I want to create N streams and then run the processing for each matrix in its own stream.

cudaStreamCreate(&stream1);
kernel_1<<<grid, block, 0, stream1>>>( …, dev2, … );
…
kernel_Last<<<grid, block, 0, stream1>>>( …, dev3, … );

Each kernel in the stream works on the same memory as the kernel before it.
How can I make sure that kernel_2 has finished running before kernel_3 is launched?

Thank you,

If the work is independent, there should be no need to ensure that one matrix's processing is finished before starting another's. Nevertheless, to answer your question: to make sure that kernel_2 has finished before kernel_3 runs, you can do exactly what you have shown, i.e. launch kernel_2 into a particular stream, and then launch kernel_3 into the same stream.

Stream semantics are really simple:

  1. Items launched into the same stream will have their execution serialized, in launch order.
  2. Items launched into separate streams have no defined ordering relationship enforced by CUDA streams.
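As a minimal sketch of rule 1 (the kernel names, launch dimensions, and buffer are placeholders I made up, not from your code), two kernels launched into the same stream serialize automatically:

```cuda
#include <cstdio>

__global__ void kernel_A(float *buf) { buf[threadIdx.x] += 1.0f; }
__global__ void kernel_B(float *buf) { buf[threadIdx.x] *= 2.0f; }

int main() {
    float *dev;
    cudaMalloc(&dev, 32 * sizeof(float));
    cudaMemset(dev, 0, 32 * sizeof(float));

    cudaStream_t s;
    cudaStreamCreate(&s);

    // Rule 1: both launches go into stream s, so kernel_B is
    // guaranteed not to start until kernel_A has finished.
    kernel_A<<<1, 32, 0, s>>>(dev);
    kernel_B<<<1, 32, 0, s>>>(dev);

    cudaStreamSynchronize(s);   // host waits for both kernels
    cudaStreamDestroy(s);
    cudaFree(dev);
    return 0;
}
```

No event or explicit synchronization between the two launches is needed; the stream itself enforces the ordering.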

Hi Robert,

Thank you very much for the fast reply.
My code looks like the following:

for (int i = 0; i < N; i++)
    err = cudaStreamCreate(&stream[i]);

for (int i = 0; i < N; i++)
    kernel_1<<<grid, block, 0, stream[i]>>>( …, dev2, … );
    kernel_Last<<<grid, block, 0, stream[i]>>>( …, dev3, … );

for (int i = 0; i < N; i++)
    err = cudaStreamSynchronize(stream[i]);

Is this the right and fastest way to make sure that all streams have finished?

Best regards,

The middle for-loop looks strange to me: as written, without braces, only the kernel_1 launch is inside the loop body, so kernel_Last is launched just once, after the loop, into the last stream. Maybe that is what you intend. I don't know.

The final for-loop will synchronize all streams. Alternatively, a single cudaDeviceSynchronize() at that point waits for all outstanding work on the device.
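For completeness, here is a sketch of the N-stream pattern with the loop braces made explicit, so both kernels for matrix i really do go into stream[i] (the kernels, buffer sizes, and N are placeholders I invented for illustration):

```cuda
#include <cstdio>

#define N 4

__global__ void kernel_1(float *m)    { m[threadIdx.x] += 1.0f; }
__global__ void kernel_Last(float *m) { m[threadIdx.x] *= 2.0f; }

int main() {
    cudaStream_t stream[N];
    float *dev[N];

    for (int i = 0; i < N; i++) {
        cudaStreamCreate(&stream[i]);
        cudaMalloc(&dev[i], 32 * sizeof(float));
        cudaMemset(dev[i], 0, 32 * sizeof(float));
    }

    // Braces make the intent explicit: both kernels for matrix i go
    // into stream[i]. They serialize within that stream (rule 1),
    // while different matrices may overlap across streams (rule 2).
    for (int i = 0; i < N; i++) {
        kernel_1<<<1, 32, 0, stream[i]>>>(dev[i]);
        kernel_Last<<<1, 32, 0, stream[i]>>>(dev[i]);
    }

    // Either synchronize each stream individually…
    for (int i = 0; i < N; i++)
        cudaStreamSynchronize(stream[i]);
    // …or wait for all outstanding device work at once:
    // cudaDeviceSynchronize();

    for (int i = 0; i < N; i++) {
        cudaStreamDestroy(stream[i]);
        cudaFree(dev[i]);
    }
    return 0;
}
```

Either synchronization form is correct here; per-stream synchronization only matters if the host wants to consume some matrices' results before the others are done.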