cuda (Newbie question) when using streams, does the order of the Async calls make a difference?

santj · December 5, 2010, 8:39pm

When using streams, is the overlap of memcpy and kernel execution the same whether the API calls are issued stream by stream (copy in, kernel, copy out for each stream in turn) or grouped by operation (all host-to-device cudaMemcpyAsync calls first, then all kernel launches, then all device-to-host cudaMemcpyAsync calls last), as shown in the example code?

Example, serial calls:

for (i = 0; i < nStreams; i++) {
    offset = i * N / nStreams;
    cudaMemcpyAsync(a_d + offset, a_h + offset, size, cudaMemcpyHostToDevice, stream[i]);
    kernel<<<N / (nThreads * nStreams), nThreads, 0, stream[i]>>>(a_d + offset);
    cudaMemcpyAsync(a_h + offset, a_d + offset, size, cudaMemcpyDeviceToHost, stream[i]);
}

Versus, Example Overlapped calls:

for (i = 0; i < nStreams; i++) {
    offset = i * N / nStreams;
    cudaMemcpyAsync(a_d + offset, a_h + offset, size, cudaMemcpyHostToDevice, stream[i]);
}

for (i = 0; i < nStreams; i++) {
    offset = i * N / nStreams;
    kernel<<<N / (nThreads * nStreams), nThreads, 0, stream[i]>>>(a_d + offset);
}

for (i = 0; i < nStreams; i++) {
    offset = i * N / nStreams;
    cudaMemcpyAsync(a_h + offset, a_d + offset, size, cudaMemcpyDeviceToHost, stream[i]);
}

Thanks for the help.

tmurray · December 5, 2010, 8:58pm

Yes, the order makes a difference. The rule for best performance is to issue work breadth-first: launch the first operation in every stream before launching a second operation in any stream.
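If you want to check this on your own card, a minimal sketch along the following lines should let you time both issue orders. Everything here is an assumption for illustration, not something from the posts above: the trivial `kernel` body, the helper name `runAndCheck`, the sizes (`N`, `nStreams`, `nThreads`), and the use of pinned host memory via `cudaMallocHost` (which async copies need in order to overlap at all). How large the gap is depends on the GPU; on newer hardware with multiple hardware work queues the two orders may end up performing about the same.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: increments each element in place.
__global__ void kernel(float *a) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    a[idx] += 1.0f;
}

// Runs copy-in, kernel, copy-out over several streams in one of two issue
// orders, prints the elapsed time, and returns the number of wrong results
// (0 on success, -1 if a CUDA error was reported).
int runAndCheck(bool breadthFirst) {
    const int N = 1 << 22, nStreams = 4, nThreads = 256;  // assumed sizes
    const int chunk = N / nStreams;                       // elements per stream
    const size_t size = chunk * sizeof(float);            // bytes per stream
    const int blocks = chunk / nThreads;

    float *a_h, *a_d;
    cudaMallocHost((void **)&a_h, N * sizeof(float));     // pinned host buffer
    cudaMalloc((void **)&a_d, N * sizeof(float));
    for (int i = 0; i < N; i++) a_h[i] = 0.0f;

    cudaStream_t stream[nStreams];
    for (int i = 0; i < nStreams; i++) cudaStreamCreate(&stream[i]);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);

    if (breadthFirst) {
        // One loop per operation: every stream gets its first operation
        // before any stream gets its second.
        for (int i = 0; i < nStreams; i++)
            cudaMemcpyAsync(a_d + i * chunk, a_h + i * chunk, size,
                            cudaMemcpyHostToDevice, stream[i]);
        for (int i = 0; i < nStreams; i++)
            kernel<<<blocks, nThreads, 0, stream[i]>>>(a_d + i * chunk);
        for (int i = 0; i < nStreams; i++)
            cudaMemcpyAsync(a_h + i * chunk, a_d + i * chunk, size,
                            cudaMemcpyDeviceToHost, stream[i]);
    } else {
        // Stream by stream: all three operations are issued for one stream
        // before moving on to the next.
        for (int i = 0; i < nStreams; i++) {
            cudaMemcpyAsync(a_d + i * chunk, a_h + i * chunk, size,
                            cudaMemcpyHostToDevice, stream[i]);
            kernel<<<blocks, nThreads, 0, stream[i]>>>(a_d + i * chunk);
            cudaMemcpyAsync(a_h + i * chunk, a_d + i * chunk, size,
                            cudaMemcpyDeviceToHost, stream[i]);
        }
    }

    cudaDeviceSynchronize();            // wait for all streams to finish
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("%s issue order: %.3f ms\n",
           breadthFirst ? "breadth-first" : "stream-by-stream", ms);

    // Every element started at 0 and was incremented exactly once.
    int mismatches = 0;
    for (int i = 0; i < N; i++)
        if (a_h[i] != 1.0f) mismatches++;
    if (cudaGetLastError() != cudaSuccess) mismatches = -1;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    for (int i = 0; i < nStreams; i++) cudaStreamDestroy(stream[i]);
    cudaFree(a_d);
    cudaFreeHost(a_h);
    return mismatches;
}

int main() {
    int bad = runAndCheck(true) + runAndCheck(false);
    return bad == 0 ? 0 : 1;
}
```

Both orders produce the same results, since operations within a single stream still run in issue order; only the timing can differ. A single run is noisy, so it is worth repeating each variant a few times (and discarding the first, which includes warm-up cost) before comparing the numbers.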
