I have a kernel that integrates some variables and updates others, and all of them need to be transferred from device to host as soon as possible.
To transfer them asynchronously, can I simply call cudaMemcpyAsync() with the default stream ID of zero, or do I need to create a unique stream for each variable and have each cudaMemcpyAsync() call refer to its own stream ID?
That depends on what is supposed to run concurrently. If you want the copy to overlap with host (CPU) code, cudaMemcpyAsync() in the default stream is enough, provided the host buffer is page-locked (e.g. allocated with cudaMallocHost()); otherwise the call falls back to a synchronous copy. But if you want the copy to execute in parallel with a kernel on the GPU, you need to place them in different, non-default streams.
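A minimal sketch of both cases (buffer names and the commented-out kernel are placeholders, not from your code):

```
#include <cuda_runtime.h>

int main(void)
{
    const size_t n = 1 << 20;
    float *h_data, *d_data;
    cudaMallocHost((void **)&h_data, n * sizeof(float)); // pinned host buffer
    cudaMalloc((void **)&d_data, n * sizeof(float));

    // Case 1: overlap copy with CPU work -- the default stream (0) is enough.
    cudaMemcpyAsync(h_data, d_data, n * sizeof(float),
                    cudaMemcpyDeviceToHost, 0);
    // ... CPU work here runs while the copy is in flight ...
    cudaStreamSynchronize(0); // wait before reading h_data

    // Case 2: overlap copy with a kernel -- both must go into
    // different, non-default streams.
    cudaStream_t copyStream, kernelStream;
    cudaStreamCreate(&copyStream);
    cudaStreamCreate(&kernelStream);
    // someKernel<<<grid, block, 0, kernelStream>>>(d_data, n); // hypothetical
    cudaMemcpyAsync(h_data, d_data, n * sizeof(float),
                    cudaMemcpyDeviceToHost, copyStream);
    cudaDeviceSynchronize();

    cudaStreamDestroy(copyStream);
    cudaStreamDestroy(kernelStream);
    cudaFreeHost(h_data);
    cudaFree(d_data);
    return 0;
}
```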
The situation is this: several arrays of size M are divided over N GPUs. The same kernel executes on each array of size M/N; then five variables, in arrays of size M/N, are copied D2H; then five MPI_Allgather calls are made so that each process has the same copy of the five variables in arrays of size M; then those five arrays of size M are copied H2D.
The CUDA SDK simpleStreams project uses streams to divide the data processed by the kernel, and I can see why that could be faster. But how could the second memcpy, the H2D one, be done asynchronously?
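For reference, here is a rough sketch of the current pipeline per MPI rank, for one of the five variables (integrateKernel and all buffer names are placeholders):

```
#include <cuda_runtime.h>
#include <mpi.h>

void step(float *d_local, float *h_local,   // chunks of size M/N
          float *h_global, float *d_global, // arrays of size M
          size_t chunk, int nRanks)
{
    // 1. Kernel on this rank's chunk of size M/N.
    // integrateKernel<<<grid, block>>>(d_local, chunk);

    // 2. D2H copy of the updated variable (x5 in the real code).
    cudaMemcpy(h_local, d_local, chunk * sizeof(float),
               cudaMemcpyDeviceToHost);

    // 3. Allgather so every rank holds the full size-M array (x5).
    MPI_Allgather(h_local, chunk, MPI_FLOAT,
                  h_global, chunk, MPI_FLOAT, MPI_COMM_WORLD);

    // 4. H2D copy of the assembled size-M array (x5).
    cudaMemcpy(d_global, h_global, chunk * nRanks * sizeof(float),
               cudaMemcpyHostToDevice);
}
```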
If your data is that easily partitioned into chunks that can be executed on different GPUs, you may as well partition it into more chunks and run them in different streams. That way you can (at least partially) overlap kernel execution in one stream with H<->D copies in other streams.
E.g., partition into N·K arrays of size M/(N·K), and run K streams on each of the N GPUs.
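A minimal sketch of that scheme on one GPU, assuming the host buffers are pinned and K divides the local chunk evenly (K, integrateKernel, and the buffer names are placeholders):

```
#include <cuda_runtime.h>

#define K 4 // streams per GPU (assumption)

void runChunked(float *d_data, float *h_data, size_t localN)
{
    cudaStream_t streams[K];
    const size_t piece = localN / K;

    for (int i = 0; i < K; ++i)
        cudaStreamCreate(&streams[i]);

    // The D2H copy of piece i can overlap the kernel for piece i+1,
    // because each piece lives in its own stream.
    for (int i = 0; i < K; ++i) {
        size_t off = i * piece;
        // integrateKernel<<<grid, block, 0, streams[i]>>>(d_data + off, piece);
        cudaMemcpyAsync(h_data + off, d_data + off, piece * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[i]);
    }
    cudaDeviceSynchronize(); // all pieces are now on the host

    // After the MPI_Allgather, the H2D copy of the assembled array can be
    // chunked over the same streams in the same way, overlapping with the
    // next kernel launches.

    for (int i = 0; i < K; ++i)
        cudaStreamDestroy(streams[i]);
}
```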