cuda stream

heshsham_India · November 19, 2009, 9:02am

Hi,

I am trying to understand how streams are written in CUDA.

1. Basically, I am looking for an example that shows this. I also found some code, as follows:

cudaStream_t stream1, stream2;
cudaStreamCreate(&stream1);
cudaStreamCreate(&stream2);
cudaMemcpyAsync( dst, src, size, dir, stream1 );
kernel<<<grid, block, 0, stream2>>>(...);

But I am not able to understand it. In the code above, are stream1 and stream2 the kernels?

2. I understand that a stream is a sequence of operations that execute in order on the GPU. Streams can also be useful because they make it possible to execute a kernel and a memcopy concurrently. Suppose I have to do three operations, O1, O2 and O3, one after another on a chunk of data (in sequence). How should I proceed? Should I write three different kernels? Some pseudo-code would help me understand the concept.

Thanks for your time,

Heshsham

avidday · November 19, 2009, 9:39am

No, they are streams. Streams can be thought of as command pipelines. You can have several open to the same device at once and push asynchronous commands down different streams. On GPUs with concurrent copy/execution capability, the driver will work out when commands on different streams can be overlapped and execute them accordingly. In your example code, one stream is being used for an asynchronous copy and the other to run a kernel. Both can run at the same time to improve computational efficiency and hide PCI-e latency.
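To make the "command pipeline" picture concrete, here is a minimal sketch that fills in the snippet from the question. The kernel `scale`, the buffer names and the sizes are assumptions made up for illustration; only the stream calls come from the original code. The copy and the kernel touch different buffers, so the driver is free to overlap them on hardware that supports it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles each element in place.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Returns the number of wrong elements (0 means success, -1 means allocation failure).
int run_two_streams()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_src = nullptr, *h_out = nullptr, *d_copy = nullptr, *d_work = nullptr;
    // Pinned host memory is needed for copies that are truly asynchronous.
    if (cudaMallocHost((void **)&h_src, bytes) != cudaSuccess ||
        cudaMallocHost((void **)&h_out, bytes) != cudaSuccess ||
        cudaMalloc((void **)&d_copy, bytes) != cudaSuccess ||
        cudaMalloc((void **)&d_work, bytes) != cudaSuccess)
        return -1;

    for (int i = 0; i < n; ++i) h_src[i] = 1.0f;
    cudaMemcpy(d_work, h_src, bytes, cudaMemcpyHostToDevice);  // blocking setup copy

    cudaStream_t stream1, stream2;
    cudaStreamCreate(&stream1);
    cudaStreamCreate(&stream2);

    // Two independent commands pushed down two different streams.
    cudaMemcpyAsync(d_copy, h_src, bytes, cudaMemcpyHostToDevice, stream1);
    scale<<<(n + 255) / 256, 256, 0, stream2>>>(d_work, n);

    // Wait until both streams have drained before reading results.
    cudaStreamSynchronize(stream1);
    cudaStreamSynchronize(stream2);

    int errors = 0;
    cudaMemcpy(h_out, d_work, bytes, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) if (h_out[i] != 2.0f) ++errors;
    cudaMemcpy(h_out, d_copy, bytes, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) if (h_out[i] != 1.0f) ++errors;

    cudaStreamDestroy(stream1);
    cudaStreamDestroy(stream2);
    cudaFree(d_copy);
    cudaFree(d_work);
    cudaFreeHost(h_src);
    cudaFreeHost(h_out);
    return errors;
}

int main()
{
    printf("errors: %d\n", run_two_streams());
    return 0;
}
```

Whether the two commands really overlap depends on the device; the code is correct either way, because nothing in one stream depends on the other.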

Writing three kernels would be the usual approach. If, at the end of the O1-O2-O3 sequence, your host code needed to compute something based on the intermediate results of O2, then it would make sense to use streams and do an asynchronous copy back to the host while O3 was still running. If the calculation needs the result of O3 on the host, then you have no choice but to wait until O3 is finished, and streams probably wouldn't be of any benefit.
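A rough sketch of that O1-O2-O3 scenario, under some assumptions: the three kernels below are placeholder arithmetic, O3 writes to a separate output buffer so that the O2 result is not overwritten while it is being copied, and an event is used so the copy stream only starts once O2 has finished. None of these names come from the thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// O1, O2 and O3 are hypothetical stand-ins for the three operations.
__global__ void o1(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

__global__ void o2(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

// O3 reads the O2 result and writes elsewhere, leaving the O2 result intact.
__global__ void o3(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] - 3.0f;
}

// Returns the number of wrong elements (0 means success, -1 means allocation failure).
int run_o1_o2_o3()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    const int blocks = (n + 255) / 256;

    float *h_mid = nullptr, *h_final = nullptr, *d_x = nullptr, *d_out = nullptr;
    if (cudaMallocHost((void **)&h_mid, bytes) != cudaSuccess ||
        cudaMallocHost((void **)&h_final, bytes) != cudaSuccess ||
        cudaMalloc((void **)&d_x, bytes) != cudaSuccess ||
        cudaMalloc((void **)&d_out, bytes) != cudaSuccess)
        return -1;

    for (int i = 0; i < n; ++i) h_final[i] = 1.0f;             // staging for the input
    cudaMemcpy(d_x, h_final, bytes, cudaMemcpyHostToDevice);

    cudaStream_t compute, copy;
    cudaEvent_t o2_done;
    cudaStreamCreate(&compute);
    cudaStreamCreate(&copy);
    cudaEventCreate(&o2_done);

    // Kernels in the same stream run one after another, in issue order.
    o1<<<blocks, 256, 0, compute>>>(d_x, n);
    o2<<<blocks, 256, 0, compute>>>(d_x, n);
    cudaEventRecord(o2_done, compute);                         // marks "O2 finished"
    o3<<<blocks, 256, 0, compute>>>(d_x, d_out, n);

    // The copy stream waits for O2 only, so the copy may overlap with O3.
    cudaStreamWaitEvent(copy, o2_done, 0);
    cudaMemcpyAsync(h_mid, d_x, bytes, cudaMemcpyDeviceToHost, copy);

    cudaStreamSynchronize(copy);       // the host can use the O2 result from here on
    int errors = 0;
    for (int i = 0; i < n; ++i) if (h_mid[i] != 4.0f) ++errors;    // (1 + 1) * 2

    cudaStreamSynchronize(compute);    // O3 is finished after this
    cudaMemcpy(h_final, d_out, bytes, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) if (h_final[i] != 1.0f) ++errors;  // 4 - 3

    cudaEventDestroy(o2_done);
    cudaStreamDestroy(compute);
    cudaStreamDestroy(copy);
    cudaFree(d_x);
    cudaFree(d_out);
    cudaFreeHost(h_mid);
    cudaFreeHost(h_final);
    return errors;
}

int main()
{
    printf("errors: %d\n", run_o1_o2_o3());
    return 0;
}
```

If the host only needs the final O3 result, the second stream and the event can be dropped and the three launches simply go into one stream.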

The principles are discussed in section 3.2.6 of the programming guide.

Sarnath · November 19, 2009, 10:18am

Even in the normal case, when there are no streams, there exists a default stream.

A stream basically means that all operations issued to it are served on a FIFO basis.

Thus a code sequence like:

cudaMalloc()
cudaMemcpy(TO_GPU)                   --- REF_1
kernel1 <<< >>>
cudaMemcpy(FROM_GPU)                 --- REF_2
cudaMemcpy(TO_GPU_FOR_NEXT_KERNEL)   --- REF_3
kernel2 <<< >>>

will execute in FIFO order.

So the REF_3 cudaMemcpy will have to wait for all previous operations (including REF_1 and REF_2) to complete. This is normal.

But there are some cards out there that support concurrent kernel execution and memcpy. On such cards, “kernel1” can execute at the same time as the memcpy in “REF_3”.

So there needs to be a way to express this parallelism without disturbing the older semantics.

And thus CUDA streams were born. HTH
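One common way to express that parallelism is to split the data into chunks and give each chunk its own stream, so that the copy for one chunk can overlap with the kernel for another. The sketch below assumes a made-up kernel `add_one`, four streams and a chunk size chosen arbitrarily; within each stream the copy-kernel-copy order is still FIFO.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds 1 to each element of a chunk.
__global__ void add_one(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Returns the number of wrong elements (0 means success, -1 means allocation failure).
int run_chunked_pipeline()
{
    const int nStreams = 4;                 // assumed number of chunks/streams
    const int chunk = 1 << 18;              // elements per chunk
    const int n = nStreams * chunk;
    const size_t chunkBytes = chunk * sizeof(float);

    float *h = nullptr, *d = nullptr;
    if (cudaMallocHost((void **)&h, n * sizeof(float)) != cudaSuccess ||  // pinned
        cudaMalloc((void **)&d, n * sizeof(float)) != cudaSuccess)
        return -1;

    for (int i = 0; i < n; ++i) h[i] = (float)(i % 1000);

    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&streams[s]);

    // Each stream handles one chunk: copy in, compute, copy out (FIFO per stream).
    // Work in different streams is independent and may overlap.
    for (int s = 0; s < nStreams; ++s) {
        const int offset = s * chunk;
        cudaMemcpyAsync(d + offset, h + offset, chunkBytes,
                        cudaMemcpyHostToDevice, streams[s]);
        add_one<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d + offset, chunk);
        cudaMemcpyAsync(h + offset, d + offset, chunkBytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    cudaDeviceSynchronize();                // wait for all streams to finish

    int errors = 0;
    for (int i = 0; i < n; ++i)
        if (h[i] != (float)(i % 1000) + 1.0f) ++errors;

    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d);
    cudaFreeHost(h);
    return errors;
}

int main()
{
    printf("errors: %d\n", run_chunked_pipeline());
    return 0;
}
```

How much overlap this actually gives depends on the card; on a device without concurrent copy/execute support the result is the same, just without the speedup.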

Arip · April 6, 2011, 5:32am

Hi,

If I put a cudaThreadSynchronize() after the kernel and use cudaMemcpyAsync() at REF_2, will this GPU-to-CPU transfer wait for the kernel, or can it still run asynchronously?
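One way to look at this question is to check the stream state right after the host-side synchronization. The sketch below rests on a few assumptions: it uses cudaDeviceSynchronize(), the current replacement for the deprecated cudaThreadSynchronize(), and a made-up kernel `fill`. Because the synchronize call blocks the host until the kernel has finished, a cudaMemcpyAsync() issued after it cannot overlap with that kernel; it is only asynchronous with respect to the host.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: writes a constant into every element.
__global__ void fill(int *data, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Returns 0 if the kernel had already finished when the copy was issued
// and the copied data is correct, 1 otherwise, -1 on allocation failure.
int run_sync_then_copy()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(int);

    int *h = nullptr, *d = nullptr;
    if (cudaMallocHost((void **)&h, bytes) != cudaSuccess ||   // pinned host buffer
        cudaMalloc((void **)&d, bytes) != cudaSuccess)
        return -1;

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    fill<<<(n + 255) / 256, 256, 0, stream>>>(d, n, 7);

    // The host blocks here until all previously issued device work is done.
    cudaDeviceSynchronize();

    // cudaSuccess means nothing is pending in the stream: the kernel is finished.
    cudaError_t state = cudaStreamQuery(stream);

    // Asynchronous with respect to the host, but the kernel is already complete.
    cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    int ok = (state == cudaSuccess && h[0] == 7 && h[n - 1] == 7);

    cudaStreamDestroy(stream);
    cudaFree(d);
    cudaFreeHost(h);
    return ok ? 0 : 1;
}

int main()
{
    printf("result: %d\n", run_sync_then_copy());
    return 0;
}
```

Note that a copy of the kernel's own output has to wait for the kernel in any case; overlap is only possible when the copy moves data the running kernel does not depend on.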
