I am trying to understand how streams are used in CUDA code.
1- I am looking for an example that shows how to create and use them. I found the following snippet:
cudaStream_t stream1, stream2;
cudaMemcpyAsync( dst, src, size, dir, stream1 );
kernel<<<grid, block, 0, stream2>>>(…);
But I cannot make sense of it. In the snippet above, are stream1 and stream2 kernels?
2- As I understand it, a stream is a sequence of operations that execute in order on the GPU. Streams are also useful because operations in different streams can run concurrently, for example a kernel and a memory copy. Suppose I have to apply three operations O1, O2, O3, one after another, to a chunk of data. How should I proceed? Should I write three different kernels? Some pseudo-code would help me understand the concept.
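To make the second question concrete, here is a rough sketch of what I imagine it would look like. The kernels op1, op2, op3 are placeholders I made up for O1, O2, O3, and splitting the data into two chunks with one stream per chunk is my own assumption. Error checking is left out for brevity. Is this the right idea?

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernels standing in for O1, O2, O3.
__global__ void op1(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}
__global__ void op2(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}
__global__ void op3(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] -= 3.0f;
}

int main() {
    const int nStreams = 2;            // assumed: one stream per chunk
    const int chunk = 1 << 20;         // elements per chunk
    const int n = nStreams * chunk;
    const size_t chunkBytes = chunk * sizeof(float);

    float *h = nullptr, *d = nullptr;
    cudaMallocHost(&h, n * sizeof(float));  // pinned host memory, so copies can be asynchronous
    cudaMalloc(&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 2.0f;

    // A stream is a handle to a work queue, not a kernel.
    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&streams[s]);

    const int block = 256;
    const int grid = (chunk + block - 1) / block;
    for (int s = 0; s < nStreams; ++s) {
        size_t off = (size_t)s * chunk;
        // Everything queued in the same stream runs in issue order:
        // copy in, O1, O2, O3, copy out.
        cudaMemcpyAsync(d + off, h + off, chunkBytes, cudaMemcpyHostToDevice, streams[s]);
        op1<<<grid, block, 0, streams[s]>>>(d + off, chunk);
        op2<<<grid, block, 0, streams[s]>>>(d + off, chunk);
        op3<<<grid, block, 0, streams[s]>>>(d + off, chunk);
        cudaMemcpyAsync(h + off, d + off, chunkBytes, cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();  // wait for all streams to finish

    // Each element should now hold (2 + 1) * 2 - 3.
    int bad = 0;
    for (int i = 0; i < n; ++i) if (h[i] != 3.0f) ++bad;
    printf("mismatches: %d\n", bad);

    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d);
    cudaFreeHost(h);
    return bad != 0;
}
```

My thinking is that the three kernels stay ordered within each stream, while the copy for one chunk may overlap with the kernels for the other chunk, if the device supports it.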
Thanks for your time.