CUDA stream management

sungsoo · December 12, 2016, 11:16pm

Hi All,

I am trying to run multiple kernels simultaneously as follows:

void *buf_a, *buf_a1, *buf_b, *buf_b1;  // each name needs its own '*' to be a pointer
// ... initialize buf_a and buf_b ...
cudaDeviceSynchronize();

for (int i=0; i<maxIter; ++i)
{
     computeKernelA<<<,,,stream1>>>(someout_a, buf_a);
     computeKernelB<<<,,,stream1>>>(someout_b, buf_b, someout_a);
     // ... do something with someout_b ...
     bufKernelA<<<,,,stream2>>>(buf_a1);
     bufKernelB<<<,,,stream3>>>(buf_b1);
     swap(buf_a1, buf_a);
     swap(buf_b1, buf_b);
}

First, buf_a and buf_b are initialized. Inside the loop, computeKernelA is launched on stream1 to process the data stored in buf_a. computeKernelB is then launched on the same stream1 because it has a data dependency on the output of computeKernelA. Meanwhile, stream2 and stream3 run bufKernelA and bufKernelB, respectively, to prefetch data for the next iteration.

The simplest way to do this correctly is to call cudaDeviceSynchronize() before the swap calls. However, that adds a long delay before the next iteration can start. Also, I want computeKernelA for the next iteration to start as soon as bufKernelA on stream2 has completed, while computeKernelB for the next iteration needs to wait for both the result of computeKernelA and the completion of bufKernelB, which was launched on stream3 in the previous iteration.

Can I do this without using cudaDeviceSynchronize()?

Thanks!

ygsunshine · December 15, 2016, 8:42pm

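Yes. Use CUDA events to synchronize between your streams: record an event in the producing stream with cudaEventRecord() and make the consuming stream wait on it with cudaStreamWaitEvent(). This presentation on CUDA streams covers the details: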
http://on-demand.gputechconf.com/gtc/2014/presentations/S4158-cuda-streams-best-practices-common-pitfalls.pdf
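
To make the event-based approach concrete, here is a minimal sketch of how the loop from the question could be ordered with events instead of cudaDeviceSynchronize(). Everything beyond the kernel and stream names is an assumption for illustration: the kernel bodies, the float element type, the Slot struct, the CHECK macro, the runPipeline function, and the launch configuration are all hypothetical, since the thread does not show them. Each buffer carries a "filled" event (recorded after its prefetch kernel) and a "consumed" event (recorded after the compute kernel that reads it), so a buffer is neither read before it is filled nor overwritten while it is still being read.

```cuda
#include <cstdio>
#include <cstdlib>
#include <utility>
#include <cuda_runtime.h>

#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                  \
                    cudaGetErrorString(err_), __FILE__, __LINE__);       \
            exit(EXIT_FAILURE);                                          \
        }                                                                \
    } while (0)

// Placeholder kernels: assumed bodies, the real ones are not in the thread.
__global__ void computeKernelA(float* out_a, const float* buf_a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out_a[i] = 2.0f * buf_a[i];
}
__global__ void computeKernelB(float* out_b, const float* buf_b,
                               const float* out_a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out_b[i] = buf_b[i] + out_a[i];
}
__global__ void bufKernelA(float* buf, int n, float tag) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = tag;  // stands in for "fetch data for next iteration"
}
__global__ void bufKernelB(float* buf, int n, float tag) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = tag;
}

// One half of a double buffer plus the events that guard it (hypothetical).
struct Slot {
    float* data;
    cudaEvent_t filled;    // recorded after a prefetch kernel wrote data
    cudaEvent_t consumed;  // recorded after a compute kernel read data
};

static Slot makeSlot(int n) {
    Slot s;
    CHECK(cudaMalloc(&s.data, n * sizeof(float)));
    CHECK(cudaEventCreateWithFlags(&s.filled, cudaEventDisableTiming));
    CHECK(cudaEventCreateWithFlags(&s.consumed, cudaEventDisableTiming));
    return s;
}
static void freeSlot(Slot& s) {
    CHECK(cudaEventDestroy(s.filled));
    CHECK(cudaEventDestroy(s.consumed));
    CHECK(cudaFree(s.data));
}

// Runs maxIter (>= 1 assumed) iterations; returns out_b[0] of the last one.
float runPipeline(int maxIter, int n) {
    const int block = 256, grid = (n + block - 1) / block;
    cudaStream_t stream1, stream2, stream3;
    CHECK(cudaStreamCreate(&stream1));
    CHECK(cudaStreamCreate(&stream2));
    CHECK(cudaStreamCreate(&stream3));
    Slot a = makeSlot(n), a1 = makeSlot(n), b = makeSlot(n), b1 = makeSlot(n);
    float *out_a, *out_b;
    CHECK(cudaMalloc(&out_a, n * sizeof(float)));
    CHECK(cudaMalloc(&out_b, n * sizeof(float)));

    // Initial fill for iteration 0, tracked by the same events as later fills.
    bufKernelA<<<grid, block, 0, stream2>>>(a.data, n, 0.0f);
    CHECK(cudaEventRecord(a.filled, stream2));
    bufKernelB<<<grid, block, 0, stream3>>>(b.data, n, 0.0f);
    CHECK(cudaEventRecord(b.filled, stream3));

    for (int i = 0; i < maxIter; ++i) {
        // computeKernelA only needs its own input buffer to be filled.
        CHECK(cudaStreamWaitEvent(stream1, a.filled, 0));
        computeKernelA<<<grid, block, 0, stream1>>>(out_a, a.data, n);
        CHECK(cudaEventRecord(a.consumed, stream1));
        // computeKernelB follows computeKernelA in stream order and also
        // waits for the prefetch of buf_b.
        CHECK(cudaStreamWaitEvent(stream1, b.filled, 0));
        computeKernelB<<<grid, block, 0, stream1>>>(out_b, b.data, out_a, n);
        CHECK(cudaEventRecord(b.consumed, stream1));
        // Prefetch into the spare buffers once their last reader has finished.
        // Waiting on a never-recorded event is a no-op (first iteration).
        CHECK(cudaStreamWaitEvent(stream2, a1.consumed, 0));
        bufKernelA<<<grid, block, 0, stream2>>>(a1.data, n, float(i + 1));
        CHECK(cudaEventRecord(a1.filled, stream2));
        CHECK(cudaStreamWaitEvent(stream3, b1.consumed, 0));
        bufKernelB<<<grid, block, 0, stream3>>>(b1.data, n, float(i + 1));
        CHECK(cudaEventRecord(b1.filled, stream3));
        // Host-side swap only; launches above already captured their pointers.
        std::swap(a, a1);
        std::swap(b, b1);
    }
    float result = 0.0f;
    CHECK(cudaMemcpyAsync(&result, out_b, sizeof(float),
                          cudaMemcpyDeviceToHost, stream1));
    CHECK(cudaStreamSynchronize(stream1));
    CHECK(cudaDeviceSynchronize());  // drain the final prefetches before cleanup
    freeSlot(a); freeSlot(a1); freeSlot(b); freeSlot(b1);
    CHECK(cudaFree(out_a));
    CHECK(cudaFree(out_b));
    CHECK(cudaStreamDestroy(stream1));
    CHECK(cudaStreamDestroy(stream2));
    CHECK(cudaStreamDestroy(stream3));
    return result;
}
```

With these placeholder kernels, iteration i reads buffers holding the value i, so calling runPipeline(4, 1 << 16) from main should return 9.0f (3 + 2 * 3 for the last iteration). Swapping the whole Slot rather than the raw pointer keeps each event attached to the memory it guards, which is what lets the host enqueue every iteration without blocking until the final copy.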
