I am currently performing asynchronous operations using five streams in CUDA.
I want each stream's work to complete in a specific order: memcpy (host side) → cudaMemcpy (Host to Device) → kernel → cudaMemcpy (Device to Host) → memcpy (host side). The problem is that the cudaMemcpy (Device to Host) must be complete before I can safely access the destination buffer in the final host-side memcpy. I tried using stream synchronization, but it breaks after one cycle.
Is there a way to wait on each stream individually, while keeping the streams asynchronous with respect to each other, so that the cudaMemcpy (Device to Host) is guaranteed to have finished before the CPU performs the final memcpy?
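To make the question concrete, here is a minimal sketch of the pattern I mean. The buffer names, sizes, and the kernel are placeholders, not my real code; the pinned staging buffers are there because cudaMemcpyAsync only overlaps with pinned host memory:

```cuda
#include <cuda_runtime.h>
#include <cstring>

#define NSTREAMS 5
#define N (1 << 20)

// placeholder kernel
__global__ void kernel(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main() {
    cudaStream_t streams[NSTREAMS];
    float *h_src[NSTREAMS], *h_pinned[NSTREAMS], *h_dst[NSTREAMS], *d_buf[NSTREAMS];

    for (int s = 0; s < NSTREAMS; ++s) {
        cudaStreamCreate(&streams[s]);
        h_src[s] = new float[N];
        h_dst[s] = new float[N];
        cudaMallocHost(&h_pinned[s], N * sizeof(float)); // pinned staging buffer
        cudaMalloc(&d_buf[s], N * sizeof(float));
    }

    for (int s = 0; s < NSTREAMS; ++s) {
        // 1. host memcpy into the pinned staging buffer
        memcpy(h_pinned[s], h_src[s], N * sizeof(float));
        // 2. H2D copy, 3. kernel, 4. D2H copy — all async on this stream
        cudaMemcpyAsync(d_buf[s], h_pinned[s], N * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        kernel<<<(N + 255) / 256, 256, 0, streams[s]>>>(d_buf[s], N);
        cudaMemcpyAsync(h_pinned[s], d_buf[s], N * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    for (int s = 0; s < NSTREAMS; ++s) {
        // 5. this is where it goes wrong for me: I want to wait on THIS
        // stream only, then do the final host memcpy
        cudaStreamSynchronize(streams[s]);
        memcpy(h_dst[s], h_pinned[s], N * sizeof(float));
    }
    return 0;
}
```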
The following is the Nsight timeline captured while the code misbehaves.
I’ve tried almost everything suggested there, but the issue persists. Asynchronous execution itself seems to work, but at the point of stream synchronization it looks as if the GPU is waiting for all streams to complete. Should each stream live in its own host thread?
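One alternative I'm considering instead of threads is letting the driver run the final memcpy via cudaLaunchHostFunc (CUDA 10+), so the CPU never blocks on the whole device. This is only a sketch of the idea; `CopyArgs`, `hostCopy`, and `args` are names I made up:

```cuda
#include <cuda_runtime.h>
#include <cstring>

struct CopyArgs { float *dst, *src; size_t bytes; };

// The driver invokes this on the stream once all prior work
// (including the D2H copy) on that stream has completed.
void CUDART_CB hostCopy(void *userData) {
    CopyArgs *a = static_cast<CopyArgs *>(userData);
    memcpy(a->dst, a->src, a->bytes);
}

// per stream, after enqueuing the D2H copy:
//   cudaLaunchHostFunc(streams[s], hostCopy, &args[s]);
// args[s] must stay alive until the callback runs.
```

Would this be the right way to keep the five streams independent, or is a dedicated host thread per stream the usual pattern?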
I’m really curious.