sanity check: when do I need to synchronize kernel launches?

svennevs · February 2, 2018, 2:11am

Consider a simple three step procedure:

// create event A
cudaMemcpyAsync(d_depthData, h_depthData, numBytes, cudaMemcpyHostToDevice, stream);
// sync event A
// create event B
// using the depth data we just copied, compute the positions
computePositionsKernel<<< grid, block, 0, stream >>>(d_depthData, d_positions);
// sync event B
// using the positions we just computed, compute the normals
computeNormalsKernel<<< grid, block, 0, stream >>>(d_positions, d_normals);

1. Does “event A” need to be synchronized? I do an async copy because I want everything to operate on this stream. I cannot find the original source, but I distinctly remember reading or watching a video that explained that if you do an asynchronous copy, the next kernel to use that destination implicitly waits until the data is ready. Is that true? None of the official documentation seems to indicate this behavior at all.
2. If (1) does not need to be synchronized, does event B need to be synchronized?

I’ve removed the synchronization code and everything works the same, but I suspect that is only because my (low-end) GPU cannot run concurrent kernels.

If there is a data-dependence between two kernels, is it correct to assume that I should ALWAYS be synchronizing?

Thank you for any sanity checks, I want to make sure my code works for people who have real GPUs as well ;)

Robert_Crovella · February 2, 2018, 6:21pm

An event needs to be recorded, not just created, in order to use it in any way.
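For illustration, here is a minimal sketch of where a recorded event does matter: a dependency between two different streams. The names `scaleKernel`, `copyStream`, `computeStream`, `copyDone`, and `runCrossStreamPipeline` are hypothetical and not from the original post, and error checking is kept to a minimum. The event is created, then recorded into the copy stream with `cudaEventRecord`, and only then can the other stream wait on it with `cudaStreamWaitEvent`.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: doubles every input element.
__global__ void scaleKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

// Copies on one stream, computes on another, and orders the two with an event.
// Returns true if every output element equals 2 * input.
bool runCrossStreamPipeline(int n) {
    size_t numBytes = n * sizeof(float);
    float *h_in = nullptr, *h_out = nullptr, *d_in = nullptr, *d_out = nullptr;
    cudaMallocHost(&h_in, numBytes);   // pinned host memory for async copies
    cudaMallocHost(&h_out, numBytes);
    cudaMalloc(&d_in, numBytes);
    cudaMalloc(&d_out, numBytes);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    cudaStream_t copyStream, computeStream;
    cudaStreamCreate(&copyStream);
    cudaStreamCreate(&computeStream);

    cudaEvent_t copyDone;
    cudaEventCreateWithFlags(&copyDone, cudaEventDisableTiming);  // created only

    int block = 256;
    int grid = (n + block - 1) / block;

    cudaMemcpyAsync(d_in, h_in, numBytes, cudaMemcpyHostToDevice, copyStream);
    cudaEventRecord(copyDone, copyStream);            // now the event marks a point in copyStream
    cudaStreamWaitEvent(computeStream, copyDone, 0);  // computeStream waits on the device; host does not block
    scaleKernel<<<grid, block, 0, computeStream>>>(d_in, d_out, n);
    cudaMemcpyAsync(h_out, d_out, numBytes, cudaMemcpyDeviceToHost, computeStream);
    cudaStreamSynchronize(computeStream);             // host waits before reading h_out

    bool ok = (cudaGetLastError() == cudaSuccess);
    for (int i = 0; i < n && ok; ++i) ok = (h_out[i] == 2.0f * h_in[i]);

    cudaEventDestroy(copyDone);
    cudaStreamDestroy(copyStream);
    cudaStreamDestroy(computeStream);
    cudaFree(d_in); cudaFree(d_out);
    cudaFreeHost(h_in); cudaFreeHost(h_out);
    return ok;
}
```

Without the `cudaEventRecord` / `cudaStreamWaitEvent` pair, nothing would order the kernel in `computeStream` after the copy in `copyStream`.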

All CUDA activity issued to a particular stream will serialize. Always.

Programming Guide :: CUDA Toolkit Documentation

“A stream is a sequence of commands (possibly issued by different host threads) that execute in order.”
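Applied to the three-step procedure in the question, this means no events and no intermediate synchronization are needed when everything is issued to the same stream. The following is a minimal sketch under stated assumptions: the kernel bodies are placeholders (the original kernels were not posted), an element count `n` is added as a parameter, and `runSameStreamPipeline` is a hypothetical helper name. Error checking is kept to a minimum.

```cuda
#include <cuda_runtime.h>

// Placeholder bodies; the real kernels from the question are not shown in the thread.
__global__ void computePositionsKernel(const float* depth, float* positions, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) positions[i] = 2.0f * depth[i];
}

__global__ void computeNormalsKernel(const float* positions, float* normals, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) normals[i] = positions[i] + 1.0f;
}

// Issues copy -> kernel -> kernel -> copy on ONE stream with no events.
// Returns true if every result equals 2 * depth + 1.
bool runSameStreamPipeline(int n) {
    size_t numBytes = n * sizeof(float);
    float *h_depthData = nullptr, *h_normals = nullptr;
    float *d_depthData = nullptr, *d_positions = nullptr, *d_normals = nullptr;
    cudaMallocHost(&h_depthData, numBytes);  // pinned host memory for async copies
    cudaMallocHost(&h_normals, numBytes);
    cudaMalloc(&d_depthData, numBytes);
    cudaMalloc(&d_positions, numBytes);
    cudaMalloc(&d_normals, numBytes);
    for (int i = 0; i < n; ++i) h_depthData[i] = static_cast<float>(i);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    int block = 256;
    int grid = (n + block - 1) / block;

    // Each command starts only after the previous command in this stream finishes.
    cudaMemcpyAsync(d_depthData, h_depthData, numBytes, cudaMemcpyHostToDevice, stream);
    computePositionsKernel<<<grid, block, 0, stream>>>(d_depthData, d_positions, n);
    computeNormalsKernel<<<grid, block, 0, stream>>>(d_positions, d_normals, n);
    cudaMemcpyAsync(h_normals, d_normals, numBytes, cudaMemcpyDeviceToHost, stream);

    // The only host-side wait: needed before the CPU reads h_normals.
    cudaStreamSynchronize(stream);

    bool ok = (cudaGetLastError() == cudaSuccess);
    for (int i = 0; i < n && ok; ++i) ok = (h_normals[i] == 2.0f * h_depthData[i] + 1.0f);

    cudaStreamDestroy(stream);
    cudaFree(d_depthData); cudaFree(d_positions); cudaFree(d_normals);
    cudaFreeHost(h_depthData); cudaFreeHost(h_normals);
    return ok;
}
```

The in-order guarantee comes from the stream itself, not from the GPU's ability (or inability) to run kernels concurrently, so the same code should behave identically on low-end and high-end devices.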
