Question regarding cudaThreadSynchronize(): does it act like a barrier?

bugBot · September 16, 2008, 1:33am

I read a few posts about cudaThreadSynchronize(), and my understanding is that it blocks the host until all threads of the kernel have finished executing. For example:

kernel1<<<grid, block>>>(…);
cudaThreadSynchronize();
// Here all threads have finished and the device is ready.

But isn’t there an implicit barrier after every kernel invocation? Isn’t calling cudaThreadSynchronize() after a kernel call redundant?

Having said that, if I leave out cudaThreadSynchronize() I do not get correct timings. Why is that? Can someone explain this behaviour to me?

Thanks in advance.

Ailleur · September 16, 2008, 1:40am

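Kernel launches are asynchronous: the call returns to the host right away, so the host can do other work while the GPU is still running the kernel. That is why your timing is off without cudaThreadSynchronize(): the host timer stops when the launch returns, not when the kernel finishes.
The implicit barrier comes later, when you do a blocking cudaMemcpy to bring the results back to the host. That copy waits for the preceding kernel to complete.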
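
As a rough sketch of the timing effect (not taken from the original post), the program below times the same launch with and without a synchronize. busyKernel, timeLaunchMs, and the sizes and iteration count are made-up names and values for illustration. It calls cudaDeviceSynchronize(), which replaced the now-deprecated cudaThreadSynchronize() in later CUDA releases. It assumes a CUDA-capable device and the default stream.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that only burns time so the effect is measurable.
__global__ void busyKernel(float *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = data[i];
        for (int k = 0; k < iters; ++k)
            x = x * 1.0001f + 0.5f;
        data[i] = x;
    }
}

// Launch the kernel once and return the elapsed host wall-clock time in ms.
// With sync == false the timer stops as soon as the launch call returns.
double timeLaunchMs(float *d_data, int n, bool sync)
{
    const int block = 256;
    const int grid = (n + block - 1) / block;

    auto t0 = std::chrono::steady_clock::now();
    busyKernel<<<grid, block>>>(d_data, n, 10000);
    if (sync)
        cudaDeviceSynchronize();  // block the host until the kernel is done
    auto t1 = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    const int n = 1 << 20;
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    // Warm-up launch so one-time initialization does not skew the numbers.
    timeLaunchMs(d_data, n, true);

    double launchOnly = timeLaunchMs(d_data, n, false);
    cudaDeviceSynchronize();  // drain the GPU before the next measurement
    double withSync = timeLaunchMs(d_data, n, true);

    std::printf("launch only:   %.3f ms\n", launchOnly);
    std::printf("launch + sync: %.3f ms\n", withSync);

    // A blocking device-to-host cudaMemcpy on the default stream waits for
    // the preceding kernel, so no explicit synchronize is needed before it.
    float first = 0.0f;
    cudaMemcpy(&first, d_data, sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("data[0] = %f\n", first);

    cudaFree(d_data);
    return 0;
}
```

Built with something like nvcc timing.cu -o timing, the "launch only" figure should normally be much smaller than the "launch + sync" figure, since it measures only the cost of queuing the kernel. For measuring kernel time itself, cudaEvent timers are usually a better fit than host clocks.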
