Are threads of a warp really synchronized?

buse385 · August 2, 2011, 10:09am

Hi all,
There are 2 kernels in the attachment. They’re from a presentation by an NVIDIA engineer. Both do the same work: they sum up the values in an array. The second kernel unrolls the loop for the last warp. Since the threads of a warp work synchronously, the NVIDIA guys suggested that we don’t need any __syncthreads() there. The second kernel, however, does not sum correctly. I fixed the code by adding __syncthreads(); after each statement in the unrolled part, and then it worked correctly. I also came up with another solution without __syncthreads();. Astonishingly, it works. Now I’m quite confused:

Do the threads of a warp really work synchronously? If so, how come the original code doesn’t work?
How can my solution work without __syncthreads();?

kernel2.cu (2.53 KB)

MarkusM · August 3, 2011, 8:21am

The original code was correct on the old G80 architecture, but it isn’t anymore on Fermi, because the compiler does more aggressive optimizations. If shared-memory stores and reads aren’t separated by __syncthreads(), the compiler might decide to hold intermediate values in registers instead of writing them to shared memory. The standard fix that avoids the synchronization is to mark the shared array as volatile. (For more details see the Fermi Compatibility Guide, chapter 1.3.3.)
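
A minimal sketch of the volatile fix, for illustration only. This is not the code from kernel2.cu: the names blockSum, warpReduce, sumOnDevice and BLOCK are made up here, and the block size is assumed to be a power of two of at least 64 threads. The __syncwarp() calls are an addition that goes beyond the fix described above. They need CUDA 9 or later and are there because GPUs from Volta onward no longer guarantee lockstep execution within a warp.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical names throughout; not the kernels from the attachment.
constexpr unsigned BLOCK = 128;  // assumed: power of two, at least 64

// Reduction of the last 64 values by one warp, without __syncthreads().
// volatile forces each access to really hit shared memory, so the compiler
// cannot keep intermediate sums in registers.
__device__ void warpReduce(volatile float *s, unsigned tid)
{
    #pragma unroll
    for (unsigned off = 32; off > 0; off >>= 1) {
        float v = s[tid] + s[tid + off];  // read phase
        __syncwarp();                     // assumption: CUDA 9+, needed on Volta and newer
        s[tid] = v;                       // write phase
        __syncwarp();
    }
}

// Each block writes the sum of its slice of in[] to out[blockIdx.x].
__global__ void blockSum(const float *in, float *out, unsigned n)
{
    __shared__ float s[BLOCK];
    unsigned tid = threadIdx.x;
    unsigned i = blockIdx.x * blockDim.x + tid;
    s[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction across warps still needs the block-wide barrier.
    for (unsigned stride = blockDim.x / 2; stride > 32; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }

    // Only the first warp is active from here on.
    if (tid < 32) warpReduce(s, tid);
    if (tid == 0) out[blockIdx.x] = s[0];
}

// Host helper: sums h on the GPU, adding the per-block partial sums on the CPU.
// Error checking is omitted to keep the sketch short.
float sumOnDevice(const std::vector<float> &h)
{
    unsigned n = static_cast<unsigned>(h.size());
    if (n == 0) return 0.0f;
    unsigned blocks = (n + BLOCK - 1) / BLOCK;

    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, blocks * sizeof(float));
    cudaMemcpy(dIn, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<blocks, BLOCK>>>(dIn, dOut, n);

    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dOut, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}

int main()
{
    std::vector<float> h(1000, 1.0f);
    printf("sum = %.1f\n", sumOnDevice(h));
    return 0;
}
```

With only the volatile qualifier and the usual `s[tid] += s[tid + off]` lines, this matches the fix described above for Fermi-era hardware. Splitting each step into a read and a write with __syncwarp() in between is the more conservative form for newer GPUs.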

buse385 · August 3, 2011, 8:24am

Thanks a lot for the help, MarkusM!
