Doesn't this write to the same thread?

Hello,

I saw this unrolling:

int tid = threadIdx.x;

sharedData[tid] += sharedData[tid + 32];
sharedData[tid] += sharedData[tid + 16];
sharedData[tid] += sharedData[tid + 8];
...

Doesn’t the right-hand side read locations that other threads are simultaneously writing as their own sharedData[tid]?

Thanks

Does this not cause a race condition / data contention?

Should you not use some synchronization (__syncthreads()), or at least read into local memory in a first step and write back to shared memory in a second step?
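For reference, the fully synchronized version of that tail would look something like this. This is only a minimal sketch, assuming sharedData is a __shared__ float array inside a kernel where every thread of the block reaches the barrier:

```
// Sketch: last steps of a block reduction with explicit barriers.
// The barrier sits outside the if so that ALL threads of the block
// reach it, which __syncthreads() requires.
for (unsigned int s = 32; s > 0; s >>= 1) {
    if (tid < s)
        sharedData[tid] += sharedData[tid + s];
    __syncthreads();  // make the writes of this step visible before the next
}
```

With the barrier after every step there is no race: no thread reads a location until the thread that last wrote it has passed the barrier.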

That’s what I am saying.

So, either you must add a __syncthreads() after every line, or you must do something like:

sharedData[tid + 32] += sharedData[tid + 32];
sharedData[tid + 16] += sharedData[tid + 16];
sharedData[tid + 8] += sharedData[tid + 8];

Right?

I found another similar example (I don’t remember where I saw it in the first place):
http://www.bu.edu/pasi/files/2011/07/Lecture5.pdf

On page 22, why does he have it like this?
On page 30 he has it right; he uses __syncthreads().

You don’t need a __syncthreads() at every line if you are in warp-synchronous mode and the sharedData pointer is declared volatile.

Please review the CUDA parallel reduction tutorial:

http://developer.download.nvidia.com/assets/cuda/files/reduction.pdf

e.g. slides 21-22

Your suggestion, e.g.:

sharedData[tid + 32] += sharedData[tid + 32];

doesn’t make any sense in the context of a parallel reduction: it just doubles each element in place instead of combining pairs of elements.

You don’t need a __syncthreads() at every line if you are in warp-synchronous mode and the sharedData pointer is declared volatile.
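For completeness, the warp-synchronous variant being described is essentially the one from NVIDIA’s reduction slides. A sketch, assuming the earlier (inter-warp) steps of the reduction already used __syncthreads() and only the last 32 threads remain active:

```
// Sketch: warp-synchronous tail of a reduction (the old-style idiom).
// volatile forces every read and write to actually go to shared memory
// instead of being cached in registers, which is what made this appear
// to work without __syncthreads() on older toolkits.
__device__ void warpReduce(volatile float *sharedData, int tid) {
    sharedData[tid] += sharedData[tid + 32];
    sharedData[tid] += sharedData[tid + 16];
    sharedData[tid] += sharedData[tid + 8];
    sharedData[tid] += sharedData[tid + 4];
    sharedData[tid] += sharedData[tid + 2];
    sharedData[tid] += sharedData[tid + 1];
}

// Called from the kernel as:
//   if (tid < 32) warpReduce(sharedData, tid);
```

Note that this relies on all 32 threads of a warp executing in lockstep, which is an assumption the hardware no longer guarantees on newer architectures.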

So, if I don’t declare it as volatile, I need __syncthreads() on each line.

Can you tell me why, on page 22 of the link I gave above, he doesn’t use __syncthreads() or volatile?

It happened to work by pure chance with older versions of the CUDA toolkit. Since it worked, you would find the code without volatile in quite a few places - IIRC even in Nvidia’s own documentation. And then it suddenly broke with newer compilers…
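For anyone finding this thread later: on newer toolkits the usual fix is to drop the implicit warp-synchronous assumption entirely and reduce within the warp using shuffle intrinsics. A sketch, assuming CUDA 9+ (which introduced the *_sync variants) and a full warp of 32 participating lanes:

```
// Sketch: warp reduction with shuffles, no shared memory or volatile needed.
// 0xffffffff is the mask of participating lanes (full warp assumed here).
__device__ float warpReduceSum(float val) {
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xffffffff, val, offset);
    return val;  // lane 0 ends up holding the warp's sum
}
```

Unlike the volatile idiom, the _sync intrinsics synchronize the named lanes explicitly, so this stays correct under the independent thread scheduling of Volta and later.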

Ok, thanks for the info!