ggeo
September 30, 2014, 7:35am
#1
Hello,
I saw this unrolled reduction:
int tid = threadIdx.x;
sharedData[ tid ] += sharedData[ tid + 32 ];
sharedData[ tid ] += sharedData[ tid + 16 ];
sharedData[ tid ] += sharedData[ tid + 8 ];
....
Doesn't each line write to sharedData[ tid ] while other threads are reading that same element on their right-hand side?
Thanks
Does this not cause a race condition / data contention?
Should you not use some synchronization (__syncthreads()), or at least write to local memory in the first step and to shared memory in the second?
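For reference, here is a minimal sketch of the fully synchronized variant, assuming a single block of 128 threads summing 128 ints. The names `kThreads`, `check`, `blockSumSynced`, and `sumSynced` are illustrative only. The __syncthreads() sits outside the `if (tid < s)` test on purpose: calling it inside a branch that only some threads of the block take (such as an `if (tid < 32)` tail) is undefined behavior, because every thread of the block must reach it.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Assumed block size for this sketch; must be a power of two.
constexpr int kThreads = 128;

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// One block sums kThreads ints. Each halving step is followed by
// __syncthreads(), so no thread reads an element that another thread
// is still updating.
__global__ void blockSumSynced(const int* in, int* out) {
    __shared__ int sharedData[kThreads];
    int tid = threadIdx.x;
    sharedData[tid] = in[tid];
    __syncthreads();
    for (int s = kThreads / 2; s > 0; s >>= 1) {
        if (tid < s) sharedData[tid] += sharedData[tid + s];
        __syncthreads();  // outside the if: every thread in the block reaches it
    }
    if (tid == 0) *out = sharedData[0];
}

// Hypothetical host helper: copies exactly kThreads values to the GPU
// and returns their sum.
int sumSynced(const std::vector<int>& host) {
    assert(host.size() == static_cast<size_t>(kThreads));
    int* dIn = nullptr;
    int* dOut = nullptr;
    int result = 0;
    check(cudaMalloc(&dIn, kThreads * sizeof(int)));
    check(cudaMalloc(&dOut, sizeof(int)));
    check(cudaMemcpy(dIn, host.data(), kThreads * sizeof(int), cudaMemcpyHostToDevice));
    blockSumSynced<<<1, kThreads>>>(dIn, dOut);
    check(cudaGetLastError());
    check(cudaMemcpy(&result, dOut, sizeof(int), cudaMemcpyDeviceToHost));
    check(cudaFree(dIn));
    check(cudaFree(dOut));
    return result;
}
```

This is always correct, at the cost of a block-wide barrier on each of the log2(128) = 7 steps.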
ggeo
September 30, 2014, 9:58am
#3
That’s what I am saying.
So, either you must add a __syncthreads() after every line, or you must do something like:
sharedData[ tid + 32 ] += sharedData[ tid + 32 ];
sharedData[ tid + 16 ] += sharedData[ tid + 16 ];
sharedData[ tid + 8 ] += sharedData[ tid + 8 ];
Right?
ggeo
September 30, 2014, 10:04am
#4
I found another similar example (I don't remember where I saw it in the first place):
http://www.bu.edu/pasi/files/2011/07/Lecture5.pdf
On page 22, why does he write it like this?
On page 30 he has it right: he uses __syncthreads().
You don't need a __syncthreads() at every line if you are operating warp-synchronously and the sharedData pointer is declared volatile.
Please review the CUDA parallel reduction tutorial:
http://developer.download.nvidia.com/assets/cuda/files/reduction.pdf
for example, slides 21-22.
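A minimal sketch of the warp-synchronous tail that those slides describe, again assuming one block of 128 threads; `warpReduce`, `blockSumWarpSync`, `sumWarpSync`, and `check` are illustrative names. The slides rely on the 32 threads of a warp executing in lockstep, plus `volatile`. On Volta and newer GPUs (independent thread scheduling), that lockstep is no longer guaranteed, so this version keeps `volatile` but also reads into a register and calls `__syncwarp()` between each read and the following write.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 128;  // assumed; power of two, at least 64

static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Called only by warp 0 (tid < 32) once sdata[0..63] holds the partial sums.
// volatile forces every access to go to shared memory instead of a register.
__device__ void warpReduce(volatile int* sdata, int tid) {
    int v = sdata[tid];
#pragma unroll
    for (int s = 32; s > 0; s >>= 1) {  // unrolls into the six steps of the first post
        v += sdata[tid + s];
        __syncwarp();       // all 32 lanes finish reading before anyone writes
        sdata[tid] = v;
        __syncwarp();       // writes are visible before the next read
    }
}

__global__ void blockSumWarpSync(const int* in, int* out) {
    __shared__ int sdata[kThreads];
    int tid = threadIdx.x;
    sdata[tid] = in[tid];
    __syncthreads();
    // While more than one warp is involved, steps still need __syncthreads().
    for (int s = kThreads / 2; s > 32; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid < 32) warpReduce(sdata, tid);
    if (tid == 0) *out = sdata[0];
}

// Hypothetical host helper: sums exactly kThreads values on the GPU.
int sumWarpSync(const std::vector<int>& host) {
    assert(host.size() == static_cast<size_t>(kThreads));
    int* dIn = nullptr;
    int* dOut = nullptr;
    int result = 0;
    check(cudaMalloc(&dIn, kThreads * sizeof(int)));
    check(cudaMalloc(&dOut, sizeof(int)));
    check(cudaMemcpy(dIn, host.data(), kThreads * sizeof(int), cudaMemcpyHostToDevice));
    blockSumWarpSync<<<1, kThreads>>>(dIn, dOut);
    check(cudaGetLastError());
    check(cudaMemcpy(&result, dOut, sizeof(int), cudaMemcpyDeviceToHost));
    check(cudaFree(dIn));
    check(cudaFree(dOut));
    return result;
}
```

Lanes 16 to 31 (and so on at later steps) compute values nobody uses. That is harmless, because each lane only ever writes its own `sdata[tid]` and every read of a step completes before any write.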
Your suggestions, for example:
sharedData[ tid + 32 ] += sharedData[ tid + 32 ]
don’t make any sense in the context of a parallel reduction.
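To see why, write $x_i$ for sharedData[ i ] in the 64-element warp tail from the first post. Each step of a tree reduction must combine two different elements. The step with offset $s = 32, 16, \dots, 1$ performs

$$x_i \leftarrow x_i + x_{i+s}, \qquad 0 \le i < s,$$

so after each step $\sum_{i<s} x_i$ still equals the original total, and after the last step $x_0 = \sum_{i=0}^{63} x_i$. The proposed line instead computes

$$x_{i+32} \leftarrow x_{i+32} + x_{i+32} = 2\,x_{i+32},$$

which only doubles each element and never combines two different ones.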
ggeo
October 1, 2014, 9:29am
#6
You don't need a syncthreads at every line if you are in a warp-synchronous mode and the sharedData pointer is declared volatile.
So, if I don't declare it volatile, I need a __syncthreads() after each line.
Can you tell me why, in the example I linked above, he uses neither __syncthreads() nor volatile on page 22?
tera
October 1, 2014, 7:55pm
#7
It happened to work by pure chance with older versions of the CUDA toolkit. Since it worked, you would find code without volatile in quite a few places, IIRC even in Nvidia's own documentation. And then it suddenly broke with newer compilers…
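Part of why it broke: without `volatile`, the compiler may keep `sharedData[ tid ]` and its neighbours in registers instead of re-reading shared memory, so one thread need not see another thread's update. A warp-shuffle reduction avoids shared memory for the warp-level steps entirely. Below is a minimal sketch using `__shfl_down_sync` (available since CUDA 9). This is a different technique from the one in the slides, and the 128-thread block and the names `warpSum`, `blockSumShuffle`, `sumShuffle`, and `check` are assumptions.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 128;  // assumed; multiple of 32, at most 1024

static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Sums v across the 32 lanes of a warp using register shuffles;
// the full sum ends up in lane 0.
__device__ int warpSum(int v) {
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    return v;
}

__global__ void blockSumShuffle(const int* in, int* out) {
    __shared__ int warpSums[kThreads / 32];
    int tid = threadIdx.x;
    int lane = tid % 32;
    int warp = tid / 32;
    int v = warpSum(in[tid]);            // step 1: reduce within each warp
    if (lane == 0) warpSums[warp] = v;
    __syncthreads();                      // step 2: publish the per-warp sums
    if (warp == 0) {                      // step 3: the first warp reduces them
        v = (lane < kThreads / 32) ? warpSums[lane] : 0;
        v = warpSum(v);
        if (lane == 0) *out = v;
    }
}

// Hypothetical host helper: sums exactly kThreads values on the GPU.
int sumShuffle(const std::vector<int>& host) {
    assert(host.size() == static_cast<size_t>(kThreads));
    int* dIn = nullptr;
    int* dOut = nullptr;
    int result = 0;
    check(cudaMalloc(&dIn, kThreads * sizeof(int)));
    check(cudaMalloc(&dOut, sizeof(int)));
    check(cudaMemcpy(dIn, host.data(), kThreads * sizeof(int), cudaMemcpyHostToDevice));
    blockSumShuffle<<<1, kThreads>>>(dIn, dOut);
    check(cudaGetLastError());
    check(cudaMemcpy(&result, dOut, sizeof(int), cudaMemcpyDeviceToHost));
    check(cudaFree(dIn));
    check(cudaFree(dOut));
    return result;
}
```

Because the partial sums travel between registers rather than through shared memory, there is no `volatile` question for the warp-level steps; only the hand-off of the per-warp sums goes through shared memory, guarded by one `__syncthreads()`.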