CUDA Reduction

disabled · March 1, 2009, 11:06am

Hello everybody,

I've just started programming with CUDA and don't understand some things, so I'll ask you guys. There is a popular reduction algorithm that looks like this:

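// Tree reduction in shared memory: the number of active threads halves each step.
for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1)
{
    if (tid < s)
    {
        // Thread tid adds in the element s positions to its right.
        sdata[tid] += sdata[tid + s];
    }
    __syncthreads(); // wait until all partial sums of this step are written
}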

And I don't really understand why it has no bank conflicts. Let's say I have a 64-element int array copied into sdata. After dividing by 2, I have 32 threads working on it. That makes exactly one full warp, because there are 32 threads in a warp. But elements 0 and 16 of this array belong to the same bank (bank 0), so shouldn't there be a bank conflict when threads 0 and 16 both read from that bank? I guess there is something wrong in my way of thinking, so I'm asking for an explanation.

SPWorley · March 1, 2009, 12:53pm

Good question.

The answer is easy: memory accesses to both shared and global memory are handled per HALF-warp.
For shared memory, you get a bank conflict when two threads in the same half-warp access different words in the same bank.
In your example, thread 0 and thread 16 do indeed read from the same bank, but they're in different half-warps, so you're OK.
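
To make the bank arithmetic concrete, here is a rough host-side sketch (plain C++, no GPU needed) that counts how many distinct words land in the same bank for one read of sdata[tid + offset]. It assumes the layout discussed in this thread: 16 banks of 32-bit words, with requests issued per 16-thread half-warp. The names bank_of and max_conflict_degree are made up for illustration and are not part of CUDA.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>

// Assumption: 16 banks, each one 32-bit word wide (the hardware discussed here).
constexpr int kNumBanks = 16;

// Bank that holds the 32-bit word at index `word` of a shared int array.
int bank_of(int word) { return word % kNumBanks; }

// Hypothetical helper: worst-case number of distinct words that fall into one
// bank when threads [0, active_threads) each read sdata[tid + offset] and the
// hardware services `group_size` threads per request.
// A result of 1 means the read is conflict-free.
int max_conflict_degree(int active_threads, int offset, int group_size) {
    assert(group_size > 0);
    int worst = 1;
    for (int base = 0; base < active_threads; base += group_size) {
        // Collect, per bank, the distinct words requested by this group.
        std::map<int, std::set<int>> words_per_bank;
        int end = std::min(base + group_size, active_threads);
        for (int tid = base; tid < end; ++tid) {
            int word = tid + offset;
            words_per_bank[bank_of(word)].insert(word);
        }
        for (const auto& entry : words_per_bank) {
            worst = std::max(worst, static_cast<int>(entry.second.size()));
        }
    }
    return worst;
}
```

For the 64-int case, the first step has 32 active threads reading sdata[tid] and sdata[tid + 32]. Calling max_conflict_degree(32, 0, 16) and max_conflict_degree(32, 32, 16) both give 1, so each half-warp touches every bank once. Passing a group size of 32 instead (a made-up scenario where a whole warp is serviced at once against 16 banks) gives 2, which is the conflict the question was worried about.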

disabled · March 1, 2009, 12:57pm

OK, now I understand.
Thanks for the help.
