Divergent warps

Hello,
I'm reading "Optimizing Parallel Reduction in CUDA" by Mark Harris
http://developer.download.nvidia.com/compute/cuda/1_1/Website/projects/reduction/doc/reduction.pdf
and I'm trying to understand why "highly divergent warps are very inefficient". Can you please advise?

The quote refers to the following code:

int tid = threadIdx.x;
for (int s = 1; s < blockDim.x; s *= 2) {
    if (tid % (2 * s) == 0)
        sdata[tid] += sdata[tid + s];
    __syncthreads();
}

Thanks

It means that the operations are serialized, so the hardware cannot execute them in a SIMD fashion. With 16 FPUs operating in SIMD (executing one warp of 32 threads in 2 clock cycles), I'd guess complete serialization would give roughly 1/16 of the performance. CUDA gives the impression that each thread is completely scalar, but in fact you often want to avoid branching within a warp too much.

This is one of the downsides of SIMD, but it can often be alleviated, for example by using ternary operators. I also believe the compiler uses branch "predication" for short if/else statements, which supposedly helps.

Thanks!