I read that CUDA serializes the execution of divergent branches within a thread warp, and I am a little confused as to what this means. If threads in a warp contain FOR loops of different lengths, do they synchronize execution of the iterations they have in common, which results in all threads taki…

Loops in kernels


SPWorley, September 2, 2009, 11:54pm

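The warp keeps looping for as long as any of its threads are still in the loop. Threads that have already finished are masked off and sit idle while the rest continue, so the per-thread loops are not serialized one after another. In your example, the warp will take just 31 iterations (the trip count of its longest-running thread), assuming this warp's tid values range from 0 to 31. (Careful: threadIdx.x goes up to the number of threads in the block, not just the number of threads in the warp. In a one-dimensional block, the second warp has tid values 32 to 63.)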
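To make the "loop while any lane is active" behaviour concrete, here is a minimal Python model of a warp running a loop in lockstep. It assumes the loop in the question has the shape `for (i = 0; i < tid; ++i)`, a warp size of 32, and a one-dimensional block; `simulate_warp_loop` is a hypothetical helper, not a CUDA API. It is a simplified lockstep model and deliberately ignores hardware scheduling details such as independent thread scheduling on newer GPUs.

```python
WARP_SIZE = 32  # assumption: 32 threads per warp, 1D block


def simulate_warp_loop(first_tid, warp_size=WARP_SIZE):
    """Model a warp executing `for (i = 0; i < tid; ++i) body();` in lockstep.

    Hypothetical helper. Returns (warp_iterations, active_lane_slots):
    how many times the warp issues the loop body, and how many of those
    lane slots did useful work.
    """
    tids = [first_tid + lane for lane in range(warp_size)]
    counters = [0] * warp_size
    warp_iterations = 0
    active_lane_slots = 0
    while True:
        # Active mask: lanes whose own loop condition is still true.
        active = [counters[lane] < tids[lane] for lane in range(warp_size)]
        if not any(active):
            break  # every lane has exited, so the warp leaves the loop
        warp_iterations += 1
        for lane in range(warp_size):
            if active[lane]:
                counters[lane] += 1  # masked-off lanes sit idle this iteration
                active_lane_slots += 1
    return warp_iterations, active_lane_slots


if __name__ == "__main__":
    # A 64-thread block is two warps: threadIdx.x 0-31 and 32-63.
    for warp in range(2):
        iters, useful = simulate_warp_loop(warp * WARP_SIZE)
        print(f"warp {warp}: {iters} iterations, "
              f"{useful}/{iters * WARP_SIZE} lane slots active")
```

Under these assumptions the first warp issues 31 iterations and the second issues 63, which is why the threadIdx.x caveat matters. Running each thread's loop one after another would instead cost 0 + 1 + ... + 31 = 496 body executions for the first warp; lockstep execution needs only 31 warp-wide iterations, at the price of idle lanes in the later ones.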
