The warp keeps looping for as long as any of its threads is still in the loop. It won’t serialize them; threads that have already finished just sit idle until the last one exits. So in your example, the warp will take just 31 iterations, assuming its tid values range from 0 to 31. (Careful: threadIdx.x runs from 0 to blockDim.x - 1 across the whole block, not just across the 32 threads of one warp, so later warps in the block see larger values.)
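To make the arithmetic concrete, here is a rough host-side model in plain C++ (not device code). It assumes your loop has the form `for (int i = 0; i < tid; ++i)` with `tid = threadIdx.x` in a 1-D block, which is only my guess at the example; the function names and `kWarpSize` are made up for illustration.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical host-side model of one warp; nothing here runs on the GPU.
constexpr int kWarpSize = 32;

// Per-lane trip counts for warp number `warp_index` of a 1-D block,
// ASSUMING the loop is `for (int i = 0; i < tid; ++i)` with tid = threadIdx.x.
// Warp 0 gets tid 0..31, warp 1 gets tid 32..63, and so on.
std::vector<int> lane_trip_counts(int warp_index) {
    std::vector<int> counts(kWarpSize);
    std::iota(counts.begin(), counts.end(), warp_index * kWarpSize);
    return counts;
}

// Lockstep behavior: the warp keeps iterating until its last lane leaves
// the loop, so the cost is the maximum trip count over the lanes.
int warp_iterations(const std::vector<int>& counts) {
    return counts.empty() ? 0 : *std::max_element(counts.begin(), counts.end());
}

// What full serialization would cost instead: every lane's iterations
// executed one after another, i.e. the sum of the trip counts.
int serialized_iterations(const std::vector<int>& counts) {
    return std::accumulate(counts.begin(), counts.end(), 0);
}
```

Under that assumption, `warp_iterations(lane_trip_counts(0))` gives 31, while `serialized_iterations(lane_trip_counts(0))` would be 0 + 1 + ... + 31 = 496. For the second warp of the block, `warp_iterations(lane_trip_counts(1))` gives 63, which is the threadIdx.x caveat in numbers.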