Performance cost due to for-loop divergence



When the loop within a thread ends depends on the data that the thread accesses.

Will the divergence seriously affect performance? From my point of view it won't, because in this kind of divergence some threads stop while the others continue; these are not exactly "two" branches.

Am I right?
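A minimal sketch of the situation being described, where each thread's loop trip count comes from its own data (the kernel and array names here are hypothetical, just for illustration):

```cuda
// Each thread loops until its own data element says to stop, so threads in
// the same warp may exit the loop at different iterations (divergence).
__global__ void dataDependentLoop(const int *iters, float *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    float acc = 0.0f;
    // Trip count depends on per-thread data: a thread with a small
    // iters[tid] finishes early and then sits idle while the rest of
    // its warp keeps iterating.
    for (int i = 0; i < iters[tid]; ++i)
        acc += 1.0f;  // placeholder work

    out[tid] = acc;
}
```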

The stopped threads waste computational resources by leaving some GPU hardware (ALUs, etc.) unused. That is, unless all 32 threads of a warp have stopped: in that case there is no performance impact, because the entire warp is no longer being scheduled.
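One way to see the "entire warp stopped" case in code: the warp leaves the loop only when every lane's exit condition holds. A hedged sketch (kernel and array names are assumptions) that makes the loop exit warp-uniform with the `__all_sync` vote intrinsic, so all lanes leave together, might look like:

```cuda
// The loop condition is evaluated per lane, but the warp-wide vote
// __all_sync makes the break itself uniform: lanes that are already done
// simply skip the body until every active lane in the warp is done.
__global__ void warpUniformExit(const int *iters, float *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    float acc = 0.0f;
    int i = 0;
    while (true) {
        bool done = (i >= iters[tid]);
        // Exit only when all currently active lanes are finished.
        if (__all_sync(__activemask(), done)) break;
        if (!done) acc += 1.0f;  // placeholder work for unfinished lanes
        ++i;
    }
    out[tid] = acc;
}
```

Note that this does not by itself recover throughput: the finished lanes still occupy the warp and do nothing until the slowest lane finishes, which is exactly the waste described above.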

Even though it wastes computational resources on the stopped threads, this case will not be worse than having all threads running, right?
I am asking about the time it takes for the threads to finish, not the total throughput.

When only 50% of your threads are currently doing work, and those active threads are dispersed randomly across all warps, your instantaneous throughput cannot exceed 50% of peak.