About divergent warps

auhgnist · September 22, 2009, 1:18pm

I have a question about how branches are handled in CUDA. According to the CUDA Programming Guide, branches are either predicated or serialized, and divergence should only happen between warps, not within a warp; otherwise it hurts performance badly. My question: is this resolved at runtime, or only analyzed at compile time? If threads within a warp take divergent branches, what exactly happens? Are the paths totally serialized? If so, are the threads' memory accesses still coalesced? Thanks!

LSChien · September 22, 2009, 3:09pm

As far as I know, the hardware does not do branch predication.

If you have a two-way if-then-else, say

[codebox]if (predicate) {
    /* statement 1 */
} else {
    /* statement 2 */
}
/* statement 3 */
[/codebox]

then

Step 1: all threads in a warp (32 threads) evaluate the predicate, which determines which path each thread takes.

Suppose threads 0-15 have predicate 1 and threads 16-31 have predicate 0. Then:

Step 2: threads 0-15 execute statement 1.

Step 3: threads 16-31 execute statement 2.

Step 4: threads 0-31 execute statement 3.

Steps 2 and 3 are serialized, so memory coalescing can only happen among the threads active in statement 1 (threads 0-15) or among those active in statement 2 (threads 16-31), respectively.

avidday · September 22, 2009, 3:23pm

I don’t believe that is an accurate description of how branching works. All threads execute both statement 1 and statement 2, but the results are masked out according to each thread's predicate. Also, on current hardware statement 2 is executed before statement 1.

auhgnist · September 22, 2009, 3:48pm

The CUDA Programming Guide v2.2 states explicitly in Section 5.1.1.2 that predication is used only under certain conditions (i.e., when the branch body is small enough). My question was mainly about serialization, so I think what you described is probably right. I have another question: for loops such as for or while, which may have a dynamic termination condition, how does serialization happen? Thanks!

