Question about control flow divergence

Suppose that each thread in a block executes the following loop.

// tid is the thread ID
for(i=0; i < f(tid); i++) {

}

If 16 threads in a warp execute 8 iterations (f(tid) == 8) and the other 16 execute 10 iterations (f(tid) == 10), which of the following is true?

  1. For the first 8 iterations, all threads in the warp run in parallel; then half of the warp executes the remaining 2 iterations in parallel.
    or
  2. All 32 threads diverge: thread 0 executes its 8 iterations, then thread 1 executes its 8, …, and finally thread 31 executes its 10 iterations.

(Assume that the loop body is fairly large, so predication cannot be used.)

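To my knowledge, “1” is true: all threads go hand in hand for the first 8 iterations, and the warp diverges only after that. The 16 threads that have finished then sit idle while the other 16 run their last 2 iterations together.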

Thanks, but I’m wondering how this happens. It means that the runtime system checks for control divergence at every branch instruction, so the divergence check sits on the critical path of the GPU hardware. :angel:

Yep, it checks for divergence at every branch instruction. Wild, when you think about it! But remember that the hardware and scheduling software have been DESIGNED to do this, so it’s extremely efficient.

Nonetheless, you should still avoid divergence if you can, but the nice part is how it Just Works when you do use it.

Thanks a lot.