Can someone help me understand how the following example, which seems to be used everywhere, goes from divergent code to non-divergent code? If there are 16 threads in a block, then in the first if statement only threads 0 and 1 execute the body. In the second if statement, however, it looks like all the threads will execute the body. How can this work if I only want selected threads in a block to copy something from device memory to shared memory? Also, is there an implicit barrier at the end of the if statement?
if (threadIdx.x < 2) {
// do something
}
is the same as
if (threadIdx.x/WARP_SIZE < 2) {
// do something
}
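To make the comparison concrete, here is a rough host-side sketch in plain C++ (it should also compile under nvcc) that evaluates both conditions for every thread index of a 1-D block. The helper names `takesThreadBranch`, `takesWarpBranch`, `countActive` and `anyWarpDiverges` are made up for illustration, and `WARP_SIZE = 32` is an assumption that matches current NVIDIA hardware. It only emulates the predicates on the CPU; it does not run anything on the device.

```cpp
// Assumption: 32 threads per warp, as on current NVIDIA GPUs.
constexpr int WARP_SIZE = 32;

// Host-side stand-in for the condition `threadIdx.x < 2`.
bool takesThreadBranch(int tid) { return tid < 2; }

// Host-side stand-in for `threadIdx.x / WARP_SIZE < 2` (integer division).
bool takesWarpBranch(int tid) { return tid / WARP_SIZE < 2; }

// Counts how many threads of a 1-D block satisfy the predicate.
int countActive(int blockSize, bool (*pred)(int)) {
    int active = 0;
    for (int tid = 0; tid < blockSize; ++tid) {
        if (pred(tid)) ++active;
    }
    return active;
}

// Returns true if at least one warp contains both threads that take the
// branch and threads that skip it, i.e. the branch diverges in that warp.
bool anyWarpDiverges(int blockSize, bool (*pred)(int)) {
    for (int base = 0; base < blockSize; base += WARP_SIZE) {
        const bool first = pred(base);
        for (int tid = base; tid < base + WARP_SIZE && tid < blockSize; ++tid) {
            if (pred(tid) != first) return true;
        }
    }
    return false;
}
```

With a block of 16 threads, `countActive(16, takesThreadBranch)` gives 2 and `countActive(16, takesWarpBranch)` gives 16, which matches the observation above that every thread enters the second if. With a block of 128 threads, the second condition selects threads 0 to 63, that is, the first two whole warps, and `anyWarpDiverges` reports no divergence for it. So the two conditions select different sets of threads; the second one just keeps each warp on a single path.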