I’m a CUDA beginner and am currently reading the CUDA Programming Guide. In the section “Control Flow Instructions” (5.4.2), I found the following paragraph:
I don’t understand why the code does not diverge within a warp when the branch uses the sample condition. How are threads assigned to warps and scheduled? Does the scheduler put into the same warp only those threads for which (threadIdx / warpSize) is equal?
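
To make the question concrete, here is a small host-side sketch of how I currently read the claim. It rests on assumptions of mine that are not from the guide: a 1D block, a warp size of 32, and warps made up of consecutive thread indices (threads 0..31 in warp 0, 32..63 in warp 1, and so on). The names `kWarpSize`, `warpAlignedCondition`, `perThreadCondition`, and `countDivergentWarps` are made up for illustration. It only models thread indices on the CPU and does not launch a kernel.

```cpp
// Assumption: 1D block, warps formed from consecutive thread indices.
constexpr int kWarpSize = 32;

// Hypothetical branch condition that depends only on (threadIdx / warpSize):
// every thread of a given warp computes the same value.
bool warpAlignedCondition(int tid) { return (tid / kWarpSize) % 2 == 0; }

// Hypothetical branch condition that depends on the individual thread index:
// neighbouring threads in the same warp get different values.
bool perThreadCondition(int tid) { return tid % 2 == 0; }

// Counts the warps in a block whose threads disagree on the condition,
// i.e. the warps that would have to execute both sides of the branch.
template <typename Cond>
int countDivergentWarps(int blockSize, Cond cond) {
    int divergent = 0;
    for (int warpStart = 0; warpStart < blockSize; warpStart += kWarpSize) {
        const bool first = cond(warpStart);
        for (int lane = 1; lane < kWarpSize && warpStart + lane < blockSize; ++lane) {
            if (cond(warpStart + lane) != first) {
                ++divergent;
                break;  // one disagreement is enough to mark this warp
            }
        }
    }
    return divergent;
}
```

Under these assumptions, a block of 128 threads has no divergent warps with `warpAlignedCondition` and four divergent warps with `perThreadCondition`. Is this the right way to think about what the guide means?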