Loops in kernels

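The warp keeps executing the loop for as long as any of its threads are still in it. It does not serialize the threads: each pass through the loop runs for all still-active threads at once, and threads that have already exited are simply masked off. So the number of iterations the warp takes is the largest trip count among its threads, not the sum of them. In your example, it will take just 31 iterations, assuming this warp's tid values range from 0 to 31. (Careful: threadIdx.x goes up to blockDim.x - 1, i.e. the number of threads in the block, not just the number of threads in the warp. Later warps in the block therefore see larger tid values.)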