How does warp execution work with double precision? If only 1 of the 32 threads in a warp is doing double precision work, will only the next three clocks get burnt up before the next instruction comes down the line? Or will the processor spend the time it would take to execute the instruction for all 32 threads?
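To make the scenario concrete, here is a rough sketch of how I imagine it could be measured. Everything in it is my own assumption rather than something from the documentation: the kernel name `dpChain`, the `activeLanes` and `iters` parameters, and the use of `clock64()` (which needs a device that supports it). It runs one warp twice, once with a single lane doing double precision FMAs and once with all 32, and prints the cycle count seen by lane 0.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical microbenchmark: only lanes below `activeLanes` run a chain of
// double precision FMAs; the remaining lanes skip the branch entirely.
__global__ void dpChain(double *out, long long *cycles, int activeLanes, int iters)
{
    int lane = threadIdx.x % warpSize;
    double x = 1.0 + lane;

    long long start = clock64();
    if (lane < activeLanes) {
        for (int i = 0; i < iters; ++i)
            x = fma(x, 1.0000001, 0.5);  // each FMA depends on the previous one
    }
    long long stop = clock64();

    out[threadIdx.x] = x;                // store the result so the loop is not optimized away
    if (lane == 0)
        *cycles = stop - start;          // cycles as observed by lane 0
}

int main()
{
    const int iters = 1 << 16;
    double *dOut = nullptr;
    long long *dCycles = nullptr;
    cudaMalloc(&dOut, 32 * sizeof(double));
    cudaMalloc(&dCycles, sizeof(long long));

    const int cases[2] = {1, 32};        // one active lane vs. a full warp
    for (int c = 0; c < 2; ++c) {
        long long hCycles = 0;
        dpChain<<<1, 32>>>(dOut, dCycles, cases[c], iters);
        // cudaMemcpy on the default stream waits for the kernel to finish.
        cudaMemcpy(&hCycles, dCycles, sizeof(hCycles), cudaMemcpyDeviceToHost);
        printf("%2d active lane(s): %lld cycles\n", cases[c], hCycles);
    }

    cudaFree(dOut);
    cudaFree(dCycles);
    return 0;
}
```

If the two counts come out about the same, that would suggest the hardware pays for the whole warp regardless of how many lanes are active. Since the FMAs form a dependent chain, the numbers mix instruction latency with issue rate, so I would treat them as a rough indication only.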
Sorry if this is documented somewhere :). I haven’t noticed it.