I’m doing a project where I have to study quite a bit of CUDA, and my supervisor asked me why a warp is 32 threads and not, say, 16 or 64. So I have to find out. I’ve been trying to find the reason on and off for a few days, and so far I’ve come to the following conclusion:
As the clock frequency of the 32-bit FPUs is twice that of the instruction unit, the FPUs can perform two identical operations in series before the instruction unit issues a new, different operation. With 8 FPUs per multiprocessor, that gives 8 × 2 = 16, so a warp should be 16 threads… This is of course not the case.
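To make the arithmetic explicit, here is my reasoning as a tiny Python sketch. The constant and function names are my own (hypothetical); the only inputs are the 8 FPUs and the 2:1 clock ratio from above, and it simply reproduces my 16 next to the real warp size of 32.

```python
# Sketch of my reasoning above. Names are hypothetical; the values
# (8 FPUs per multiprocessor, FPU clock = 2x instruction-unit clock)
# are the ones stated in the question.
FPUS_PER_MULTIPROCESSOR = 8
FPU_TO_ISSUE_CLOCK_RATIO = 2  # FPU cycles per instruction-unit cycle

ACTUAL_WARP_SIZE = 32  # the warp size CUDA actually uses


def threads_per_instruction(fpus: int, clock_ratio: int) -> int:
    """Threads covered by one issued instruction if each FPU repeats it clock_ratio times."""
    return fpus * clock_ratio


predicted = threads_per_instruction(FPUS_PER_MULTIPROCESSOR, FPU_TO_ISSUE_CLOCK_RATIO)
missing_factor = ACTUAL_WARP_SIZE // predicted

print(f"predicted: {predicted}, actual: {ACTUAL_WARP_SIZE}, missing factor: {missing_factor}")
```

Running it prints `predicted: 16, actual: 32, missing factor: 2`, which is exactly the gap I can’t explain.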
Can anyone help me understand why a warp consists of 32 threads? Where is my missing factor of 2? Or am I completely off target?