Why bundle threads (into warps)?


Can anybody tell me the advantages of bundling threads into warps? And why is the warp size 32 anyway?

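It’s related to how SIMD (single instruction, multiple data) works: one instruction is issued for a whole group of threads at once. Look it up if you’re not familiar with it.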

The size of 32 is a consequence of a particular hardware implementation. To learn why it is 32 and not, e.g., 16, one would have to know the hardware very intimately.

Each MP (multiprocessor) executes a warp of 32 threads at a time, 8 threads per cycle, so it takes 4 cycles to execute one instruction for a warp of 32 threads.
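
To make that arithmetic concrete, here is a tiny Python sketch. The constant and function names are made up for illustration, and the figures (32 threads per warp, 8 threads per cycle) are taken from the post above, not queried from any device.

```python
import math

WARP_SIZE = 32        # threads per warp, as stated in the post above
LANES_PER_CYCLE = 8   # threads the MP processes per cycle, as stated above

def cycles_per_warp_instruction(warp_size=WARP_SIZE, lanes=LANES_PER_CYCLE):
    """Cycles needed to push one instruction through every thread of a warp."""
    return math.ceil(warp_size / lanes)

print(cycles_per_warp_instruction())  # 4
```

With a hypothetical MP that handled 16 threads per cycle, the same call with `lanes=16` would give 2 cycles.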

There’s no advantage as such; it follows from the hardware design: every thread in a warp executes the same instruction at the same time.

In other words, there is only one program counter per warp; that’s why 32 threads are grouped into one warp.
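
One practical consequence of a single program counter can be shown with a toy model in Python. This is only a sketch under my own assumptions, not how the hardware is actually implemented: `run_warp` is a hypothetical helper, and the even/odd branch is an arbitrary example. The point is that when threads of one warp take different sides of a branch, the warp has to step through both sides, with the threads that did not take a side sitting idle.

```python
WARP_SIZE = 32  # threads per warp, as stated in the thread

def run_warp(values):
    """Toy model of one warp: a shared instruction stream plus a per-thread mask.

    Models: if (v is even) v = v * 2; else v = v + 1;
    Returns the per-thread results and how many branch bodies the warp issued.
    """
    assert len(values) == WARP_SIZE
    out = list(values)
    issued = 0

    # Branch condition evaluated per thread (lane) of the warp.
    take_if = [v % 2 == 0 for v in values]

    # "if" side: issued once for the whole warp if at least one lane needs it.
    if any(take_if):
        issued += 1
        for lane in range(WARP_SIZE):
            if take_if[lane]:          # other lanes are masked off (idle)
                out[lane] = values[lane] * 2

    # "else" side: likewise issued once if at least one lane needs it.
    if not all(take_if):
        issued += 1
        for lane in range(WARP_SIZE):
            if not take_if[lane]:
                out[lane] = values[lane] + 1

    return out, issued

_, uniform = run_warp([2] * WARP_SIZE)             # every lane takes the "if" side
_, divergent = run_warp(list(range(WARP_SIZE)))    # lanes alternate sides
print(uniform, divergent)  # 1 2
```

In the uniform case the warp issues one branch body; in the divergent case it issues both, which is why it pays to keep the threads of a warp on the same path.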
