Why only half-warp?

seibert · April 15, 2010, 7:45pm

The stream processors are pipelined, so in fact many warps are in various stages of execution at any given time. The job of the scheduler on the multiprocessor is to grab warps that are not waiting on global memory reads and stuff them into the pipeline to begin executing their next instruction. Although a multiprocessor can complete an entire warp instruction (with some exceptions) every 4 clock cycles, a given warp instruction in fact takes many more than 4 clock cycles to get through the pipeline from beginning to end.
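To put rough numbers on this, here is a back-of-the-envelope sketch in Python. The 4-cycle issue interval is the figure from the paragraph above; the 24-cycle pipeline latency is an assumed value picked only for illustration, and the function name is made up for this example.

```python
import math

ISSUE_INTERVAL = 4  # cycles per completed warp instruction (figure from the post)

def warps_to_fill_pipeline(latency_cycles, issue_interval=ISSUE_INTERVAL):
    """Warps needed so that one instruction can issue every interval,
    assuming each warp waits for its previous instruction to finish."""
    return math.ceil(latency_cycles / issue_interval)

# 24 cycles is an assumed latency, not a measured one.
print(warps_to_fill_pipeline(24))  # 6
```

Under those assumptions, the scheduler needs about six warps with an instruction ready to go before the pipeline stays full. The real number depends on the hardware generation and the instruction mix.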

Every modern CPU works this way, except that single-threaded code is much more likely to have “pipeline hazards”, where the next instruction in the thread depends on the one before it in such a way that you can’t stuff it into the pipeline next. By encouraging large numbers of independent instructions (i.e., threads that don’t usually talk to each other), a CUDA device can keep its pipelines full without all the instruction reordering fanciness (and therefore transistor cost) of a CPU.
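The effect of hazards versus independent warps can be seen in a toy model. This is only a sketch: it assumes every instruction in a warp depends on the one before it, uses an illustrative 24-cycle latency, and ignores memory stalls, divergence, and whatever policy the real scheduler follows. The `simulate` function is hypothetical.

```python
def simulate(num_warps, instrs_per_warp, latency=24, issue_interval=4):
    """Toy pipeline: a warp may issue its next instruction only after its
    previous one has finished (`latency` cycles later). At most one warp
    instruction issues every `issue_interval` cycles.
    Returns the cycle at which the last instruction completes."""
    ready_at = [0] * num_warps                  # when each warp may issue again
    remaining = [instrs_per_warp] * num_warps   # instructions left per warp
    cycle = 0
    finish = 0
    while any(remaining):
        # Issue from the first warp that has work left and is not stalled.
        for w in range(num_warps):
            if remaining[w] and ready_at[w] <= cycle:
                remaining[w] -= 1
                ready_at[w] = cycle + latency
                finish = max(finish, ready_at[w])
                break
        cycle += issue_interval
    return finish

# Six instructions in both cases, arranged differently.
print(simulate(num_warps=1, instrs_per_warp=6))  # 144: one dependent chain
print(simulate(num_warps=6, instrs_per_warp=1))  # 44: six independent warps
```

With a single dependent chain the pipeline sits mostly empty between issues. With six independent warps the same amount of work finishes in under a third of the time, and no reordering logic is needed.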
