"Half-warps", scheduling, and branch divergence

seibert · February 24, 2013, 6:06pm

No, divergence is a problem if you branch at the half-warp level. When a warp instruction is executed on compute capability 2.x, the entire warp is sent to 16 of the CUDA cores. CUDA cores are pipelined (as are all modern CPUs), so what happens is that the 32 threads are queued up in two consecutive pipeline stages in those 16 CUDA cores. The instruction takes something like 16 to 24 clock ticks to propagate through the pipeline, but because many instructions moving through the pipeline at once, that group of 16 CUDA cores will complete one warp every two clock ticks.

If you aren’t familiar with pipelining in computer architecture, this is a reasonable summary:

[url]http://en.wikipedia.org/wiki/Pipeline_(computing)[/url]

Topic		Replies	Views
warp scheduler of Fermi architecture CUDA Programming and Performance	2	3283	February 5, 2012
How many divergent branches can actually be discussed in parallel? CUDA Programming and Performance	5	3120	October 1, 2009
Basic question about warps CUDA Programming and Performance	14	6781	June 9, 2009
Branch divergence and executing serial could be misinterpretted. CUDA Programming and Performance	8	4083	December 21, 2016
Can CUDA be useful for me? CUDA Programming and Performance	4	2617	June 12, 2008
Avoid branching ... CUDA Programming and Performance	3	3682	May 19, 2010
Any need to revise the principle "Threads in a half-warp are SIMT synchronous" ? CUDA Programming and Performance	1	724	July 30, 2013
Threads Dispatching : 2 different instructions per cycles? CUDA Programming and Performance	2	118	January 31, 2025
execution within one diverged warp CUDA Programming and Performance	2	534	February 21, 2020
Why only half-warp? CUDA Programming and Performance	6	12880	April 15, 2010

"Half-warps", scheduling, and branch divergence

Related topics