Blocks and Warps

krinitsa · July 28, 2011, 12:13pm

Hey guys

My question is about blocks and warps. I understand that each SM (on G8, for example) has room for 8 blocks.
When a block is executed, it is split into warps, each containing at most 32 threads.

According to all the CUDA documentation, all threads within a given warp perform the same instruction.
My question is: how exactly is a warp built, and how can we be sure that every thread in it will execute the same instruction before execution actually begins?
For example:

if ( C )
    A
else
    B

Edit: my question is also related to branch divergence.

Thanks, Igal.

seibert · July 28, 2011, 8:42pm

Warps are created from threads in “thread ID” order, which is explained in Section 2.2 of the CUDA Programming Guide. You do not need to worry about correct handling of branch instructions because the compiler takes care of that for you.

Ken_Domino · July 29, 2011, 12:14am

The first thing you need to understand is how warps and branch divergence work at the level of the GPU. You should read the following:

Coon, B. W. and J. E. Lindholm (2008). United States Patent #7,353,369: System and Method for Managing Divergent Threads in a SIMD Architecture. Assignee: NVIDIA Corp. U.S.P.T.O., April 2008.

Fung, W. W. L., I. Sham, et al. (2007). Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow. IEEE Computer Society.

In particular, the explanation of the example associated with Figures 6A, 6B, and 6C in the patent is very helpful in understanding the basic automaton in an SM.

Once you understand that, you can start working out how things operate at a higher level, such as the CUDA APIs. None of the CUDA documents from NVIDIA (i.e., CUDA Programming Guide, NVIDIA Compute, PTX: Parallel Thread Execution, Programming Massively Parallel Processors, CUDA by Example) seems to describe how things work at this level; they only left me asking more questions.

Ken
