Relationship between Warp, MP, Block, Shared Memory

Hi, Friend!

I am a beginner with CUDA.
As I understand it, a warp is the group of threads that a multiprocessor executes concurrently.

  1. If that is right, does foo<<<100, 32>>> mean that foo will be executed 100 times by a single warp?
    If so, it would mean that foo is executed by only one multiprocessor, even if my graphics card has 4 multiprocessors.

  2. If my card has 4 multiprocessors, does foo<<<100, 32>>> mean that foo will be executed 25 times by each multiprocessor?

  3. If my card has 4 multiprocessors, does foo<<<100, 16>>> also mean that foo will be executed 25 times by each multiprocessor?

  4. If my card has 4 multiprocessors, does foo<<<1, 90>>> mean that foo will be executed once by 3 multiprocessors?

  5. In the 4th case, can all 90 threads use the same shared memory? I know that only the threads on the same multiprocessor can access the same shared memory.

Please forgive my poor English.

A block can only be processed on a single MP; you cannot split a block over multiple MPs. (So if you run fewer blocks than you have MPs, you automatically underutilize the hardware.)

A single MP can process many blocks, if resources permit. An MP's scheduling granularity is the warp, so an MP's queue can look like:
warp 0 from block 1
warp 10 from block 4
warp 2 from block 0
warp 1 from block 1
(assume no particular ordering or scheduling)

It’s a one-to-many relationship.

Threads from different blocks cannot share memory even if they happen to be processed on the same MP. Shared memory is restricted to threads in the same block, not merely threads running on the same MP. Note that the threads of a single block are guaranteed to end up on the same MP.
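So for your 4th and 5th questions: yes, all 90 threads of a <<<1, 90>>> launch can use the same shared memory, because they are in one block. A minimal sketch (hypothetical kernel, just to show the per-block scope of `__shared__`):

```cuda
// Each block gets its OWN copy of buf; threads in the block can read
// what their block-mates wrote, but never another block's copy.
__global__ void blockSum(const float *in, float *out)
{
    __shared__ float buf[90];              // per-block shared array
    int t = threadIdx.x;

    buf[t] = in[blockIdx.x * blockDim.x + t];
    __syncthreads();                       // wait for all 90 writes

    if (t == 0) {                          // thread 0 sums its block's copy
        float s = 0.0f;
        for (int i = 0; i < blockDim.x; ++i)
            s += buf[i];
        out[blockIdx.x] = s;
    }
}

// launched e.g. as: blockSum<<<1, 90>>>(d_in, d_out);
```

The `__syncthreads()` barrier only synchronizes threads within one block, which is another way of seeing why shared memory cannot span blocks.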