Threads per block / multiprocessor: contradiction?

rocksportrocker · January 22, 2009, 3:45pm

Hi,

The docs say:

1. each block can consist of at most 512 threads,
2. each block is executed on one multiprocessor (MP),
3. one MP can manage 768 threads.

Is this a contradiction, or did I miss something?

Greetings, Uwe

jack · January 22, 2009, 3:59pm

That sounds right to me. Remember that the threads of a block are not all executed at the same time: the block is divided into warps of 32 threads, which are then executed on the MP. Each MP has 8 processors and so processes 8 threads per clock cycle, but through instruction pipelining it is effectively working on all 32 threads of a warp at once (each group of 8 is in a different stage of the pipeline at any given time). So once the blocks are divided into warps, the MP issues instructions for one warp at a time, switching between the resident warps, which accounts for the apparent contradiction.
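A minimal C++ sketch of the warp arithmetic described above, assuming a G80/GT200-class MP with a warp size of 32 and 8 scalar processors. The function and constant names are hypothetical, chosen only for illustration:

```cpp
#include <cassert>

// Assumed G80/GT200-class values (not queried from a real device).
constexpr int kWarpSize = 32;        // threads per warp
constexpr int kProcessorsPerMP = 8;  // scalar processors per MP

// Number of warps a block is split into; a partially filled last warp
// still occupies a whole warp.
int warpsPerBlock(int threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize;
}

// Clock cycles to issue one instruction for a full warp when only
// kProcessorsPerMP threads start per cycle (32 / 8 = 4).
int cyclesPerWarpInstruction() {
    return kWarpSize / kProcessorsPerMP;
}
```

Under these assumptions a 512-thread block becomes 16 warps, and the MP's 768-thread limit corresponds to 24 resident warps.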

rocksportrocker · January 22, 2009, 4:21pm

But wouldn't it be enough if each MP could handle 512 threads, since no more threads than that are ever scheduled to an MP?

Tigga · January 22, 2009, 4:51pm

You can have more than one block per MP.
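To see why more than one block per MP matters, here is a small C++ sketch that counts how many blocks of a given size can be resident on one MP at the same time. It assumes compute capability 1.0/1.1 limits (768 threads and at most 8 blocks per MP, 512 threads per block) and ignores registers and shared memory, which can lower the result; the names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>

// Assumed compute capability 1.0/1.1 limits; registers and shared memory
// are ignored here, although they can reduce the result further.
constexpr int kWarpSize = 32;
constexpr int kMaxThreadsPerMP = 768;
constexpr int kMaxBlocksPerMP = 8;
constexpr int kMaxThreadsPerBlock = 512;

// Assumed: threads are allocated in whole warps, so round the block size
// up to a multiple of the warp size.
int threadSlotsPerBlock(int threadsPerBlock) {
    assert(threadsPerBlock > 0 && threadsPerBlock <= kMaxThreadsPerBlock);
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize * kWarpSize;
}

// Blocks of this size that can be resident on one MP at once, limited by
// both the thread budget and the per-MP block-count cap.
int residentBlocksPerMP(int threadsPerBlock) {
    return std::min(kMaxThreadsPerMP / threadSlotsPerBlock(threadsPerBlock),
                    kMaxBlocksPerMP);
}

// Thread slots on the MP that those resident blocks leave unused.
int idleThreadSlots(int threadsPerBlock) {
    return kMaxThreadsPerMP -
           residentBlocksPerMP(threadsPerBlock) * threadSlotsPerBlock(threadsPerBlock);
}
```

With these assumed limits, `residentBlocksPerMP(512)` is 1 with 256 thread slots idle, while `residentBlocksPerMP(256)` is 3 with none idle.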

jack · January 22, 2009, 5:06pm

Plus, remember that your kernel is launched as a grid of blocks, so (in general) you’ll have way more blocks than you have MPs. The CUDA scheduler keeps the blocks in a “queue” of sorts and keeps the MPs busy with them until all the blocks have completed.
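A rough C++ model of that queue, under the simplifying (hypothetical) assumption that every block takes the same time to run, so the grid is processed in "waves" that each fill every resident-block slot on the GPU. Real hardware refills a slot as soon as any block finishes, so this only estimates the count; the function names are made up for illustration:

```cpp
#include <cassert>

// Simplified model (an assumption, not the documented scheduler behavior):
// all blocks take equal time, so the grid runs in waves that each fill
// every resident-block slot on the GPU.
int waves(int numBlocks, int numMPs, int blocksPerMP) {
    assert(numBlocks >= 0 && numMPs > 0 && blocksPerMP > 0);
    const int slots = numMPs * blocksPerMP;  // blocks resident GPU-wide at once
    return (numBlocks + slots - 1) / slots;  // ceiling division
}

// Blocks running in the final, possibly partial, wave.
int blocksInLastWave(int numBlocks, int numMPs, int blocksPerMP) {
    const int slots = numMPs * blocksPerMP;
    const int rest = numBlocks % slots;
    return (numBlocks > 0 && rest == 0) ? slots : rest;
}
```

For example, assuming 16 MPs with 3 resident 256-thread blocks each (48 slots), a grid of 1024 blocks takes 22 waves, and the last wave holds only 16 blocks.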

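EDIT: Also, this is why most people typically limit their blocks to a maximum of 256 threads: since 768 / 256 = 3, three such blocks fit on each MP, whereas a single 512-thread block leaves 256 of the MP’s 768 thread slots unused (register and shared memory usage can lower the block count further). There is also the convenience factor of 256 threads being equivalent to a 16x16 block of threads, which is nice for 2D work (e.g. blocked matrix multiplication).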
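A host-side C++ sketch of the 16x16 index arithmetic, assuming a matrix of width x height elements. In an actual CUDA kernel the block and thread coordinates come from the built-in `blockIdx` and `threadIdx` variables; the `Dim2` type and helper names here are hypothetical:

```cpp
#include <cassert>

// 16 x 16 = 256 threads per block.
constexpr int kTile = 16;

// Hypothetical stand-in for CUDA's 2D block/thread coordinates.
struct Dim2 {
    int x;
    int y;
};

// Grid size needed to cover a width x height matrix with 16x16 blocks.
Dim2 gridFor(int width, int height) {
    assert(width > 0 && height > 0);
    return {(width + kTile - 1) / kTile, (height + kTile - 1) / kTile};
}

// Global (column, row) handled by thread `thread` of block `block`;
// a kernel would compute blockIdx.x * blockDim.x + threadIdx.x, etc.
Dim2 globalIndex(Dim2 block, Dim2 thread) {
    return {block.x * kTile + thread.x, block.y * kTile + thread.y};
}
```

For a 1000 x 500 matrix this gives a 63 x 32 grid; since 63 x 16 = 1008 exceeds 1000, the kernel must skip threads whose index falls outside the matrix.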

rocksportrocker · January 23, 2009, 10:27am

Thanks, that is what I misunderstood.

Greetings, Uwe
