Mapping of Blocks to MPs / Threads to MPs

an_schall · November 19, 2013, 10:01am

Hi everybody,

I am running my CUDA code on a GeForce 210 and I was wondering about the mapping of blocks to multiprocessors (MPs), or rather of threads to multiprocessors:

1.) Running deviceQuery revealed that the maximum number of blocks per grid (for a one-dimensional grid) is 65535. So if I launch a kernel with the maximum number of blocks ( kernel <<<65535, X>>> (param1, param2, …, paramN); ), how are they mapped to the MPs? I read that at most 8 blocks can be processed concurrently by one MP.

2.) If I launch a kernel with the maximum number of blocks and a block size of 1 ( kernel <<<65535, 1>>> (param1, param2, …, paramN); ), does that mean that internally the hardware ignores the block size of 1, runs 32 threads anyway, and just discards the calculations of the other 31 threads?

I am pretty new to CUDA, so sorry if the answers to my questions seem obvious.

pasoleatis · November 19, 2013, 1:12pm

Each MP can run more than one block at a time, but there is a maximum total number of threads that can be active on it. For cc 2.0 this is 1536, while for cc 3.x it is 2048. So on cc 2.0 one would get higher occupancy in some cases by using a block size of 512 (3 blocks active per MP) as opposed to a block size of 1024 (only 1 block active per MP).

For my codes I just varied the block size and chose the one that was fastest.
2) A block with 1 thread will still behave the same as a block with 32. The thread object is only a representation of what happens at the hardware level: threads are executed in warps of 32, so a one-thread block still takes up a whole warp. It is like an assembly line which has only 1 object on it.
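As a rough illustration of the 512 vs. 1024 comparison on cc 2.0, here is a minimal host-side sketch of the arithmetic. It assumes a warp size of 32 and a limit of 8 resident blocks per MP on cc 2.0, and it ignores register and shared-memory limits, which can lower the real numbers. The function names are made up for this example and are not part of the CUDA API.

```cpp
// Hypothetical helpers (not CUDA API): estimate how many blocks fit on one MP
// when only the thread limit and the block limit are considered.
constexpr int kWarpSize = 32;  // assumption: 32 threads per warp

constexpr int minInt(int a, int b) { return a < b ? a : b; }

// Threads are allocated in whole warps, so round the block size up.
constexpr int warpsPerBlock(int blockSize) {
    return (blockSize + kWarpSize - 1) / kWarpSize;
}

// Resident blocks per MP, limited by warps per MP and by blocks per MP.
constexpr int residentBlocksPerMP(int blockSize, int maxThreadsPerMP,
                                  int maxBlocksPerMP) {
    return minInt(maxBlocksPerMP,
                  (maxThreadsPerMP / kWarpSize) / warpsPerBlock(blockSize));
}

// Threads that are actually resident with that many blocks.
constexpr int residentThreadsPerMP(int blockSize, int maxThreadsPerMP,
                                   int maxBlocksPerMP) {
    return residentBlocksPerMP(blockSize, maxThreadsPerMP, maxBlocksPerMP) *
           blockSize;
}

// cc 2.0 figures: 1536 threads per MP (from the post), 8 blocks per MP (assumed).
static_assert(residentBlocksPerMP(512, 1536, 8) == 3, "3 blocks of 512");
static_assert(residentThreadsPerMP(512, 1536, 8) == 1536, "MP is full");
static_assert(residentBlocksPerMP(1024, 1536, 8) == 1, "1 block of 1024");
static_assert(residentThreadsPerMP(1024, 1536, 8) == 1024, "2/3 of the MP");
```

The static_assert lines are checked at compile time, so the file can be compiled as-is. With a block size of 512 the MP holds 1536 of 1536 possible threads; with 1024 it holds only 1024, which is the occupancy difference described above. Trying a few block sizes and timing them is still the practical test.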
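To put rough numbers on the two questions, here is a small host-side sketch. It assumes a warp size of 32, at most 8 resident blocks per MP, and 2 MPs on the GeForce 210; the MP count is an assumption, so check the multiprocessor count that deviceQuery reports for your card. The "rounds" figure is only a lower bound for intuition: the hardware hands a new block to an MP whenever a resident block finishes, not in lockstep rounds. All names are invented for this example.

```cpp
// Hypothetical helpers (not CUDA API) for back-of-the-envelope estimates.
constexpr int kWarpSize = 32;  // assumption: 32 threads per warp

// Warp lanes a block takes up: the block size rounded up to whole warps.
constexpr int lanesOccupied(int blockSize) {
    return ((blockSize + kWarpSize - 1) / kWarpSize) * kWarpSize;
}

// Lanes that do no useful work because the block does not fill its last warp.
constexpr int idleLanes(int blockSize) {
    return lanesOccupied(blockSize) - blockSize;
}

// Minimum number of "rounds" needed to process a grid if every MP always
// holds blocksPerMP blocks (ceiling division).
constexpr long roundsNeeded(long gridBlocks, int numMPs, int blocksPerMP) {
    return (gridBlocks + static_cast<long>(numMPs) * blocksPerMP - 1) /
           (static_cast<long>(numMPs) * blocksPerMP);
}

// Question 2: a block of 1 thread leaves 31 of 32 lanes idle.
static_assert(idleLanes(1) == 31, "one-thread block wastes 31 lanes");
static_assert(idleLanes(32) == 0, "a full warp wastes nothing");

// Question 1: 65535 blocks, 2 MPs (assumed), 8 blocks per MP (assumed).
static_assert(roundsNeeded(65535, 2, 8) == 4096, "at least 4096 rounds");
```

So a launch like kernel <<<65535, 1>>> keeps only one lane in 32 busy, and the 65535 blocks are not all resident at once: they wait in a queue and are distributed to the MPs as slots become free.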
