Threads vs Blocks: How does one achieve maximum parallelism?

Justin_Smith · April 2, 2010, 2:02pm

Is it true that threads in a block execute on a single processor — so that one should use a maximum number of blocks (and minimize the number of threads per block) to achieve maximum parallelism?

Or are these pure abstractions that bear no relation to what goes on at the hardware level?

avidday · April 2, 2010, 2:09pm

Yes, if by "processor" you mean a multiprocessor, which is a SIMT unit of 8 scalar cores.

No. Compute capability 1.0/1.1/1.2/1.3 hardware has fixed overheads and latencies that take a minimum of 192 active threads per multiprocessor to amortize. The warp size on all hardware is 32 threads, so you should have at least 32 threads per block and at least 6 active warps per multiprocessor (6 × 32 = 192 threads) to get anything like peak performance. The number of blocks launched should also be a whole multiple of the number of multiprocessors (MPs) on a given device.
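The sizing rules above can be sketched as plain host-side arithmetic. This is only an illustration, not anything from the CUDA toolkit: the helper names (`roundUpToWarp`, `minBlocksPerMP`, `roundUpToMultiple`) are hypothetical, and the constants 32 and 192 are simply the figures quoted in the post for 1.x hardware.

```cpp
// Hypothetical helpers; the constants are the figures quoted in the post.
constexpr int kWarpSize = 32;               // threads per warp
constexpr int kMinActiveThreadsPerMP = 192; // 6 warps * 32 threads

// Round a requested block size up to a whole number of warps.
constexpr int roundUpToWarp(int threads) {
    return ((threads + kWarpSize - 1) / kWarpSize) * kWarpSize;
}

// Smallest number of resident blocks per multiprocessor that reaches
// the 192-active-thread minimum for a given block size.
constexpr int minBlocksPerMP(int threadsPerBlock) {
    return (kMinActiveThreadsPerMP + threadsPerBlock - 1) / threadsPerBlock;
}

// Round a block count up to a whole multiple of the multiprocessor count.
constexpr int roundUpToMultiple(int blocks, int mpCount) {
    return ((blocks + mpCount - 1) / mpCount) * mpCount;
}
```

For example, with 64 threads per block, `minBlocksPerMP(64)` gives 3 resident blocks per multiprocessor, and on an assumed 30-MP device `roundUpToMultiple(100, 30)` pads a 100-block grid to 120. The sketch ignores register and shared-memory limits, which also determine how many blocks can actually be resident at once.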
