2 blocks versus 3 blocks

Kiran_CUDA · August 2, 2009, 12:47pm

Hi,

Suppose I have to run 768 threads on a multiprocessor. I have the following choices:

1- Assign two blocks: one with 512 threads and another with 256 threads

2- Assign three blocks, each with 256 threads

Can you tell me which one will give better performance?

Thanks

avidday · August 2, 2009, 12:56pm

Option 2, because option 1 is physically impossible. When you launch a grid, every block must have the same execution parameters. You cannot have different block configurations within the same kernel run.
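Because every block in a launch shares one configuration, the usual pattern is to pick a single block size and derive the block count by ceiling division, letting the kernel skip surplus threads with a bounds check such as `if (idx < n)`. A minimal host-side sketch of that arithmetic, written in Python for brevity; `grid_size` and `launch_config` are hypothetical helpers, and in CUDA C the resulting numbers would be passed as `kernel<<<blocks, threads_per_block>>>(...)`.

```python
# Hypothetical helpers for illustration; not part of any CUDA API.

def grid_size(n_threads: int, threads_per_block: int) -> int:
    """Number of equally sized blocks needed to cover n_threads.

    All blocks in one launch have the same size, so this is a ceiling
    division; the last block may hold threads the kernel must skip.
    """
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")
    return (n_threads + threads_per_block - 1) // threads_per_block


def launch_config(n_threads: int, threads_per_block: int) -> tuple:
    """Return (blocks, threads_per_block, idle_threads) for one uniform launch."""
    blocks = grid_size(n_threads, threads_per_block)
    idle = blocks * threads_per_block - n_threads
    return blocks, threads_per_block, idle


if __name__ == "__main__":
    # 768 threads: 3 blocks of 256 cover the work exactly.
    print(launch_config(768, 256))  # (3, 256, 0)
    # With 512-thread blocks you need 2 blocks (1024 threads, 256 idle);
    # a mixed 512 + 256 split cannot be expressed in a single launch.
    print(launch_config(768, 512))  # (2, 512, 256)
```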

Kiran_CUDA · August 3, 2009, 10:00am

Thanks, avidday.

What if I have the following options:

1- 3 blocks with 50 threads per block

2- 2 blocks with 75 threads per block

Which option is better as far as the speed is concerned?

MisterAnderson42 · August 3, 2009, 11:20am

Neither is really better. Whether you run 2 or 3 blocks, you are still only using a few percent of the hardware’s capabilities, and the launch overhead will likely dominate your kernel’s execution time.

Edit: To make it a little clearer, you can probably run 30 or 60 blocks in about the same time as it would take to run 1 block of the same size, because the blocks execute in parallel across the hardware.
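One rough way to picture this: blocks are spread across the multiprocessors and run concurrently, so for tiny, latency-bound blocks like these the run time tracks the number of "waves" of blocks rather than the block count. A simplified sketch, assuming a hypothetical GPU with 30 multiprocessors that can each hold up to 8 resident blocks (the compute capability 1.x limit). It ignores register, shared memory and thread limits, and the fact that co-resident blocks share one multiprocessor's execution units, so it is only a mental model, not a timing predictor.

```python
# Simplified scheduling model for illustration; `waves` is a hypothetical helper.

def waves(num_blocks: int, num_mps: int, resident_blocks_per_mp: int) -> int:
    """Approximate number of rounds needed to schedule every block.

    Assumes each multiprocessor holds up to `resident_blocks_per_mp`
    blocks at once and all blocks take roughly equal time.
    """
    capacity = num_mps * resident_blocks_per_mp
    return -(-num_blocks // capacity)  # ceiling division


if __name__ == "__main__":
    MPS = 30          # hypothetical GPU with 30 multiprocessors
    MAX_RESIDENT = 8  # resident-block limit per MP on compute capability 1.x
    for blocks in (1, 2, 3, 30, 60, 240, 241):
        print(blocks, "blocks ->", waves(blocks, MPS, MAX_RESIDENT), "wave(s)")
```

Under this model 1, 30 and 60 blocks all fit in a single wave; only past 240 blocks does a second wave appear.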

Kiran_CUDA · August 3, 2009, 12:05pm

Thanks MisterAnderson42,

Do you mean I should use 30 blocks with 5 threads per block? Could you please elaborate further?

Ailleur · August 3, 2009, 12:29pm

I think they have missed the part where you said “per multiprocessor”.

So if that is indeed the case, you have a lot more than 768 threads in total to run on the graphics card.

There is no secret recipe for block sizes. You try them on your specific problem and find the sweet spot, and that sweet spot won't (necessarily) be the same for another problem.

So no, 5 threads per block would be terrible: an MP has 8 SPs running in parallel, and threads are scheduled in warps of 32, so a 5-thread block still occupies a whole warp and leaves 27 of its 32 slots idle. What MrAnderson was trying to say is that if your WHOLE GPU has, say, 14 multiprocessors, then you need to run at the very least 14 blocks to keep the card occupied.
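The warp argument can be made concrete: a block's thread count is effectively rounded up to whole warps. A small sketch of that arithmetic applied to the configurations from this thread, assuming a warp size of 32 as on the hardware discussed here; the helper names are hypothetical.

```python
# Hypothetical helpers for illustration; warp size assumed to be 32.
WARP_SIZE = 32


def warps_per_block(threads_per_block: int) -> int:
    """Warps a block occupies: thread count rounded up to whole warps."""
    return -(-threads_per_block // WARP_SIZE)


def lane_utilization(threads_per_block: int) -> float:
    """Fraction of the allocated warp slots that hold real threads."""
    return threads_per_block / (warps_per_block(threads_per_block) * WARP_SIZE)


if __name__ == "__main__":
    # (blocks, threads per block) pairs discussed in this thread
    for blocks, tpb in ((30, 5), (3, 50), (2, 75), (3, 256)):
        total_warps = blocks * warps_per_block(tpb)
        print(f"{blocks} x {tpb}: {total_warps} warps, "
              f"{lane_utilization(tpb):.0%} of slots used")
```

Note that 3 x 50 and 2 x 75 both come to 6 warps with about 78% of slots used, which is consistent with the earlier answer that neither option is better, while 30 x 5 fills only about 16% of each warp.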

In your case, if that “768 threads per MP” figure is correct, you still have to find where the sweet spot is.
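Finding the sweet spot is ultimately empirical, but a useful first filter is how many warps each candidate block size can keep resident on one multiprocessor. A simplified sketch, assuming compute capability 1.0/1.1 limits (768 resident threads, i.e. 24 warps, and 8 resident blocks per multiprocessor, 512 threads per block) and ignoring register and shared memory usage, which often lower the real numbers. The helper names are hypothetical.

```python
# Simplified occupancy model for illustration; limits assume compute capability 1.0/1.1.
WARP_SIZE = 32
MAX_WARPS_PER_MP = 24        # 768 resident threads per multiprocessor
MAX_BLOCKS_PER_MP = 8        # resident-block limit per multiprocessor
MAX_THREADS_PER_BLOCK = 512  # largest legal block


def resident_blocks(threads_per_block: int) -> int:
    """Blocks of this size that fit on one MP (thread/block limits only)."""
    if not 0 < threads_per_block <= MAX_THREADS_PER_BLOCK:
        return 0
    warps = -(-threads_per_block // WARP_SIZE)
    return min(MAX_BLOCKS_PER_MP, MAX_WARPS_PER_MP // warps)


def occupancy(threads_per_block: int) -> float:
    """Resident warps as a fraction of the per-MP warp limit."""
    warps = -(-threads_per_block // WARP_SIZE)
    return resident_blocks(threads_per_block) * warps / MAX_WARPS_PER_MP


if __name__ == "__main__":
    # Candidate block sizes (multiples of the warp size) worth benchmarking.
    for tpb in (64, 128, 192, 256, 384, 512):
        print(f"{tpb:3d} threads/block: {resident_blocks(tpb)} blocks/MP, "
              f"occupancy {occupancy(tpb):.0%}")
```

Under these assumptions 256-thread blocks fit 3 per MP (all 768 threads), while 512-thread blocks fit only 1 (about 67%). Treat the output as a shortlist to time on the real kernel rather than a prediction: several sizes tie at full occupancy, and higher occupancy does not always mean faster code.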
