2 blocks versus 3 blocks


Suppose I have to run 768 threads on a multiprocessor. I have the following choices:

1- Assign two blocks: one with 512 threads and another with 256 threads

2- Assign three blocks, each with 256 threads

Can you tell me which one will give better performance?


Option 2, because option 1 is physically impossible. When you launch a grid, every block must have the same execution parameters (the same block dimensions). You cannot have different block configurations within the same kernel launch.
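A minimal sketch of option 2, assuming a trivial kernel (`fillIndex`, a hypothetical name) that just writes each thread's global index. The execution configuration `<<<blocks, threadsPerBlock>>>` takes a single block size that applies to every block in the grid, which is why a 512 + 256 split cannot be expressed in one launch (it would take two separate kernel launches).

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Ceiling division: blocks of `threadsPerBlock` needed to cover n threads.
int blocksNeeded(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Every block in a launch sees the same blockDim.x; only blockIdx.x differs.
__global__ void fillIndex(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard in case n is not a multiple of blockDim.x
        out[i] = i;
    }
}

int main() {
    const int n = 768;
    const int threadsPerBlock = 256;  // one block size for the whole grid
    const int blocks = blocksNeeded(n, threadsPerBlock);  // 3 blocks
    assert(blocks * threadsPerBlock >= n);

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }
    fillIndex<<<blocks, threadsPerBlock>>>(d_out, n);  // 3 x 256 = 768 threads
    cudaError_t err = cudaDeviceSynchronize();

    int h_out[n];
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != i) ++mismatches;
    }
    printf("%s, %d mismatches\n", cudaGetErrorString(err), mismatches);
    return 0;
}
```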

Thanks, Avidday.

What if I have the following options:

1- 3 blocks with 50 threads per block

2- 2 blocks with 75 threads per block

Which option is better as far as the speed is concerned?

Neither is really better. Whether you run 2 or 3 blocks, you are still only using a few percent of the hardware's capabilities, and the launch overhead will likely dominate your kernel's execution time.

Edit: To make it a little clearer, you can probably run 30 or 60 blocks in about the same time it would take to run 1 block of the same size, because of the parallel nature of the hardware.
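One way to check the launch-overhead point on your own card is to time a nearly empty kernel with CUDA events for a few configurations, including both options from the question. This is a sketch: the kernel `tinyWork`, the repetition count, and the list of configurations are illustrative assumptions, and the timings it prints depend entirely on the GPU and driver, so no particular numbers are implied here.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Almost no work per thread (and idempotent), so timing is mostly launch overhead.
__global__ void tinyWork(float *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = i * 0.5f;
}

// Total threads created by a <<<blocks, threadsPerBlock>>> launch.
int totalThreads(int blocks, int threadsPerBlock) {
    return blocks * threadsPerBlock;
}

// Average milliseconds per launch over `reps` back-to-back launches.
float msPerLaunch(float *d_buf, int blocks, int threadsPerBlock, int reps) {
    int n = totalThreads(blocks, threadsPerBlock);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    tinyWork<<<blocks, threadsPerBlock>>>(d_buf, n);  // warm-up, not timed
    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        tinyWork<<<blocks, threadsPerBlock>>>(d_buf, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / reps;
}

struct Config { int blocks; int threadsPerBlock; };

int main() {
    // Option 1 and option 2 from the question, plus the 30- and 60-block cases.
    const Config configs[] = {{3, 50}, {2, 75}, {30, 75}, {60, 75}};
    const int bufLen = 60 * 75;  // large enough for the biggest config above
    float *d_buf = nullptr;
    if (cudaMalloc(&d_buf, bufLen * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }
    for (const Config &c : configs) {
        assert(totalThreads(c.blocks, c.threadsPerBlock) <= bufLen);
        printf("%2d blocks x %2d threads: %.4f ms per launch\n", c.blocks,
               c.threadsPerBlock, msPerLaunch(d_buf, c.blocks, c.threadsPerBlock, 1000));
    }
    cudaFree(d_buf);
    return 0;
}
```

Back-to-back launches are averaged because a single tiny launch is too short to time reliably with events.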

Thanks MisterAnderson42,

Do you mean I should use 30 blocks with 5 threads per block? Could you please elaborate?

I think they have missed the part where you said “per multiprocessor”.

So if that is indeed the case, you have a lot more than 768 threads in total to run on the graphics card.

There is no secret recipe for block sizes. You try them for your specific problem and find where the sweet spot is. And that sweet spot won't (necessarily) be the same for another problem.

So no, 5 threads per block would be terrible. You have 8 SPs in an MP running in parallel, and threads are scheduled in warps of 32, so a 5-thread block still occupies a whole warp and leaves 27 of its 32 slots idle. What MrAnderson was trying to say is that if your WHOLE GPU has, say, 14 multiprocessors, then you need to run at the very least 14 blocks to keep the card occupied.
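The warp arithmetic behind the 5-threads-per-block objection, plus the "at least one block per multiprocessor" minimum, can be sketched as follows. It reads `multiProcessorCount` and `warpSize` from `cudaGetDeviceProperties` instead of assuming 14 MPs; the helper names `warpsPerBlock` and `idleLanes` are hypothetical. Note that the two options from the earlier question (3 x 50 and 2 x 75 threads) both come out to 6 warps in total, which is consistent with neither being better.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Warps occupied by one block; a partial warp still counts as a full warp.
int warpsPerBlock(int threadsPerBlock, int warpSize) {
    return (threadsPerBlock + warpSize - 1) / warpSize;
}

// Lanes left idle because the block size is not a multiple of the warp size.
int idleLanes(int threadsPerBlock, int warpSize) {
    return warpsPerBlock(threadsPerBlock, warpSize) * warpSize - threadsPerBlock;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no CUDA device found\n");
        return 1;
    }
    assert(prop.warpSize > 0);  // used as a divisor below

    // Fewer blocks than multiprocessors leaves some MPs with nothing to do.
    printf("%s: %d multiprocessors, warp size %d -> launch at least %d blocks\n",
           prop.name, prop.multiProcessorCount, prop.warpSize,
           prop.multiProcessorCount);

    const int sizes[] = {5, 50, 75, 256};
    for (int tpb : sizes) {
        printf("%3d threads/block: %d warp(s), %d idle lane(s)\n", tpb,
               warpsPerBlock(tpb, prop.warpSize), idleLanes(tpb, prop.warpSize));
    }
    return 0;
}
```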

In your case, if that “768 per MP” figure is correct, you have to find what the sweet spot is.
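Finding the sweet spot is usually done empirically. A minimal sweep, assuming a placeholder SAXPY kernel and a problem size of 2^20 elements (both stand-ins for your real workload), times power-of-two multiples of the warp size up to `maxThreadsPerBlock` with CUDA events and reports the fastest one.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder workload (hypothetical): substitute the kernel you actually want to tune.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Blocks needed so that gridSize * threadsPerBlock >= n.
int gridSizeFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main() {
    // Assumed size; at 32 threads/block this is 32768 blocks, under older 65535 grid limits.
    const int n = 1 << 20;
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no CUDA device found\n");
        return 1;
    }

    float *x = nullptr, *y = nullptr;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int reps = 20;
    int bestSize = 0;
    float bestMs = 0.0f;
    // Only warp multiples are tried; the best value is problem- and GPU-specific.
    for (int tpb = prop.warpSize; tpb <= prop.maxThreadsPerBlock; tpb *= 2) {
        int blocks = gridSizeFor(n, tpb);
        assert(blocks * tpb >= n);
        saxpy<<<blocks, tpb>>>(n, 2.0f, x, y);  // warm-up, not timed
        if (cudaGetLastError() != cudaSuccess) {
            printf("launch with %d threads/block failed\n", tpb);
            break;
        }
        cudaEventRecord(start);
        for (int r = 0; r < reps; ++r) saxpy<<<blocks, tpb>>>(n, 2.0f, x, y);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        ms /= reps;
        printf("%4d threads/block, %6d blocks: %.4f ms\n", tpb, blocks, ms);
        if (bestSize == 0 || ms < bestMs) {
            bestSize = tpb;
            bestMs = ms;
        }
    }
    printf("fastest block size for this kernel on this GPU: %d\n", bestSize);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Doubling the block size is only a coarse first pass; once a promising range shows up, you may want to try the warp multiples in between as well.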