Determining Thread vs Block

Baral · November 11, 2009, 7:20pm

Hi,
I have written a kernel that needs to launch 1024 threads. Each thread operates on a separate data set and is quite heavy in functionality. I have a Tesla C1060 card with 240 cores divided into 30 MPs.
In this scenario, what would be the best (ideal) way of invoking the kernel?
Is it 2 blocks of 512 threads, OR 32 blocks of 32 threads each?
I would appreciate your response.

regards
Baral.

Tigga · November 11, 2009, 7:56pm

I don’t think you want either.

Firstly, you want at least enough blocks to fill your 30 MPs, so that's at least 30 blocks to start with. Then you want to look at your register and shared memory allocation: it may be that you can't fit 1024 threads onto an MP. This is not a problem; you get pretty much optimal performance with only 512.

There are quite a few other things to take into account. In my application, more threads per block are more efficient because they reduce global memory loads; however, I have to sync the threads in my blocks quite regularly, so smaller blocks are better for reducing waiting. I compromise at 128 threads per block.

Also, you can only have 8 blocks per MP, which can be an issue if you want 32-thread blocks. Given that resources are allocated in lumps of 64 threads, 32-thread blocks are wasteful in that respect too (as are 96-thread blocks and other such sizes).

I think the general method is to play around and see what works best.
