Optimum Number of blocks

We have written a matrix addition program, and we want to measure the time taken to add two matrices.

The way we do it is:

matrix A [128*16, 256*128]
matrix B [128*16, 256*128]
matrix C [128*16, 256*128]

Here C will contain the final result.
Total blocks we use is 128*16.
Threads/block = 128

So effectively each thread is reading A[id] and B[id], adding them, and writing C[id], in a loop 256 times.
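For concreteness, here is a minimal sketch of what such a kernel might look like. The kernel name, the grid-stride indexing pattern, and the pointer names are assumptions for illustration, not the original poster's code:

```cuda
// Sketch of the addition kernel described above (names are illustrative).
// Each thread handles `perThread` elements of A, B and C.
__global__ void addKernel(const float *A, const float *B, float *C, int perThread)
{
    int tid    = blockIdx.x * blockDim.x + threadIdx.x; // global thread index
    int stride = gridDim.x * blockDim.x;                // total threads in grid

    // perThread = 256 in the first configuration
    for (int i = 0; i < perThread; ++i) {
        int id = tid + i * stride;
        C[id] = A[id] + B[id];
    }
}

// First configuration: 128*16 blocks, 128 threads/block, 256 elements/thread
// addKernel<<<128 * 16, 128>>>(dA, dB, dC, 256);
```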

Our observations:

-> As we increase/decrease the number of thread blocks while keeping the total number of elements constant, i.e.

matrix A [256*16, 128*128]
matrix B [256*16, 128*128]
matrix C [256*16, 128*128]

Here C will contain the final result.
Total blocks we use is 256*16.
Threads/block = 128

the time taken by the kernel changes. The change is dramatic, in the sense that it is almost 1.5-1.7x.
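For what it's worth, a common way to time just the kernel (so the measured difference is not an artifact of host-side timing) is CUDA events. A sketch, where `addKernel` and the device pointers `dA`/`dB`/`dC` are placeholders for the poster's actual code:

```cuda
// Timing a kernel launch with CUDA events (sketch).
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);
addKernel<<<128 * 16, 128>>>(dA, dB, dC, 256);  // first configuration
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);      // wait until the kernel has finished

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds
```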

One naive explanation could be that the system works better with a larger number of thread blocks. But is this documented anywhere?

Does anyone else find/notice something like this?

You get better performance when you avoid pipeline stalls. More blocks can do that in some instances.

Can you please elaborate on your point? It would be nice if you could share something you have observed.

See chapter 5.2 (page 62) of CUDA Programming Guide Version 1.1.