Question on number of Blocks possible

We have written a matrix addition program. We want to measure the time taken to add two matrices.

The way we do it is:
matrix A [30*16, 31*16]
matrix B [30*16, 31*16]
matrix C [30*16, 31*16]

Here C will contain the final result.
Total blocks we use is 30*30.
Threads/block = 16*16

So effectively each thread is reading A[id] and B[id], adding them, and writing C[id].
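A minimal sketch of what such a kernel might look like (the kernel name, parameter names, and index math here are assumptions for illustration, not the original code):

```cuda
// Hypothetical element-wise matrix addition: one thread per element.
__global__ void matAdd(const float *A, const float *B, float *C, int n)
{
    // Flatten the 2D grid of 16x16 blocks into one linear element index.
    int x  = blockIdx.x * blockDim.x + threadIdx.x;
    int y  = blockIdx.y * blockDim.y + threadIdx.y;
    int id = y * gridDim.x * blockDim.x + x;

    if (id < n)                  // guard threads that fall past the end
        C[id] = A[id] + B[id];   // one read of A, one of B, one write of C
}
```

With a 30*30 grid of 16*16 blocks this would be launched as something like `matAdd<<<dim3(30, 30), dim3(16, 16)>>>(dA, dB, dC, n);`.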

We notice that the results are good for 30*30 blocks, but when we increase the number of blocks, e.g. to 50*50, we get weird results.

By weird I mean: since the matrix size has increased significantly, the kernel time should also increase, as this is a memory-bound computation.
But we don't see this. It takes almost the same time.

Has anyone seen this behavior before?

50*50 = 2500 blocks. This is not a lot for CUDA, so you are probably seeing kernel-launch overhead dominate. 900 blocks over 16 multiprocessors = 56 blocks per MP. With 3 blocks running per MP, that is only about 19 sets of blocks to process one after another.
If you check the time difference between 300*300 and 500*500 blocks, you should see a bigger difference.

Thanks for your response. I don't understand when you say:

"With 3 blocks running per MP, that is only about 19 sets of blocks to process one after another."

How are you sure there are 3 blocks running per MP?

I am not sure there are 3 blocks running per MP; it is the maximum for 16x16-thread blocks. But given your algorithm, my guess would be that you will have 3 blocks per MP running at the same time. You can find out by filling your values into the occupancy calculator.