We have written a matrix addition program and want to measure the time taken to add two matrices.
The way we do it is:
matrix A [3016,3116]
matrix B [3016,3116]
matrix C [3016,3116]
Here C will contain the final result…
The total number of blocks we use is 3030.
So effectively each thread/block reads A[id] and B[id], adds them, and writes C[id].
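A minimal sketch of this pattern, for anyone who wants to reproduce it. It is not the exact code from our program. The names `matAdd` and `timedMatAdd`, the `float` element type, the 1D launch and the 256 threads per block are all assumptions made for illustration. The timing uses CUDA events and waits on the stop event, because kernel launches are asynchronous. It also checks `cudaGetLastError()`, because a launch that fails returns almost immediately.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per element: C[id] = A[id] + B[id].
__global__ void matAdd(const float* A, const float* B, float* C, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n)  // the last block may extend past the end of the matrix
        C[id] = A[id] + B[id];
}

// Hypothetical helper (not from the original post): adds two rows x cols
// matrices on the GPU and stores the kernel time in *ms. Returns true only
// if no launch error was reported and every element of C is correct.
// Return codes of the other CUDA calls are ignored to keep the sketch short.
bool timedMatAdd(int rows, int cols, int threadsPerBlock, float* ms)
{
    const int n = rows * cols;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    std::vector<float> hA(n, 1.0f), hB(n, 2.0f), hC(n, 0.0f);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    // Enough blocks to cover all n elements (assumed 1D launch).
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    matAdd<<<blocks, threadsPerBlock>>>(dA, dB, dC, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait for the kernel before reading the time
    cudaEventElapsedTime(ms, start, stop);

    // A failed launch (for example, an invalid configuration) shows up here.
    bool ok = (cudaGetLastError() == cudaSuccess);

    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n && ok; ++i)
        ok = (hC[i] == 3.0f);  // 1.0f + 2.0f is exact in float

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return ok;
}

int main()
{
    float ms = 0.0f;
    bool ok = timedMatAdd(3016, 3116, 256, &ms);
    printf("result %s, kernel time %.3f ms\n", ok ? "correct" : "WRONG", ms);
    assert(ok);
    return 0;
}
```

Comparing the output array against the expected values for each size is a cheap way to confirm that the kernel really processed every element before trusting the measured time.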
We notice that the results are good for 3030 blocks.
But when we increase the number of blocks, to 5050 for example, it gives weird results.
By weird I mean: since the matrix size has increased significantly, the kernel time should also increase, as this is a memory-bound computation.
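As a rough check of that expectation, here is a back-of-the-envelope model. The symbols are assumptions for illustration: $N$ is the number of matrix elements, $s$ the size of one element in bytes, and $B_{\text{eff}}$ the effective memory bandwidth actually achieved. Each element needs two reads and one write, so:

```latex
\begin{align*}
  \text{bytes moved} &= 3\,N\,s \\
  t_{\text{kernel}} &\approx \frac{3\,N\,s}{B_{\text{eff}}} \\
  \frac{t_2}{t_1} &\approx \frac{N_2}{N_1}
  \quad \text{(same } s \text{ and } B_{\text{eff}} \text{ for both runs)}
\end{align*}
```

Under this model the kernel time should grow roughly in proportion to the number of elements, once the matrix is large enough for fixed launch overhead to be negligible.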
But we don't see this. It's taking almost the same time.
Has anyone seen this behavior before?