We have written a matrix addition program and want to measure the time taken to add two matrices.

This is how we do it:

matrix A [30*16,31*16]

matrix B [30*16,31*16]

matrix C [30*16,31*16]

Here C will contain the final result.

The total number of blocks we use is 30*30.
Threads/Block = 16*16

So effectively each thread in a block reads A[id] and B[id], adds them, and writes C[id].
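For anyone who wants to reproduce the setup, here is a minimal sketch of a one-thread-per-element add kernel with a timed launch. It is an illustrative reconstruction, not the code the timings in this post came from. The names matAdd, timedAdd and TILE are made up for the example, and it assumes row-major float matrices, a grid that covers the whole matrix, and CUDA events as the timer. Since a kernel launch returns to the host before the kernel has finished, the sketch waits on the stop event before reading the elapsed time.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

#define TILE 16  // threads per block in each dimension (16*16, as above)

// Each thread handles one element: c[id] = a[id] + b[id].
__global__ void matAdd(const float* a, const float* b, float* c,
                       int rows, int cols)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows && col < cols) {   // guard in case the grid overshoots
        int id = row * cols + col;    // row-major index (assumption)
        c[id] = a[id] + b[id];
    }
}

// Hypothetical helper: adds two (blocksY*TILE) x (blocksX*TILE) matrices and
// returns the kernel time in milliseconds, or -1.0f on error or wrong result.
float timedAdd(int blocksY, int blocksX)
{
    int rows = blocksY * TILE, cols = blocksX * TILE;
    size_t n = (size_t)rows * cols, bytes = n * sizeof(float);
    std::vector<float> hA(n, 1.0f), hB(n, 2.0f), hC(n, 0.0f);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess &&
              cudaMalloc(&dB, bytes) == cudaSuccess &&
              cudaMalloc(&dC, bytes) == cudaSuccess;
    float ms = -1.0f;

    if (ok) {
        cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        dim3 block(TILE, TILE);
        dim3 grid(blocksX, blocksY);  // one block per 16*16 tile

        cudaEventRecord(start, 0);
        matAdd<<<grid, block>>>(dA, dB, dC, rows, cols);
        cudaEventRecord(stop, 0);
        // The launch is asynchronous: wait until the stop event has been
        // reached on the GPU before reading the timer.
        cudaEventSynchronize(stop);
        ok = (cudaGetLastError() == cudaSuccess);
        cudaEventElapsedTime(&ms, start, stop);

        // Copy back and check that every element is 1 + 2 = 3.
        cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
        for (size_t i = 0; i < n && ok; ++i) ok = (hC[i] == 3.0f);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
    }

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return ok ? ms : -1.0f;
}
```

Calling timedAdd(30, 31) and timedAdd(50, 50) from main and printing the two values gives the kind of comparison described in this post. The first launch in a process also pays one-time setup costs, so a warm-up call before the timed ones is a reasonable precaution.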

We notice that the results are good for 30*30 blocks,
but when we increase the number of blocks to something like 50*50, it gives weird results.

By weird I mean: since the matrix size has increased significantly, the kernel time should also increase, as this is a memory-bound computation.

But we don't see this. It's taking almost the same time.

Has anyone seen this behavior before?