A Question from Programming Massively Parallel Processors: A Hands-on Approach

1. Consider matrix addition. Can one use shared memory to reduce the global
memory bandwidth consumption? Hint: Analyze the elements accessed by
each thread and see if there is any commonality between threads.
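
To make the hint concrete, here is a minimal sketch that counts which input elements each thread would read. It is a plain Python simulation, not CUDA code, and it assumes the usual mapping of one thread per output element computing C[i][j] = A[i][j] + B[i][j]. The function name `addition_read_counts` is hypothetical and only used for illustration.

```python
from collections import Counter

def addition_read_counts(n):
    """Count global-memory reads per input element for C = A + B.

    Assumption: one thread per output element (i, j); each thread reads
    exactly A[i][j] and B[i][j].
    """
    reads = Counter()
    for i in range(n):          # thread row index
        for j in range(n):      # thread column index
            reads[("A", i, j)] += 1
            reads[("B", i, j)] += 1
    return reads

counts = addition_read_counts(4)
# The set of distinct read counts shows whether any element is shared
# between threads (a count above 1 would indicate commonality).
print(set(counts.values()))
```

Under that assumed mapping, the set of read counts tells you directly whether two threads ever touch the same input element, which is the commonality the hint asks about.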

2. Also, are my answers correct for the following question?
Consider performing a matrix multiplication of two input matrices with
dimensions N × N. How many times is each element in the input matrices
requested from global memory in the following situations?
A. There is no tiling: N
B. Tiles of size T × T are used: N/T
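
One way to check both answers is to count the loads by brute force. The sketch below is a Python simulation rather than a real kernel. It assumes one thread per output element, that N is divisible by T, and that in the tiled version each thread in a T × T block loads one element of each input tile per phase into shared memory. The names `naive_load_counts` and `tiled_load_counts` are hypothetical.

```python
from collections import Counter

def naive_load_counts(n):
    """Global loads per input element when every thread reads its full
    row of A and full column of B straight from global memory."""
    loads = Counter()
    for row in range(n):
        for col in range(n):
            for k in range(n):
                loads[("A", row, k)] += 1
                loads[("B", k, col)] += 1
    return loads

def tiled_load_counts(n, t):
    """Global loads per input element with T x T tiles.

    Assumption: each thread (ty, tx) of block (by, bx) loads one A element
    and one B element per phase; the rest is served from shared memory.
    """
    assert n % t == 0, "this sketch assumes N is a multiple of T"
    tiles = n // t
    loads = Counter()
    for by in range(tiles):
        for bx in range(tiles):
            for phase in range(tiles):
                for ty in range(t):
                    for tx in range(t):
                        loads[("A", by * t + ty, phase * t + tx)] += 1
                        loads[("B", phase * t + ty, bx * t + tx)] += 1
    return loads

n, t = 8, 4
print(set(naive_load_counts(n).values()))     # distinct counts without tiling
print(set(tiled_load_counts(n, t).values()))  # distinct counts with T x T tiles
```

For N = 8 and T = 4 the two printed sets can be compared against N and N/T respectively. The counts only hold under the loading scheme assumed here; a kernel that caches or coalesces differently could behave otherwise.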