Do the tile dimensions and block dimensions have to be the same for shared-memory matrix multiplication?
It is generally easier (I did not necessarily say better) to have tile dimensions == block dimensions, I would think, because each thread then loads exactly one element of each tile.
Running with your thought: if the tile dimension != block dimension, then block dimension < tile dimension; I cannot think of a sensible case where tile dimension < block dimension.
If the block dimension < tile dimension, you would then have to steer the block over the tile, likely through iteration, so that each thread handles several tile elements.
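To make the "steer the block over the tile" idea concrete, here is a minimal sketch, not a tuned implementation. It assumes square, row-major N x N matrices of float, and the names `matmulTiled`, `matmulTiledHost`, `TILE` and `BLOCK` are made up for this post. With `BLOCK` smaller than `TILE`, each thread strides over the tile in steps of `BLOCK`; setting `BLOCK == TILE` collapses every stride loop to a single iteration, which is the simpler tile == block case.

```cuda
#include <cuda_runtime.h>

constexpr int TILE  = 16;  // edge of the shared-memory tile (assumed value)
constexpr int BLOCK = 8;   // edge of the thread block; TILE must be a multiple of BLOCK

// C = A * B for square n x n row-major matrices.
// One thread block computes one TILE x TILE tile of C.
__global__ void matmulTiled(const float* A, const float* B, float* C, int n)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    constexpr int R = TILE / BLOCK;  // tile elements per thread along each axis
    float acc[R][R] = {};            // per-thread partial sums, zero-initialized

    const int tileRow = blockIdx.y * TILE;  // first row of C covered by this block
    const int tileCol = blockIdx.x * TILE;  // first column of C covered by this block

    for (int k0 = 0; k0 < n; k0 += TILE) {
        // Steer the block over the tile: each thread loads R x R elements
        // of the A tile and of the B tile, padding with zeros at the edges.
        for (int i = threadIdx.y; i < TILE; i += BLOCK) {
            for (int j = threadIdx.x; j < TILE; j += BLOCK) {
                const int aRow = tileRow + i, aCol = k0 + j;
                const int bRow = k0 + i,      bCol = tileCol + j;
                As[i][j] = (aRow < n && aCol < n) ? A[aRow * n + aCol] : 0.0f;
                Bs[i][j] = (bRow < n && bCol < n) ? B[bRow * n + bCol] : 0.0f;
            }
        }
        __syncthreads();  // both tiles fully loaded before anyone reads them

        // Each thread accumulates its R x R outputs from the shared tiles.
        for (int ii = 0; ii < R; ++ii)
            for (int jj = 0; jj < R; ++jj)
                for (int k = 0; k < TILE; ++k)
                    acc[ii][jj] += As[threadIdx.y + ii * BLOCK][k] *
                                   Bs[k][threadIdx.x + jj * BLOCK];
        __syncthreads();  // finish reading before the next iteration overwrites the tiles
    }

    // Write the results back, skipping positions outside the matrix.
    for (int ii = 0; ii < R; ++ii)
        for (int jj = 0; jj < R; ++jj) {
            const int row = tileRow + threadIdx.y + ii * BLOCK;
            const int col = tileCol + threadIdx.x + jj * BLOCK;
            if (row < n && col < n) C[row * n + col] = acc[ii][jj];
        }
}

// Hypothetical host wrapper; error checking is omitted to keep the sketch short.
void matmulTiledHost(const float* hA, const float* hB, float* hC, int n)
{
    const size_t bytes = size_t(n) * n * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    const dim3 block(BLOCK, BLOCK);
    const dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);  // one block per tile of C
    matmulTiled<<<grid, block>>>(dA, dB, dC, n);

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);  // blocking copy, waits for the kernel
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
}
```

Note that the grid is sized from `TILE`, not from `BLOCK`: the tile decides how much of C a block owns, and the block size only decides how many threads share that work.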
Why do you ask?
I am having a hard time understanding shared-memory (tiled) matrix multiplication.
There is likely a matrix transpose buried in there.
Thus, if you first understand how to do a matrix transpose, and why it is done the way it is done, the matrix multiplication should be easier to follow, I would think.
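For reference, a common way to do the transpose through shared memory looks roughly like the sketch below. It assumes a row-major `rows` x `cols` float matrix and a 32 x 32 thread block equal to the tile; `transposeTiled`, `transposeHost` and `TDIM` are names invented for this post. The point of the tile is that both the global read and the global write go along rows, so neighbouring threads touch neighbouring addresses, and the actual transposition happens inside shared memory. The extra padding column is the usual trick to reduce shared-memory bank conflicts when the tile is read column-wise.

```cuda
#include <cuda_runtime.h>

constexpr int TDIM = 32;  // tile edge == thread-block edge (assumed value)

// out (cols x rows) = transpose of in (rows x cols), both row-major.
__global__ void transposeTiled(const float* in, float* out, int rows, int cols)
{
    // +1 column of padding so column-wise reads hit different banks.
    __shared__ float tile[TDIM][TDIM + 1];

    // Read phase: consecutive threadIdx.x values read consecutive input columns.
    int x = blockIdx.x * TDIM + threadIdx.x;  // column in `in`
    int y = blockIdx.y * TDIM + threadIdx.y;  // row in `in`
    if (x < cols && y < rows)
        tile[threadIdx.y][threadIdx.x] = in[y * cols + x];

    __syncthreads();  // the whole tile must be loaded before it is read transposed

    // Write phase: swap the block indices so that consecutive threadIdx.x
    // values write consecutive output columns.
    x = blockIdx.y * TDIM + threadIdx.x;  // column in `out`
    y = blockIdx.x * TDIM + threadIdx.y;  // row in `out`
    if (x < rows && y < cols)
        out[y * rows + x] = tile[threadIdx.x][threadIdx.y];
}

// Hypothetical host wrapper; error checking is omitted to keep the sketch short.
void transposeHost(const float* hIn, float* hOut, int rows, int cols)
{
    const size_t bytes = size_t(rows) * cols * sizeof(float);
    float *dIn, *dOut;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);

    const dim3 block(TDIM, TDIM);
    const dim3 grid((cols + TDIM - 1) / TDIM, (rows + TDIM - 1) / TDIM);
    transposeTiled<<<grid, block>>>(dIn, dOut, rows, cols);

    cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);  // blocking copy, waits for the kernel
    cudaFree(dIn);
    cudaFree(dOut);
}
```

The load, `__syncthreads()`, then use pattern is the same one the tiled multiplication relies on, which is why the transpose is a good warm-up.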
There’s a writeup in the programming guide that covers matrix multiplication using shared memory that may be of interest: