Suppose I need ROWS*COLS threads for tiled matrix multiplication.
The threads per block are given as:
dim3 threads_per_blocks(BLOCK_1, BLOCK_2, BLOCK_3);
and the tile dimensions are given as TILE_1*TILE_2.
What is the formula for calculating the following?
dim3 blocks_per_grid(?, ?, ?);