I have to run experiments on N x N square matrices (where N = 1000).
I am told to launch a CUDA kernel with the configuration SIZE_1 x SIZE_2.
Does that mean
dim3 block_size(SIZE_1, SIZE_1);
dim3 grid_size(SIZE_2, SIZE_2);
Or does it mean
dim3 block_size(SIZE_1, SIZE_2);
dim3 grid_size((N + SIZE_1 - 1) / SIZE_1, (N + SIZE_2 - 1) / SIZE_2);
Or something else?
Please explain why.
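One way to compare the two readings is to check which launch actually touches every element of the 1000 x 1000 matrix. The following is a minimal host-side sketch, not real CUDA code: it assumes hypothetical values such as SIZE_1 = SIZE_2 = 16, and `Dim3`, `grid_dim`, and `covers_exactly_once` are illustrative names standing in for CUDA's `dim3` and the `blockIdx * blockDim + threadIdx` indexing a kernel would use. It enumerates every thread of a launch and counts how often each matrix element is hit, skipping out-of-range threads the way an `if (row < N && col < N)` guard would.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's dim3 so the check runs on the host
// without a GPU; only x and y matter for a 2-D launch.
struct Dim3 {
    unsigned x, y;
};

// Ceiling division: number of blocks of size b needed to cover n items.
// This is the (N + SIZE - 1) / SIZE expression from the second reading.
unsigned grid_dim(unsigned n, unsigned b) {
    assert(b > 0);
    return (n + b - 1) / b;
}

// Visits every thread of a (grid, block) launch using the same index
// formulas a CUDA kernel would, and reports whether each element of an
// n x n matrix is touched exactly once. Threads whose row or col is
// >= n are skipped, mirroring the usual bounds guard inside a kernel.
bool covers_exactly_once(unsigned n, Dim3 grid, Dim3 block) {
    std::vector<int> hits(static_cast<std::size_t>(n) * n, 0);
    for (unsigned by = 0; by < grid.y; ++by)
        for (unsigned bx = 0; bx < grid.x; ++bx)
            for (unsigned ty = 0; ty < block.y; ++ty)
                for (unsigned tx = 0; tx < block.x; ++tx) {
                    // col = blockIdx.x * blockDim.x + threadIdx.x
                    unsigned col = bx * block.x + tx;
                    // row = blockIdx.y * blockDim.y + threadIdx.y
                    unsigned row = by * block.y + ty;
                    if (row < n && col < n)
                        ++hits[static_cast<std::size_t>(row) * n + col];
                }
    for (int h : hits)
        if (h != 1) return false;
    return true;
}
```

With the hypothetical sizes of 16, the first reading (`covers_exactly_once(1000, Dim3{16, 16}, Dim3{16, 16})`) launches only 16 * 16 = 256 threads per dimension, so it leaves most of the matrix untouched, whereas the ceiling-division grid from the second reading (63 x 63 blocks of 16 x 16 threads) covers all 1000 x 1000 elements exactly once. Also keep in mind that a real block is limited to 1024 threads in total, so SIZE_1 * SIZE_2 must not exceed that for a 2-D block.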
I need to do the following experiment: