I have to run experiments on N x N square matrices (where N = 1000).
I am told to create a CUDA kernel of configuration SIZE_1 x SIZE_2.
Does that mean
dim3 block_size(SIZE_1, SIZE_1);
dim3 grid_size(SIZE_2, SIZE_2);
Or does it mean
dim3 block_size(SIZE_1, SIZE_2);
dim3 grid_size((N + SIZE_1 - 1) / SIZE_1, (N + SIZE_2 - 1) / SIZE_2);
Or does it mean something else?
Please explain why.
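To make the second reading concrete, here is a minimal sketch of how I would launch it. The values SIZE_1 = 32 and SIZE_2 = 16 and the kernel `touch` are placeholders I made up for illustration; they are not part of the assignment. Under this reading, SIZE_1 x SIZE_2 is the block shape, and the grid is rounded up so that the blocks cover all N x N elements, which is why the kernel needs a bounds check.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N      1000
#define SIZE_1 32   // assumed value, for illustration only
#define SIZE_2 16   // assumed value, for illustration only

// Hypothetical kernel: each thread writes one element of an n x n matrix.
__global__ void touch(float *a, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;

    // The rounded-up grid can overshoot the matrix edge, so guard the access.
    if (row < n && col < n)
        a[row * n + col] = 1.0f;
}

int main()
{
    float *d_a = NULL;
    if (cudaMalloc(&d_a, (size_t)N * N * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Block shape is SIZE_1 x SIZE_2 threads; the grid is the ceiling of
    // N / SIZE_1 by the ceiling of N / SIZE_2 blocks.
    dim3 block_size(SIZE_1, SIZE_2);
    dim3 grid_size((N + SIZE_1 - 1) / SIZE_1, (N + SIZE_2 - 1) / SIZE_2);

    touch<<<grid_size, block_size>>>(d_a, N);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));

    printf("block = %u x %u threads, grid = %u x %u blocks\n",
           block_size.x, block_size.y, grid_size.x, grid_size.y);

    cudaFree(d_a);
    return err == cudaSuccess ? 0 : 1;
}
```

With these placeholder values a block has 32 * 16 = 512 threads, which stays under the 1024-threads-per-block limit of current GPUs. As far as I can tell, the first reading (block SIZE_1 x SIZE_1, grid SIZE_2 x SIZE_2) would only cover the whole matrix if SIZE_1 * SIZE_2 >= N.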
I need to do the following experiment: