Using grids and blocks for nested loops?

I’m having some difficulty figuring out how to use grids and blocks for computations. I’ve read several answers to similar questions, but none of them quite help, and neither does the provided documentation.

Specifically, if I have a nested loop over a 2D array or two 1D arrays, how would I create the grid and then get the thread indices I need to do that?

I understand that you use blockDim%y, blockIdx%y, and threadIdx%y, but I’m not sure how to choose a proper grid and block size, or how to set that up within the subroutine itself. For an outer loop of length N and an inner loop of length M, what would I do? I can give more specific example code if that would be useful.
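To make the question concrete, here is a minimal sketch of the kind of nested loop in question, with one common way of mapping it onto a 2D grid in CUDA Fortran. Everything named here is hypothetical and only for illustration: the kernel add_outer, the arrays a, x, and y, the sizes n and m, and the 16x16 block shape. The assumed mapping is outer loop to the x dimension and inner loop to the y dimension, with one thread per (i, j) pair.

```fortran
module kernels
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: each thread handles one (i, j) pair of the nested loop.
  attributes(global) subroutine add_outer(a, x, y, n, m)
    integer, value :: n, m
    real, intent(out) :: a(n, m)
    real, intent(in)  :: x(n), y(m)
    integer :: i, j

    ! blockIdx and threadIdx are 1-based in CUDA Fortran.
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x   ! replaces the outer loop index
    j = (blockIdx%y - 1) * blockDim%y + threadIdx%y   ! replaces the inner loop index

    ! The grid is rounded up, so threads past the array bounds must do nothing.
    if (i <= n .and. j <= m) a(i, j) = x(i) + y(j)
  end subroutine add_outer
end module kernels

program nested_loop_demo
  use cudafor
  use kernels
  implicit none
  integer, parameter :: n = 1000, m = 500   ! assumed loop lengths N and M
  real :: a(n, m), ref(n, m), x(n), y(m)
  real, device :: a_d(n, m), x_d(n), y_d(m)
  type(dim3) :: grid, tblock
  integer :: i, j

  call random_number(x)
  call random_number(y)

  ! Serial version of the nested loop, kept as a reference result.
  do i = 1, n
    do j = 1, m
      ref(i, j) = x(i) + y(j)
    end do
  end do

  ! Copy inputs to the device (assignment to a device array does the transfer).
  x_d = x
  y_d = y

  ! Assumed block shape: 16 x 16 = 256 threads per block.
  tblock = dim3(16, 16, 1)
  ! Enough blocks to cover n in x and m in y, rounding up.
  grid = dim3((n + tblock%x - 1) / tblock%x, (m + tblock%y - 1) / tblock%y, 1)

  call add_outer<<<grid, tblock>>>(a_d, x_d, y_d, n, m)

  ! Copy the result back and compare against the serial loop.
  a = a_d
  print *, 'max abs difference from serial loop:', maxval(abs(a - ref))
end program nested_loop_demo
```

In this sketch the x dimension indexes the first array subscript because Fortran arrays are column-major, so neighbouring threads touch neighbouring memory. The 16x16 block is only a starting point, not a tuned value. It should build with the NVIDIA HPC SDK compiler if saved with a .cuf extension, for example nvfortran nested_loop_demo.cuf.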