I can’t quite figure out how to use dim3 effectively to replace nested loops. I understand that you define the threads per block and the blocks per grid, and that blockIdx%x/y/z and the equivalent threadIdx and blockDim fields then stand in for the loop indices, but I’m afraid I don’t quite understand how.
I understand that the actual conversion is
    loop x over N1
      loop y over N2
        array(blockDim%x * (blockIdx%x - 1) + threadIdx%x, blockDim%y * (blockIdx%y - 1) + threadIdx%y)
but I don’t understand how one picks the actual sizes for dim3(x,y,z). My attempts to read up on this haven’t helped very much. I can give a more specific example of the code I’m trying to get working if that would help.
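To make the question concrete, here is a minimal sketch of the pattern as I currently understand it. The names (`fill`, `a_d`, `tBlock`, `grid`), the array sizes, and the 16x16 block are all just assumptions for illustration, not taken from my real code. The block size in particular is a guess, and that is exactly the part I'm unsure about. The grid size is then derived by rounding up so the blocks cover the whole N1 x N2 range, and the kernel has a bounds check for the threads that fall past the edge.

```fortran
module kernels
  use cudafor
  implicit none
contains
  ! Each thread handles one (i, j) pair that the nested loops would have visited.
  attributes(global) subroutine fill(a, n1, n2)
    integer, value :: n1, n2
    real :: a(n1, n2)
    integer :: i, j

    ! 1-based global indices, same formula as in the post above.
    i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    j = blockDim%y * (blockIdx%y - 1) + threadIdx%y

    ! The grid is rounded up, so some threads land outside the array.
    if (i <= n1 .and. j <= n2) a(i, j) = real(i + j)
  end subroutine fill
end module kernels

program main
  use cudafor
  use kernels
  implicit none
  integer, parameter :: n1 = 1000, n2 = 500   ! hypothetical loop bounds
  real :: a(n1, n2)
  real, device, allocatable :: a_d(:,:)
  type(dim3) :: tBlock, grid

  allocate(a_d(n1, n2))

  ! Assumed block shape: 16 x 16 = 256 threads per block.
  tBlock = dim3(16, 16, 1)

  ! Enough blocks in each dimension to cover n1 and n2 (integer round-up).
  grid = dim3((n1 + tBlock%x - 1) / tBlock%x, &
              (n2 + tBlock%y - 1) / tBlock%y, 1)

  call fill<<<grid, tBlock>>>(a_d, n1, n2)

  a = a_d   ! copy the result back to the host
  print *, a(1, 1), a(n1, n2)

  deallocate(a_d)
end program main
```

With these numbers the grid comes out as 63 x 32 blocks, which covers 1008 x 512 threads, slightly more than the 1000 x 500 elements. What I can't tell is whether 16x16 is a sensible choice or how I should go about picking something better.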