Hi everyone, I am confused by the following 2D matrix access pattern using numba.cuda.
    from numba import cuda

    @cuda.jit
    def matrix_add(a, b, out, coalesced):
        # TODO: set x and y to index correctly such that each thread
        # accesses one element in the data.
        x, y = cuda.grid(2)
        if coalesced == True:
            out[y][x] = a[y][x] + b[y][x]
            # TODO: write the sum of one element in `a` and `b` to `out`
            # using a coalesced memory access pattern.
        else:
            out[x][y] = a[x][y] + b[x][y]
            # TODO: write the sum of one element in `a` and `b` to `out`
            # using an uncoalesced memory access pattern.
I understand that for 1D coalesced memory access, the threads in a block use threadIdx.x as the column index. But with a 2D grid, the kernel above uses the second thread index (y) to select the row and the first (x) to select the column. Why is that the coalesced access pattern?
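To make the question concrete, here is a rough sketch in plain Python (no GPU needed) of which flat element offsets one warp would touch under each branch. It is only a model, not the kernel itself, and it rests on a few assumptions: the arrays are C-order (row-major), a warp has 32 threads, the block is at least 32 threads wide in x so all threads in the warp share the same y, and x (the first value from cuda.grid(2)) is the index that varies fastest within a warp. The names `warp_offsets` and `n_cols` are made up for illustration.

```python
# Model of the element offsets touched by one warp, assuming a row-major
# (C-order) 2D array and 32 consecutive x values with a fixed y.
WARP_SIZE = 32

def warp_offsets(n_cols, coalesced, y=0):
    """Return flat element offsets for threads x = 0..31 at a fixed y."""
    offsets = []
    for x in range(WARP_SIZE):
        if coalesced:
            row, col = y, x   # mirrors a[y][x]
        else:
            row, col = x, y   # mirrors a[x][y]
        # Row-major layout: element (row, col) sits at row * n_cols + col.
        offsets.append(row * n_cols + col)
    return offsets

coalesced_offsets = warp_offsets(n_cols=1024, coalesced=True)
uncoalesced_offsets = warp_offsets(n_cols=1024, coalesced=False)

print(coalesced_offsets[:4])    # [0, 1, 2, 3]
print(uncoalesced_offsets[:4])  # [0, 1024, 2048, 3072]
```

Under these assumptions, a[y][x] makes neighbouring threads read neighbouring elements of the same row, while a[x][y] makes each thread jump a whole row (1024 elements here) away from its neighbour.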