Hi everyone, I am confused by the following 2D matrix access pattern using numba.cuda:
from numba import cuda

@cuda.jit
def matrix_add(a, b, out, coalesced):
    # TODO: set x and y to index correctly such that each thread
    # accesses one element in the data.
    x, y = cuda.grid(2)
    if coalesced:
        # TODO: write the sum of one element in `a` and `b` to `out`
        # using a coalesced memory access pattern.
        out[y, x] = a[y, x] + b[y, x]
    else:
        # TODO: write the sum of one element in `a` and `b` to `out`
        # using an uncoalesced memory access pattern.
        out[x, y] = a[x, y] + b[x, y]
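To see which (x, y) pairs end up in the same warp, here is a rough pure-Python emulation that needs no GPU. It assumes that `cuda.grid(2)` returns `(threadIdx.x + blockIdx.x * blockDim.x, threadIdx.y + blockIdx.y * blockDim.y)`, and that a block's threads are linearized with x varying fastest before being split into warps of 32. The helper names `grid_2d` and `warp_of` and the 32 x 8 block shape are hypothetical, chosen only for illustration.

```python
# Pure-Python emulation (no GPU needed) of the cuda.grid(2) index mapping
# and of how a block's threads are grouped into warps.
WARP_SIZE = 32


def grid_2d(thread_idx, block_idx, block_dim):
    """Hypothetical stand-in for cuda.grid(2): absolute (x, y) of a thread."""
    tx, ty = thread_idx
    bx, by = block_idx
    bdx, bdy = block_dim
    return tx + bx * bdx, ty + by * bdy


def warp_of(thread_idx, block_dim):
    """Warp number inside the block; threads are linearized with x fastest."""
    tx, ty = thread_idx
    linear_id = tx + ty * block_dim[0]
    return linear_id // WARP_SIZE


block_dim = (32, 8)  # assumed launch configuration: 32 x 8 threads per block

# Collect the (x, y) pair of every thread in warp 0 of block (0, 0).
warp0 = [
    grid_2d((tx, ty), (0, 0), block_dim)
    for ty in range(block_dim[1])
    for tx in range(block_dim[0])
    if warp_of((tx, ty), block_dim) == 0
]
print(warp0[:3], "...", warp0[-1])  # [(0, 0), (1, 0), (2, 0)] ... (31, 0)
```

Within one warp, y stays fixed and x runs over 32 consecutive values. The pattern that puts x in the last (fastest-varying) index is therefore the one where neighbouring threads touch neighbouring elements.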
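I understand that for 1D coalesced memory access, consecutive threads in a block read consecutive elements because threadIdx.x is used as the (column) index. But with a 2D grid, the row of the matrix is selected by y, which comes from the second thread index (threadIdx.y). Why is `out[y][x]` then the coalesced access?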
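One way to reason about it is to compute the byte offsets that a single warp (x = 0..31, y = 0) touches in a row-major (C-order) array, using NumPy's real `strides`. The 128-byte segment size below is an illustrative assumption, not an exact model of any particular GPU's memory system. The helpers `warp_offsets` and `segments_touched` are hypothetical.

```python
import numpy as np

WARP_SIZE = 32
SEGMENT_BYTES = 128  # illustrative memory-transaction size (assumption)


def warp_offsets(arr, coalesced):
    """Byte offsets accessed by the threads of one warp (x = 0..31, y = 0)."""
    row_stride, col_stride = arr.strides  # C order: (n_cols * itemsize, itemsize)
    y = 0
    offsets = []
    for x in range(WARP_SIZE):
        # a[y][x] (coalesced) versus a[x][y] (uncoalesced)
        row, col = (y, x) if coalesced else (x, y)
        offsets.append(row * row_stride + col * col_stride)
    return offsets


def segments_touched(offsets):
    """Number of distinct SEGMENT_BYTES-sized blocks the warp touches."""
    return len({off // SEGMENT_BYTES for off in offsets})


a = np.zeros((64, 64), dtype=np.float32)  # default C (row-major) layout
for mode in (True, False):
    offs = warp_offsets(a, coalesced=mode)
    print("coalesced" if mode else "uncoalesced", offs[:3], segments_touched(offs))
```

Under these assumptions, `a[y][x]` gives offsets 0, 4, 8, ... that fall in one 128-byte segment. `a[x][y]` jumps a whole row (256 bytes here) per thread and touches 32 separate segments. In this sketch, what matters is which index the warp's varying thread coordinate feeds, not whether that index is called "row" or "column".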
Thanks
Richard