Coalesced access for 2D Matrix

Hi everyone, I am confused by the following 2D matrix access using numba.cuda.

from numba import cuda

@cuda.jit
def matrix_add(a, b, out, coalesced):
    # TODO: set x and y to index correctly such that each thread
    # accesses one element in the data.
    x, y = cuda.grid(2)  # x comes from threadIdx.x, y from threadIdx.y

    if coalesced:
        # TODO: write the sum of one element in `a` and `b` to `out`
        # using a coalesced memory access pattern.
        out[y, x] = a[y, x] + b[y, x]  # x selects the column
    else:
        # TODO: write the sum of one element in `a` and `b` to `out`
        # using an uncoalesced memory access pattern.
        out[x, y] = a[x, y] + b[x, y]  # x selects the row

I understand that for 1D coalesced memory access, each thread in a block uses its threadIdx as the column index. But for a 2D grid, the matrix uses the second thread index (y) to access the row, so why is that coalesced access?

Thanks
Richard
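
To make the question concrete, here is a small host-side sketch (no GPU needed) of the byte offsets that neighbouring threads would touch in each branch. It rests on two assumptions not stated in the post: that `a`, `b` and `out` are row-major (C-contiguous) float32 arrays, and that the block is at least a warp wide in x, so that neighbouring threads in a warp differ in x rather than y. The helper names `byte_offset` and `warp_offsets` are made up for illustration and are not part of Numba.

```python
# Host-side sketch: which byte offsets do neighbouring threads touch?
# Assumption: row-major (C-contiguous) float32 array with `cols` columns.

ITEMSIZE = 4  # bytes per float32 element (assumed dtype)


def byte_offset(row, col, cols, itemsize=ITEMSIZE):
    """Byte offset of element (row, col) in a row-major array."""
    return (row * cols + col) * itemsize


def warp_offsets(coalesced, cols, y=0, n_threads=4):
    """Offsets accessed by threads x = 0..n_threads-1 that share the same y.

    Within a warp the x index varies fastest, so neighbouring threads
    differ in x while y stays fixed.
    """
    offsets = []
    for x in range(n_threads):
        if coalesced:
            # Coalesced branch: y picks the row, x picks the column.
            offsets.append(byte_offset(y, x, cols))
        else:
            # Uncoalesced branch: x picks the row, y picks the column.
            offsets.append(byte_offset(x, y, cols))
    return offsets


if __name__ == "__main__":
    print(warp_offsets(True, cols=1024))   # [0, 4, 8, 12]
    print(warp_offsets(False, cols=1024))  # [0, 4096, 8192, 12288]
```

Under those assumptions, the first branch has neighbouring threads reading adjacent 4-byte elements of one row, while the second has them jumping a full row (4096 bytes here) between reads. The y index selecting the row does not hurt coalescing, because y is the same for the neighbouring threads being compared.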
