Could you explain how TMA works? For example, when we write from the shared memory Tensor sS to the global memory Tensor gD, it seems like the data is written sequentially, i.e., sS[i] directly maps to gD[i]. Is this correct?