I’m trying to better understand the canonical layout shape mentioned in the SM90 TMA (Tensor Memory Accelerator) section of the PTX documentation. The doc gives the shape as ((8, m), (T, 2k)), but I’m not sure how to interpret it.
Specifically:
- Why is the second dimension always written as (T, 2k), regardless of the swizzling granularity?
- From the figure in the doc, it seems like:
  - For 128B swizzling, the shape should be something like (8T, k)
  - For 64B swizzling, it seems more like (4T, k)
  - For 32B swizzling, maybe (2T, k)
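To make my reading of the figure concrete, here is a small Python sketch of how I currently model the three swizzle modes. It follows the Swizzle&lt;B, M, S&gt; convention from CUTLASS/CuTe (with M=4 for 16-byte chunks and S=3), which I believe corresponds to the PTX figures, though that mapping is my assumption and may be exactly where I’m going wrong:

```python
def swizzle(offset, B, M=4, S=3):
    """CuTe-style Swizzle<B,M,S>: XOR the B bits starting at bit M+S
    of a byte offset into the B bits starting at bit M."""
    mask = ((1 << B) - 1) << (M + S)
    return offset ^ ((offset & mask) >> S)

# For an 8-row tile, 8 chunks of 16 B per row (128 B wide), print which
# chunk column each (row, col) lands in under each swizzle mode.
for B, name in [(3, "128B"), (2, "64B"), (1, "32B")]:
    print(f"{name} swizzling (Swizzle<{B},4,3>):")
    for row in range(8):
        cols = [(swizzle(row * 128 + col * 16, B) % 128) // 16
                for col in range(8)]
        print(" ", cols)
```

Running this, the 128B pattern repeats every 8 rows, the 64B pattern every 4 rows, and the 32B pattern every 2 rows, which is why I’d have expected shapes like (8T, k), (4T, k), and (2T, k) rather than a uniform (T, 2k).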
Could someone help clarify how the (T, 2k) structure is derived and why it’s used as the canonical representation?
Thanks in advance!
