Clarification on the accumulator layout in an mma instruction

I’m looking at the documentation on MMA instructions from GTC 2020:

I just want to confirm that in this example T0 (in the accumulator) is storing:

  • 2 elements corresponding to (row 0 x column 1, row 0 x column 2)
  • 2 elements corresponding to (row 9 x column 1, row 9 x column 2)

It seems obvious, but this type of thing is worth double-checking.

I guess by accumulator you mean result. T0 means thread 0.

It looks to me like T0 holds row 0, columns 0 and 1, and row 8, columns 0 and 1.

I assume you are looking for 0-based indexing since you mention row 0.

You should be able to double-check it in the “actual” documentation, the PTX guide.
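For reference, here is a minimal host-side sketch of the thread-to-element mapping discussed above. It assumes the slide shows one of the m16n8 shapes (for example `mma.m16n8k16`), whose 16x8 accumulator fragment gives each thread four elements, and it follows the formula I recall from the PTX guide for that fragment (`groupID = laneid >> 2`, `threadID_in_group = laneid % 4`). The names `Coord` and `accumulator_coord` are made up for illustration; please verify against the PTX guide for the exact shape and data type you are using.

```cpp
// Position of one accumulator element inside the 16x8 result tile (0-based).
struct Coord {
    int row;
    int col;
};

// Hypothetical helper: maps (lane, element index) to a tile coordinate.
// Assumption: m16n8 accumulator fragment, 4 elements per thread.
//   lane : lane ID within the warp, 0..31
//   i    : index of the accumulator element held by that thread, 0..3
inline Coord accumulator_coord(int lane, int i) {
    int group_id = lane >> 2;        // 8 groups of 4 threads each
    int thread_in_group = lane % 4;  // position inside the group, 0..3

    // Elements 0 and 1 sit in the upper 8 rows, elements 2 and 3 in the lower 8.
    int row = (i < 2) ? group_id : group_id + 8;

    // Each thread covers two adjacent columns.
    int col = thread_in_group * 2 + (i & 1);

    return Coord{row, col};
}
```

Under these assumptions, lane 0 (T0) maps to (0, 0), (0, 1), (8, 0) and (8, 1), which matches the reading given in the reply above.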
