Hello, I'm studying the `mma` instruction in PTX. I found the code below, which runs correctly, but I cannot understand why.
asm volatile(
"mma.sync.aligned.m16n8k8.row.col.f32.tf32.tf32.f32 "
"{%0,%1,%2,%3}, {%4,%5,%6,%7}, {%8,%9}, {%10,%11,%12,%13};\n"
: "=f"(C[cd[0]]), // D[0] 32bit
"=f"(C[cd[1]]), // D[1]
"=f"(C[cd[2]]), // D[2]
"=f"(C[cd[3]]) // D[3]
: "r"(*reinterpret_cast<uint32_t const *>(&A[a[0]])), // A[0] 32bit
"r"(*reinterpret_cast<uint32_t const *>(&A[a[1]])), // A[1]
"r"(*reinterpret_cast<uint32_t const *>(&A[a[2]])), // A[2]
"r"(*reinterpret_cast<uint32_t const *>(&A[a[3]])), // A[3]
"r"(*reinterpret_cast<uint32_t const *>(&B[b[0]])), // B[0]
"r"(*reinterpret_cast<uint32_t const *>(&B[b[1]])), // B[1]
"f"(C[cd[0]]), // C[0]
"f"(C[cd[1]]), // C[1]
"f"(C[cd[2]]), // C[2]
"f"(C[cd[3]]) // C[3]
);
// a, b, and cd hold the current element indices into matrices A, B, and C/D
- It uses the `mma.sync.aligned.m16n8k8.row.col.f32.tf32.tf32.f32` instruction. I understand that it should use `"f"` in the PTX code because of `f32`. But why are A and B passed with `"r"`? Shouldn't they be of type `.b32` because of `tf32`? The PTX documentation says: "A register variable containing `tf32` data must be declared with `.b32` type."
- How should I understand `*reinterpret_cast<uint32_t const *>(&A[a[0]])`? Why does it use `&` and `reinterpret_cast`?
- I cannot find the `"f"` and `"r"` syntax in the PTX documentation. Where can I find a reference for it?
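To make the second question concrete, here is a host-only C++ sketch of that operation in isolation: reading the 4 bytes of a `float` as an unsigned 32-bit integer keeps the bit pattern, while an ordinary value conversion changes it. It assumes `A` holds `float` elements (its declaration is not shown above) and an IEEE-754 `float`. `float_bits` and `check_examples` are hypothetical helpers, and `std::memcpy` stands in for the snippet's `reinterpret_cast` as the portable host-side way to read the same bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of the snippet above): return the raw
// 32 bits stored in a float, without converting its numeric value.
std::uint32_t float_bits(float x) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));  // copy the bytes as-is
    return bits;
}

// Small self-check contrasting a bit copy with a value conversion.
void check_examples() {
    float A[2] = {1.0f, -2.0f};  // assumed element type: float

    // Bit copy: the IEEE-754 encodings of 1.0f and -2.0f.
    assert(float_bits(A[0]) == 0x3f800000u);
    assert(float_bits(A[1]) == 0xc0000000u);

    // Value conversion: 1.0f becomes the integer 1, a different bit pattern.
    assert(static_cast<std::uint32_t>(A[0]) == 1u);
}
```

Calling `check_examples()` from `main` passes all three assertions. The contrast between the first and last one is the point: a plain `(uint32_t)A[a[0]]` would hand the instruction the integer value rather than the float's bits.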