wgmma.mma_async.sync.aligned.shape.dtype.bf16.bf16 d, a, b-desc, scale-d, imm-scale-a, imm-scale-b, imm-trans-b;

I'm not familiar with Hopper's Tensor Core programming, and there is a huge difference between Hopper's wgmma instructions and Ampere's mma. What is the meaning of imm-scale-a and imm-trans-a?
The meaning of those operands is explained in the PTX ISA 8.2 manual (parallel-thread-execution documentation):
Matrices A and B are stored in row-major and column-major format respectively. For certain floating point variants, the input matrices A and B can be transposed by specifying the value 1 for the immediate integer arguments imm-trans-a and imm-trans-b respectively. A value of 0 can be used to avoid the transpose operation. The valid values of imm-trans-a and imm-trans-b are 0 and 1. The transpose operation is only supported for the .bf16 types on matrices accessed from shared memory using matrix descriptors.

For the floating point variants of the wgmma.mma_async operation, each element of the input matrices A and B can be negated by specifying the value -1 for operands imm-scale-a and imm-scale-b respectively. A value of 1 can be used to avoid the negate operation. The valid values of imm-scale-a and imm-scale-b are -1 and 1.
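For orientation, here is an annotated sketch of how those immediates slot into the register-A bf16 form of the instruction. The shape, register counts, and the name b_desc are illustrative assumptions, not taken from this thread; check the ISA's shape and fragment tables before copying this:

```
// Illustrative operand layout only, not a complete kernel.
// This call would compute D = (-A) x (B^T) + D.
wgmma.mma_async.sync.aligned.m64n64k16.f32.bf16.bf16
    {%f0, ..., %f31},     // d:  f32 accumulator fragment in registers
    {%r0, %r1, %r2, %r3}, // a:  bf16 A fragment in registers
    b_desc,               // b-desc: matrix descriptor for B in shared memory
    1,                    // scale-d:     1 accumulates into d, 0 overwrites it
    -1,                   // imm-scale-a: -1 negates every element of A, 1 is a no-op
    1,                    // imm-scale-b: 1 leaves B as-is, -1 would negate it
    1;                    // imm-trans-b: 1 transposes B; only legal because B is
                          //              accessed through a shared-memory descriptor
```

Note that in this register-A form there is no imm-trans-a slot: A cannot be transposed unless it, too, is accessed from shared memory via a matrix descriptor.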
Thank you so much! I'll check the latest PTX ISA documentation carefully.