wgmma.mma_async.sync.aligned.shape.dtype.bf16.bf16 d, a, b-desc, scale-d, imm-scale-a, imm-scale-b, imm-trans-b;

I’m not familiar with Hopper’s Tensor Core programming, and there is a huge difference between Hopper’s wgmma instructions and Ampere’s mma. What is the meaning of imm-scale-a and imm-trans-a?
The meaning of those operands is explained in the PTX ISA 8.2 manual (the parallel-thread-execution documentation):
Matrices A and B are stored in row-major and column-major format respectively. For certain floating point variants, the input matrices A and B can be transposed by specifying the value 1 for the immediate integer arguments imm-trans-a and imm-trans-b respectively. A value of 0 can be used to avoid the transpose operation. The valid values of imm-trans-a and imm-trans-b are 0 and 1. The transpose operation is only supported for the wgmma.mma_async variants with .f16/.bf16 types on matrices accessed from shared memory using matrix descriptors.

For the floating point variants of the wgmma.mma_async operation, each element of the input matrices A and B can be negated by specifying the value -1 for operands imm-scale-a and imm-scale-b respectively. A value of 1 can be used to avoid the negate operation. The valid values of imm-scale-a and imm-scale-b are -1 and 1.
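To make those semantics concrete, here is a small pure-Python model of the math the operands select — this is not the instruction itself (no descriptors, no warpgroup execution), just a sketch of how imm-scale-a/b and imm-trans-a/b transform the A and B inputs before the multiply; the function and argument names are hypothetical:

```python
def wgmma_model(A, B, scale_a=1, scale_b=1, trans_a=0, trans_b=0):
    """Numerical model of the wgmma.mma_async immediate operands:
    - imm-scale-{a,b} in {-1, 1}: -1 negates every input element, 1 is a no-op.
    - imm-trans-{a,b} in {0, 1}: 1 transposes the matrix, 0 leaves it as-is.
    Returns D = A' @ B' after applying the transposes and scales.
    (The scale-d operand, which controls accumulation into D, is not modeled.)"""
    assert scale_a in (-1, 1) and scale_b in (-1, 1)
    assert trans_a in (0, 1) and trans_b in (0, 1)

    def transpose(M):
        return [list(row) for row in zip(*M)]

    if trans_a:
        A = transpose(A)
    if trans_b:
        B = transpose(B)
    A = [[scale_a * x for x in row] for row in A]
    B = [[scale_b * x for x in row] for row in B]
    # Plain matrix multiply: D[i][j] = sum_k A[i][k] * B[k][j]
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(wgmma_model(A, B))              # [[19, 22], [43, 50]]
print(wgmma_model(A, B, scale_b=-1))  # [[-19, -22], [-43, -50]]
```

Since negating every element of an input matrix just flips the sign of the product, imm-scale gives a cheap way to compute A·B, −A·B, or A·(−B) without touching the data in memory.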
Thank you so much! I’ll check the latest PTX ISA documentation carefully.