wgmma.mma_async.sync.aligned.shape.dtype.bf16.bf16 d, a, b-desc, scale-d, imm-scale-a, imm-scale-b, imm-trans-b;

I’m not familiar with Hopper’s Tensor Core programming, and Hopper’s wgmma instructions are very different from Ampere’s MMA. What do imm-scale-a and imm-trans-a mean?

The meaning of those operands is explained in the PTX manual ("1. Introduction — parallel-thread-execution 8.2 documentation").

Matrices A and B are stored in row-major and column-major format, respectively. For certain floating point variants, the input matrices A and B can be transposed by specifying the value 1 for the immediate integer arguments imm-trans-a and imm-trans-b, respectively. A value of 0 can be used to avoid the transpose operation. The valid values of imm-trans-a and imm-trans-b are 0 and 1. The transpose operation is only supported for the wgmma.mma_async variants with .f16/.bf16 types on matrices accessed from shared memory using matrix descriptors.
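To make the transpose flags concrete, here is a small host-side Python sketch of one way to read the paragraph above. It is not PTX and not how the hardware addresses shared memory: it ignores matrix descriptors, swizzling, and tile layout entirely. The helper names `read_a` and `read_b` are hypothetical, and the interpretation (a flag of 1 means the flat buffer is stored in the opposite major order) is an assumption based on the quoted text.

```python
def read_a(buf, m, k, imm_trans_a=0):
    """Return the logical M x K matrix A from a flat buffer (hypothetical helper)."""
    if imm_trans_a not in (0, 1):
        raise ValueError("imm-trans-a must be 0 or 1")
    if imm_trans_a == 0:
        # Default for A: row-major, element (i, j) lives at i*k + j.
        return [[buf[i * k + j] for j in range(k)] for i in range(m)]
    # Assumed meaning of 1: the buffer holds A in column-major order.
    return [[buf[j * m + i] for j in range(k)] for i in range(m)]


def read_b(buf, k, n, imm_trans_b=0):
    """Return the logical K x N matrix B from a flat buffer (hypothetical helper)."""
    if imm_trans_b not in (0, 1):
        raise ValueError("imm-trans-b must be 0 or 1")
    if imm_trans_b == 0:
        # Default for B: column-major, element (i, j) lives at j*k + i.
        return [[buf[j * k + i] for j in range(n)] for i in range(k)]
    # Assumed meaning of 1: the buffer holds B in row-major order.
    return [[buf[i * n + j] for j in range(n)] for i in range(k)]
```

With the same six-element buffer, flipping the flag changes which logical matrix is read, which is all the sketch is meant to show.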

For the floating point variants of the wgmma.mma_async operation, each element of the input matrices A and B can be negated by specifying the value -1 for operands imm-scale-a and imm-scale-b respectively. A value of 1 can be used to avoid the negate operation. The valid values of imm-scale-a and imm-scale-b are -1 and 1.
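The negation operands can be illustrated the same way. Below is a minimal Python reference model, written as a sketch rather than a description of the hardware: `wgmma_reference` is a hypothetical name, the matrices are plain nested lists, and precision and rounding are ignored. The handling of scale-d (accumulate into D when true, otherwise compute D = A*B) is my reading of the PTX manual and is not part of the text quoted above, so treat it as an assumption.

```python
def wgmma_reference(d, a, b, scale_d=True, imm_scale_a=1, imm_scale_b=1):
    """Hypothetical reference: D = (scale_a * A) @ (scale_b * B) + (D if scale_d else 0)."""
    for name, s in (("imm-scale-a", imm_scale_a), ("imm-scale-b", imm_scale_b)):
        if s not in (-1, 1):
            raise ValueError(f"{name} must be -1 or 1")

    m, k, n = len(a), len(b), len(b[0])
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            # Assumption: scale_d=False drops the old accumulator value.
            acc = d[i][j] if scale_d else 0.0
            for p in range(k):
                # -1 negates every element of that operand; 1 leaves it unchanged.
                acc += (imm_scale_a * a[i][p]) * (imm_scale_b * b[p][j])
            row.append(acc)
        out.append(row)
    return out
```

Negating only one operand flips the sign of the product term, while negating both cancels out, so in this model (-1, -1) gives the same result as (1, 1).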

Thank you so much! I’ll check the latest PTX ISA documentation carefully.