I passed the following PTX code to NVCC:
ld.global.L2::128B xx
wmma.mma.sync xx
…
ld.global.L2::128B xx
wmma.mma.sync xx
…
ld.global.L2::128B xx
wmma.mma.sync xx
Here is the final SASS I got:
LDG.E.LTC128B.128.SYS R20, [R20]
LDG.E.LTC128B.128.SYS R16, [R16]
LDG.E.LTC128B.128.SYS R24, [R24]
…
HMMA.1688.F16 R40, R38, R48, R40
LDSM.16.M88.2 R30, [R65+0x30]
HMMA.1688.F16 R44, R38, R46, R44
LDSM.16.M88.2 R38, [R65+0x40]
HMMA.1688.F16 R48, R42, R49, R40
LDSM.16.M88.2 R40, [R66+0x40]
HMMA.1688.F16 R44, R42, R47, R44
…
However, I would like the final SASS to keep the loads interleaved with the HMMA instructions, like this:
LDG.E.LTC128B.128.SYS R20, [R20]
HMMA.1688.F16 R40, R38, R48, R40
LDSM.16.M88.2 R30, [R65+0x30]
HMMA.1688.F16 R44, R38, R46, R44
LDSM.16.M88.2 R38, [R65+0x40]
HMMA.1688.F16 R48, R42, R49, R40
…
LDG.E.LTC128B.128.SYS R20, [R20]
HMMA.1688.F16 R40, R38, R48, R40
LDSM.16.M88.2 R30, [R65+0x30]
HMMA.1688.F16 R44, R38, R46, R44
LDSM.16.M88.2 R38, [R65+0x40]
HMMA.1688.F16 R48, R42, R49, R40