This question is about the SASS generated by PTXAS from PTX.
I’m seeing several STG.E instructions to the same address emitted back-to-back, which looks suspicious to me. I would expect only the final STG.E to remain: each store overwrites the value at [R2], so the earlier ones should have no visible effect.
STG.E [R2], R31
STG.E [R2], R29
STG.E [R2], R30
STG.E [R2], R28
STG.E [R2], R23
Here’s the full PTX that reproduces the issue:
https://godbolt.org/z/oW9ve9fPa
The code computes y = Ax, where A is a matrix and x, y are vectors. One notable detail is that A is allocated in texture memory. I can reproduce the same behavior with surface memory as well.
My current guess is that this might be related to aliasing semantics for texture/surface memory.
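A plain C++ analogy may help frame the aliasing guess. It is only a sketch under an assumption: that the kernel accumulates into y[i] inside the loop rather than in a local variable (the PTX behind the link is not reproduced here). The function names are hypothetical. The point it illustrates is that a store can be removed as dead only if no load between it and the next store to the same address could read it. When an input pointer might overlap y, the intermediate values are observable, as the aliased run below shows.

```cpp
#include <cassert>

// Hypothetical helpers: a C++ analogy, not the original PTX kernel.

// Variant 1: accumulate directly into *y on every iteration.
// a[] and y have the same element type and no restrict qualifier, so the
// compiler must assume they may overlap. A later a[j] load could then read an
// earlier store, so the stores cannot simply be dropped unless the compiler
// proves (or checks at run time) that the pointers do not overlap.
void row_dot_store_each(const float* a, const float* x, float* y, int n) {
    *y = 0.0f;
    for (int j = 0; j < n; ++j) {
        *y += a[j] * x[j];
    }
}

// Variant 2: accumulate in a local variable and store once at the end.
void row_dot_store_once(const float* a, const float* x, float* y, int n) {
    float acc = 0.0f;
    for (int j = 0; j < n; ++j) {
        acc += a[j] * x[j];
    }
    *y = acc;
}

// Runs both variants with y pointing into a[] and reports both results.
// With buf = {1, 2}, x = {1, 1}, y = &buf[1]:
//   store_each reads the already-updated buf[1] on the second iteration -> 2
//   store_once reads the original buf[1] -> 3
void aliased_results(float* each, float* once) {
    const float x[2] = {1.0f, 1.0f};

    float buf1[2] = {1.0f, 2.0f};
    row_dot_store_each(buf1, x, &buf1[1], 2);
    *each = buf1[1];

    float buf2[2] = {1.0f, 2.0f};
    row_dot_store_once(buf2, x, &buf2[1], 2);
    *once = buf2[1];
}
```

Without overlap both variants give the same result; with overlap they differ, which is why a compiler that cannot rule out aliasing has to keep every intermediate store. If the stores in the SASS really are consecutive with no loads between them, this argument alone would not explain them.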