This question is about the SASS generated by PTXAS from PTX.
I’m seeing several STG.E instructions emitted back-to-back, apparently to the same address, which looks suspicious to me. I would expect only the final STG.E to remain, since the earlier stores are overwritten and should have no visible effect.
The code computes y = Ax, where A is a matrix and x, y are vectors. One notable detail is that A is allocated in texture memory. I can reproduce the same behavior with surface memory as well.
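For reference, here is a minimal sketch of a kernel with this shape. It is not the original code: the name `matvec_tex`, the parameter names, and the texture setup (a 2D float texture object with point filtering and unnormalized coordinates) are all assumptions made for illustration. Accumulating directly into `y[row]` is likewise an assumption about how repeated stores to one address could arise, and whether this exact sketch reproduces the STG.E pattern is untested.

```cuda
#include <cuda_runtime.h>

// Hypothetical sketch: y = A * x, with A read through a texture object.
// Assumption: texA is a 2D float texture (cudaReadModeElementType,
// point filtering, unnormalized coordinates), laid out so that
// texel (col, row) holds A[row][col].
__global__ void matvec_tex(cudaTextureObject_t texA,
                           const float* x, float* y,
                           int rows, int cols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;

    // Accumulate directly in global memory instead of a local register.
    // x and y are not marked __restrict__, so they may alias as far as
    // the compiler can tell.
    y[row] = 0.0f;
    for (int col = 0; col < cols; ++col) {
        // +0.5f addresses the center of the texel.
        y[row] += tex2D<float>(texA, col + 0.5f, row + 0.5f) * x[col];
    }
}
```

One design note: accumulating in a local `float` and writing `y[row]` once after the loop would leave a single store in the source, which makes it a useful comparison point when inspecting the generated SASS.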
My current guess is that this might be related to aliasing semantics for texture/surface memory.
Just bumping this. I’m still trying to understand whether this behavior is expected or whether I’m missing something.
As far as I can tell, these back-to-back STG.E instructions all target the same address, so the earlier stores are dead and should be removable. However, PTXAS consistently emits all of them.
A couple of clarifications since the original post:
The behavior is reproducible across both texture and surface memory.
I don’t see the same pattern with plain global memory (which makes me suspect memory space semantics may be relevant).
My main concern is whether this indicates a missed optimization or whether there’s a correctness constraint I’m overlooking (e.g., aliasing rules, memory ordering, or side effects specific to these memory types).
So my concrete questions:
Is PTXAS intentionally preserving these stores for texture/surface memory?
Are there documented aliasing or visibility rules that would prevent store elimination here?
If anyone has insight into how PTXAS treats these cases, I’d really appreciate it!