PTXAS emits redundant STG.E instructions for the same address?

This question is about the SASS generated by PTXAS from PTX.

I’m seeing multiple STG.E instructions to the same address emitted back-to-back, which looks suspicious to me. I would expect only the final STG.E to remain, since the earlier stores are overwritten and should have no visible effect.

 STG.E [R2], R31 
 STG.E [R2], R29 
 STG.E [R2], R30 
 STG.E [R2], R28 
 STG.E [R2], R23 

Here’s the full PTX that reproduces the issue:
https://godbolt.org/z/oW9ve9fPa

The code computes y = Ax, where A is a matrix and x and y are vectors. One notable detail is that A is allocated in texture memory. I can reproduce the same behavior with surface memory as well.
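
For context, here is a minimal sketch of the kind of kernel involved. It is not the code behind the Godbolt link: the kernel name `matvec_tex`, the one-thread-per-row layout, the row-major 2D texture object, and the choice to write `y[row]` on every loop iteration are all assumptions made for illustration. I'm not claiming this exact code produces the SASS shown above.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel (not the original code): one thread per row of A.
// A is read through a 2D texture object; y[row] is written on every
// iteration, so the same global address is stored to repeatedly.
__global__ void matvec_tex(cudaTextureObject_t A, const float* x, float* y,
                           int rows, int cols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;

    y[row] = 0.0f;
    for (int col = 0; col < cols; ++col) {
        // Unnormalized coordinates with point filtering: +0.5f hits the texel center.
        float a = tex2D<float>(A, col + 0.5f, row + 0.5f);
        y[row] += a * x[col];
    }
}

int main()
{
    const int rows = 4, cols = 4;
    std::vector<float> hA(rows * cols), hx(cols, 1.0f), hy(rows, 0.0f);
    for (int i = 0; i < rows * cols; ++i) hA[i] = static_cast<float>(i);

    // Copy A (row-major) into a CUDA array so it can back a texture object.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t dA = nullptr;
    cudaMallocArray(&dA, &desc, cols, rows);
    cudaMemcpy2DToArray(dA, 0, 0, hA.data(), cols * sizeof(float),
                        cols * sizeof(float), rows, cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = dA;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    float *dx = nullptr, *dy = nullptr;
    cudaMalloc(&dx, cols * sizeof(float));
    cudaMalloc(&dy, rows * sizeof(float));
    cudaMemcpy(dx, hx.data(), cols * sizeof(float), cudaMemcpyHostToDevice);

    matvec_tex<<<1, 32>>>(tex, dx, dy, rows, cols);
    cudaMemcpy(hy.data(), dy, rows * sizeof(float), cudaMemcpyDeviceToHost);

    // Compare the GPU result against a CPU reference.
    bool ok = true;
    for (int r = 0; r < rows; ++r) {
        float ref = 0.0f;
        for (int c = 0; c < cols; ++c) ref += hA[r * cols + c] * hx[c];
        if (std::fabs(ref - hy[r]) > 1e-5f) ok = false;
    }
    std::printf("%s\n", ok ? "match" : "mismatch");

    cudaDestroyTextureObject(tex);
    cudaFreeArray(dA);
    cudaFree(dx);
    cudaFree(dy);
    return ok ? 0 : 1;
}
```

The SASS for a kernel like this can be dumped by running `cuobjdump -sass` on the compiled binary, which is one way to compare the store pattern against a variant that reads A from plain global memory.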

My current guess is that this might be related to aliasing semantics for texture/surface memory.

Just bumping this. I’m still trying to understand whether this behavior is expected or whether I’m missing something.

As I understand it, these back-to-back STG.E instructions all target the same address ([R2]), so every store except the last is dead and should be removable. However, PTXAS consistently emits all of them.

A couple of clarifications since the original post:

  • The behavior is reproducible across both texture and surface memory.
  • I don’t see the same pattern with plain global memory (which makes me suspect memory space semantics may be relevant).
  • My main concern is whether this indicates a missed optimization or whether there’s a correctness constraint I’m overlooking (e.g., aliasing rules, memory ordering, or side effects specific to these memory types).

So my concrete questions:

  1. Is PTXAS intentionally preserving these stores for texture/surface memory?
  2. Are there documented aliasing or visibility rules that would prevent store elimination here?

If anyone has insight into how PTXAS treats these cases, I’d really appreciate it!