PTXAS emits redundant STG.E instructions for the same address?

Lai-YT · April 9, 2026, 4:33am

This question is about the SASS generated by PTXAS from PTX.

I’m seeing multiple STG.E instructions to the same address ([R2]) emitted back-to-back, which looks suspicious to me. I would expect only the final STG.E to remain, since each earlier store is overwritten by the next one and should have no visible effect.

 STG.E [R2], R31 
 STG.E [R2], R29 
 STG.E [R2], R30 
 STG.E [R2], R28 
 STG.E [R2], R23

Here’s the full PTX that reproduces the issue:
https://godbolt.org/z/oW9ve9fPa

The code computes y = Ax, where A is a matrix and x, y are vectors. One notable detail is that A is allocated in texture memory. I can reproduce the same behavior with surface memory as well.

My current guess is that this might be related to aliasing semantics for texture/surface memory.
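
For readers who don't want to open the Godbolt link, here is a minimal CUDA sketch of the kind of kernel described above. It is not the actual source behind the link: the names `matvec_tex_store_each`, `matvec_tex_store_once`, and `run_matvec_demo` are hypothetical, and so is the assumption that A is bound as a 2D float texture object with point sampling. The first kernel stores the running sum to y[row] on every iteration, and the second keeps the sum in a register and stores once, so the SASS of the two can be compared side by side.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical variant 1: writes the running sum to y[row] on every
// iteration, so the source contains one global store per column.
__global__ void matvec_tex_store_each(cudaTextureObject_t A, const float* x,
                                      float* y, int rows, int cols) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;
    float acc = 0.0f;
    for (int col = 0; col < cols; ++col) {
        // Point-sampled fetch of A[row][col] (unnormalized coordinates).
        acc += tex2D<float>(A, col + 0.5f, row + 0.5f) * x[col];
        y[row] = acc;
    }
}

// Hypothetical variant 2: accumulates in a register, stores exactly once.
__global__ void matvec_tex_store_once(cudaTextureObject_t A, const float* x,
                                      float* y, int rows, int cols) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;
    float acc = 0.0f;
    for (int col = 0; col < cols; ++col)
        acc += tex2D<float>(A, col + 0.5f, row + 0.5f) * x[col];
    y[row] = acc;
}

// Runs both kernels on a small matrix and checks them against a host result.
bool run_matvec_demo() {
    const int rows = 4, cols = 8;
    std::vector<float> hA(rows * cols), hx(cols, 1.0f), hy1(rows), hy2(rows);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) hA[r * cols + c] = float(r + c);

    // Assumption: A lives in a cudaArray and is read through a texture object.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr = nullptr;
    if (cudaMallocArray(&arr, &desc, cols, rows) != cudaSuccess) return false;
    cudaMemcpy2DToArray(arr, 0, 0, hA.data(), cols * sizeof(float),
                        cols * sizeof(float), rows, cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;
    cudaTextureObject_t tex = 0;
    if (cudaCreateTextureObject(&tex, &res, &td, nullptr) != cudaSuccess) return false;

    float *dx = nullptr, *dy1 = nullptr, *dy2 = nullptr;
    cudaMalloc(&dx, cols * sizeof(float));
    cudaMalloc(&dy1, rows * sizeof(float));
    cudaMalloc(&dy2, rows * sizeof(float));
    cudaMemcpy(dx, hx.data(), cols * sizeof(float), cudaMemcpyHostToDevice);

    matvec_tex_store_each<<<1, 32>>>(tex, dx, dy1, rows, cols);
    matvec_tex_store_once<<<1, 32>>>(tex, dx, dy2, rows, cols);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    cudaMemcpy(hy1.data(), dy1, rows * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(hy2.data(), dy2, rows * sizeof(float), cudaMemcpyDeviceToHost);
    for (int r = 0; r < rows && ok; ++r) {
        float ref = 0.0f;  // host reference for row r
        for (int c = 0; c < cols; ++c) ref += hA[r * cols + c] * hx[c];
        ok = (hy1[r] == ref) && (hy2[r] == ref);
    }

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(dx); cudaFree(dy1); cudaFree(dy2);
    return ok;
}
```

To try it, call `run_matvec_demo()` from a `main`, build with nvcc, and dump the SASS with `cuobjdump -sass` on the resulting binary. Whether the first variant shows the same chain of STG.E instructions as the listing above is exactly what I'd want to check; I'm not claiming it reproduces the issue.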
