In isolation or in context? In general, this is difficult to do in the context of actual kernel code compiled with optimizations, because ptxas, the optimizing compiler that translates PTX to SASS (machine code), applies fairly extensive transformations to the code.
A specific operation performed by a PTX instruction may disappear (e.g., through strength reduction) or may be combined with another operation into a single SASS instruction. If the PTX instruction maps to an emulation sequence, the constituent SASS instructions of that sequence could be spread out in the machine code, and some of them could be modified or eliminated (e.g., through constant propagation).
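To see this for yourself, a small probe along the following lines can help. This is only a sketch: the file name `probe.cu`, the kernel `probe`, and the helpers `scale_by_eight` and `mul_then_add` are made up for illustration, and the kernel omits bounds checks for brevity. The macro guard is there only so the helpers can also be compiled and checked with a plain C++ compiler.

```cpp
// probe.cu (hypothetical file name). Builds with nvcc; the helper functions
// also build with a plain C++ compiler thanks to the guard below.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Multiplication by a power of two: a candidate for strength reduction.
// The multiply may show up as a shift, or vanish into other arithmetic.
__host__ __device__ inline unsigned scale_by_eight(unsigned x)
{
    return x * 8u;
}

// Multiply followed by add: a candidate for being combined into a single
// integer multiply-add machine instruction.
__host__ __device__ inline unsigned mul_then_add(unsigned a, unsigned b, unsigned c)
{
    return a * b + c;
}

#ifdef __CUDACC__
// Illustrative kernel only: no bounds check, one element per thread.
__global__ void probe(const unsigned *in, unsigned *out, unsigned b, unsigned c)
{
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = mul_then_add(scale_by_eight(in[i]), b, c);
}
#endif
```

Generate the PTX with `nvcc -arch=sm_80 -ptx probe.cu -o probe.ptx` and the machine code with `nvcc -arch=sm_80 -cubin probe.cu -o probe.cubin` followed by `cuobjdump -sass probe.cubin` (the `sm_80` target is just an example), then compare the two listings. What you actually find depends on the GPU architecture and the toolkit version, and some of these transformations may already have been applied before the PTX was generated, so treat any one-to-one mapping you think you see with caution.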