I have some kernels that perform loads/stores using a signed compile-time constant stride.
Inspection of SASS shows that the immediate appears to be capped at ~25 bits (sign + 6 nybbles)…
I ask because PTXAS starts gobbling registers once it switches from [reg+immediate] to plain [reg] addressing.
Out of curiosity, can anyone confirm that the immediate is capped at fewer than 32 bits?
This is on sm_52.