Does data in shared memory add to register pressure?

Suppose I have an array A of 256 ints stored in shared memory.
If the threads of a warp read entries from this array to do something like the following (back-of-the-envelope code):

sum[threadIdx.x] += A[threadIdx.x];

will the shared-memory read of A[threadIdx.x] be staged in a register before the “+=” is performed?

Another way to ask the same question: can instructions take their operands directly from shared memory, or will the data be staged through a register no matter what?

Ultimately, I’m trying to understand whether data stored in shared memory, when it is used, can contribute to rising register pressure.

This can be trivially confirmed with the CUDA binary utilities:
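For anyone who wants to reproduce this kind of check, here is a minimal sketch (not the original poster's code) that wraps the one-liner from the question in a compilable kernel. The names `add_from_shared`, `run_add_from_shared`, `a_src` and `N`, and the choice to keep `sum` in global memory, are all hypothetical. The expectation, on Fermi-class and later GPUs, is that the shared-memory read appears in SASS as an LDS (load from shared) into a register, followed by an integer add on register operands. Only that loaded temporary occupies a register, and only for its short live range; the 256-entry array itself stays in shared memory.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

constexpr int N = 256;  // size of the shared array A from the question

// Hypothetical kernel: A is a __shared__ array, sum lives in global memory.
__global__ void add_from_shared(const int* a_src, int* sum)
{
    __shared__ int A[N];
    A[threadIdx.x] = a_src[threadIdx.x];  // stage input into shared memory
    __syncthreads();                       // make A visible to the whole block

    // The statement from the question. In the SASS dump, look for an LDS
    // that writes a register, then an integer add that reads that register.
    sum[threadIdx.x] += A[threadIdx.x];
}

// Hypothetical host helper: launches one block of N threads, returns sum.
std::vector<int> run_add_from_shared(const std::vector<int>& a,
                                     const std::vector<int>& s)
{
    assert(a.size() >= static_cast<std::size_t>(N));
    assert(s.size() >= static_cast<std::size_t>(N));
    const std::size_t bytes = N * sizeof(int);

    int* d_a = nullptr;
    int* d_s = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_s, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_s, s.data(), bytes, cudaMemcpyHostToDevice);

    add_from_shared<<<1, N>>>(d_a, d_s);
    cudaDeviceSynchronize();

    std::vector<int> out(N);
    cudaMemcpy(out.data(), d_s, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_s);
    return out;
}
```

To inspect the machine code, one option is `nvcc -cubin -arch=sm_80 -o shared_add.cubin shared_add.cu` followed by `cuobjdump -sass shared_add.cubin` (or `nvdisasm shared_add.cubin`); the file name and `sm_80` are placeholders, so pick the `-arch` that matches your GPU. Adding `-Xptxas -v` to the `nvcc` command prints the registers used per thread alongside the shared memory footprint.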

thanks so much, Robert!