Suppose I have an array A of 256 ints stored in shared memory.
If the threads of a warp read entries from this array to do something like (back of the envelope)
sum[threadIdx.x] += A[threadIdx.x];
will the shared memory access to A[threadIdx.x] be staged in a register prior to doing the “+=”?
Another way to ask the same question: can instructions take their operands directly from shared memory, or will the data be staged through a register no matter what?
Ultimately, I’m trying to understand whether using data stored in shared memory can add to register pressure.
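
To make the question concrete, here is a minimal sketch of the kind of kernel I have in mind. The kernel name `accumulate`, the input buffer `in`, and the choice to keep `sum` in global memory are assumptions made just for illustration; only the shared array A of 256 ints and the "+=" line come from the description above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 256;  // size of A, as in the question

// Hypothetical kernel: each thread adds its own entry of the
// shared array A to its own slot of sum (assumed here to be global memory).
__global__ void accumulate(const int *in, int *sum)
{
    __shared__ int A[N];

    int tid = threadIdx.x;
    A[tid] = in[tid];     // fill the shared array from global memory
    __syncthreads();      // make all of A visible to the whole block

    sum[tid] += A[tid];   // the shared memory access in question
}

int main()
{
    int h_in[N], h_sum[N];
    for (int i = 0; i < N; ++i) {
        h_in[i]  = i;
        h_sum[i] = 1;
    }

    int *d_in = nullptr, *d_sum = nullptr;
    cudaMalloc(&d_in,  N * sizeof(int));
    cudaMalloc(&d_sum, N * sizeof(int));
    cudaMemcpy(d_in,  h_in,  N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_sum, h_sum, N * sizeof(int), cudaMemcpyHostToDevice);

    accumulate<<<1, N>>>(d_in, d_sum);  // one block of 256 threads (8 warps)
    cudaDeviceSynchronize();

    cudaMemcpy(h_sum, d_sum, N * sizeof(int), cudaMemcpyDeviceToHost);

    // Each slot started at 1 and had A[i] == i added to it.
    int mismatches = 0;
    for (int i = 0; i < N; ++i) {
        if (h_sum[i] != i + 1) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(d_in);
    cudaFree(d_sum);
    return mismatches == 0 ? 0 : 1;
}
```

I assume one way to check this myself would be to look at the generated code for the "+=" line, for example with `nvcc -ptx` or `cuobjdump -sass`, and see whether a separate shared memory load shows up before the add.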