I’ve got a question regarding shared memory access. Let’s say that I have an instruction that reads two input operands from two different locations in shared memory and stores the result in a third location, also in shared memory.
I know that, on devices of compute capability (CC) 1.x, a shared memory request for a warp is split into two memory requests, one for each half-warp. What happens, though, when a single instruction needs multiple accesses to shared memory, as described above (such a situation appears, for example, when implementing reduction algorithms)? Will there be a separate transaction for each input and output operand? My main concern is how to reason about the possible bank conflicts.
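To make the question concrete, here is a small Python model of how I currently picture it. It is only a sketch under assumptions: 16 banks of 32-bit words and requests serviced per half-warp (CC 1.x), all 16 threads of the half-warp active, and, crucially, each operand treated as its own access, which is exactly the part I'm unsure about. The names `conflict_degree` and `reduction_step` are made up for this illustration.

```python
from collections import defaultdict

NUM_BANKS = 16   # assumption: CC 1.x, 16 banks, successive 32-bit words in successive banks
HALF_WARP = 16   # assumption: CC 1.x services shared memory requests per half-warp


def conflict_degree(word_indices):
    """Largest number of distinct 32-bit words that fall into the same bank.

    1 means conflict-free; n means an n-way bank conflict. Threads that hit
    the same word are counted once (a simplification of broadcast).
    """
    words_per_bank = defaultdict(set)
    for w in word_indices:
        words_per_bank[w % NUM_BANKS].add(w)
    return max(len(words) for words in words_per_bank.values())


def reduction_step(stride, interleaved):
    """Conflict degree of each operand access for one half-warp.

    interleaved=True  models sdata[2*stride*tid] += sdata[2*stride*tid + stride]
    interleaved=False models sdata[tid]          += sdata[tid + stride]
    Each operand is modelled as a separate access (an assumption).
    """
    tids = range(HALF_WARP)
    if interleaved:
        first = [2 * stride * t for t in tids]
    else:
        first = list(tids)
    second = [i + stride for i in first]
    return {
        "read_a": conflict_degree(first),   # first input operand
        "read_b": conflict_degree(second),  # second input operand
        "write": conflict_degree(first),    # result goes back to the first location
    }


if __name__ == "__main__":
    for stride in (1, 2, 4, 8):
        print(stride, reduction_step(stride, True), reduction_step(stride, False))
```

In this model the interleaved indexing already gives a 2-way conflict on every access at stride 1 and gets worse as the stride grows, while the sequential indexing stays conflict-free. Whether the hardware really treats the three accesses independently like this is what I'd like to confirm.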
Thanks a lot.