Hi!
I’ve got a question regarding shared memory access. Let’s say I have an instruction that reads two input operands from two different locations in shared memory and stores the result in a third location, also in shared memory.
I know that a shared memory request is split into two memory requests, one for each half-warp (for CC 1.x). But what happens when a single instruction needs multiple accesses to shared memory, as described above (this comes up when implementing reduction algorithms, for example)? Will there be a separate transaction for each input and output operand? My main concern is how to reason about possible bank conflicts.
Can you give an example? Are you thinking of something like smem[x] += smem[y]? In that case there will be two reads and one write for each half-warp, and you have to worry about bank conflicts for each of the three accesses separately.
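To make that concrete, here is a rough host-side sketch in plain C++ (no GPU needed) that checks each of the three accesses of such an update on its own. It assumes the CC 1.x layout of 16 banks of 32-bit words with consecutive words in consecutive banks, and one request per half-warp of 16 threads. The names `conflictDegree`, `analyze` and `AccessReport` are made up for illustration. The model is simplified: it counts distinct words per bank, so threads hitting the same word are treated as served together, which glosses over the details of the real broadcast mechanism.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Assumed CC 1.x parameters: 16 banks, requests issued per half-warp.
constexpr int kBanks = 16;
constexpr int kHalfWarp = 16;

// Worst-case number of distinct 32-bit words that fall into the same bank
// for one access by one half-warp. 1 means conflict-free, n means n-way.
int conflictDegree(const std::vector<int>& wordIndex) {
    std::vector<std::set<int>> wordsPerBank(kBanks);
    for (int w : wordIndex) {
        wordsPerBank[w % kBanks].insert(w);
    }
    std::size_t worst = 1;
    for (const auto& words : wordsPerBank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}

// One entry per shared-memory access of an update "dst element += src element".
struct AccessReport {
    int readDst;   // read of the destination element
    int readSrc;   // read of the source element
    int writeDst;  // write back to the destination element
};

// dst(t) and src(t) give the word index that thread t of the half-warp uses.
template <typename DstFn, typename SrcFn>
AccessReport analyze(DstFn dst, SrcFn src) {
    std::vector<int> d, s;
    for (int t = 0; t < kHalfWarp; ++t) {
        d.push_back(dst(t));
        s.push_back(src(t));
    }
    const int dstDegree = conflictDegree(d);
    // The write goes to the same addresses as the first read.
    return {dstDegree, conflictDegree(s), dstDegree};
}
```

For an interleaved reduction step where thread t combines words 2t and 2t+1, `analyze([](int t){ return 2*t; }, [](int t){ return 2*t + 1; })` reports a 2-way conflict for every one of the three accesses. With sequential addressing, where thread t combines words t and t+16, all three come out conflict-free under this model.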