I know global memory is slower than shared memory, which is slower than registers. But where does warp shuffle fit in? Is it faster than a register?
Warp shuffle is the same performance as shared memory. Warp shuffle cannot be faster than a register as the input is a read from the register file and the output is a write to the register file.
"Warp shuffle is the same performance as shared memory."
Typically better. That is, AFAICR it’s faster than a store to shared memory followed by a load from shared memory.
But yes, using registers is best, if one can manage to keep things in registers.
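To make the comparison concrete, here is a minimal sketch (not from the posts above) of the same warp-wide sum done two ways: once with `__shfl_down_sync`, which moves values register to register, and once staged through shared memory, where every step is a store, a `__syncwarp()`, and a load. The names `warp_sum_shuffle`, `warp_sum_shared`, `compare`, and `run_compare` are made up for illustration, and the sketch assumes a single full warp of 32 threads. It only checks that both paths give the same answer; it does not measure timing, so profile on your own hardware before drawing conclusions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr unsigned FULL_MASK = 0xffffffffu;  // all 32 lanes participate
constexpr int WARP_SIZE = 32;

// Warp sum via shuffles: data moves register to register,
// with no shared-memory traffic.
__device__ int warp_sum_shuffle(int v) {
    for (int offset = WARP_SIZE / 2; offset > 0; offset /= 2) {
        v += __shfl_down_sync(FULL_MASK, v, offset);
    }
    return v;  // only lane 0 is guaranteed to hold the full sum
}

// Same sum staged through shared memory: each step needs a store,
// a warp-level sync, and a load.
__device__ int warp_sum_shared(int v, int *scratch) {
    int lane = threadIdx.x % WARP_SIZE;
    scratch[lane] = v;
    __syncwarp();
    for (int offset = WARP_SIZE / 2; offset > 0; offset /= 2) {
        if (lane < offset) {
            scratch[lane] += scratch[lane + offset];
        }
        __syncwarp();
    }
    return scratch[0];
}

// Hypothetical test kernel: assumes it is launched with exactly one warp.
__global__ void compare(int *out) {
    __shared__ int scratch[WARP_SIZE];
    int v = threadIdx.x + 1;  // lanes hold 1..32, so the sum is 528
    int a = warp_sum_shuffle(v);
    int b = warp_sum_shared(v, scratch);
    if (threadIdx.x == 0) {
        out[0] = a;
        out[1] = b;
    }
}

// Host helper: runs the kernel and copies both sums back.
// Returns false if any CUDA call fails.
bool run_compare(int result[2]) {
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, 2 * sizeof(int)) != cudaSuccess) return false;
    compare<<<1, WARP_SIZE>>>(d_out);
    cudaError_t err = cudaMemcpy(result, d_out, 2 * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess;
}

int main() {
    int result[2] = {0, 0};
    if (!run_compare(result)) {
        std::printf("CUDA error\n");
        return 1;
    }
    std::printf("shuffle=%d shared=%d\n", result[0], result[1]);
    return result[0] == result[1] ? 0 : 1;
}
```

Both versions start and end with the value in a register, which is why neither can beat plain register use. The shuffle version just skips the round trip through shared memory.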