From the NVIDIA Kepler Architecture Whitepaper:
Kepler implements Shuffle instructions allowing threads within a warp to share data. Previously, sharing data between threads within a warp required separate store and load operations to pass the data through the shared memory. With Shuffle instructions, threads within a warp can read values from other threads in the warp.
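The quoted passage describes what a shuffle does from a thread's point of view. Below is a minimal host-side C++ sketch, not GPU code, of what a lane observes. It assumes the documented semantics of CUDA's `__shfl_down_sync(0xffffffff, var, delta)` with a full mask and a width of 32. Each lane reads the value another lane holds in `var` directly, with no explicit store to and load from shared memory. The names `Warp`, `shfl_down`, and `warp_reduce_sum` are hypothetical. The model says nothing about how the hardware actually moves the data, which is exactly what the question below asks.

```cpp
#include <array>
#include <cassert>

constexpr int kWarpSize = 32;

// Hypothetical model: one register value per lane of a warp.
using Warp = std::array<int, kWarpSize>;

// Models __shfl_down_sync with a full mask and width == 32:
// lane i receives the value held by lane i + delta. Lanes whose source
// would fall past the end of the warp keep their own value (no wrap-around).
// All lanes read the *old* values at once, so the result goes into a new array.
Warp shfl_down(const Warp& reg, int delta) {
    assert(delta >= 0 && delta < kWarpSize);
    Warp out{};
    for (int lane = 0; lane < kWarpSize; ++lane) {
        int src = lane + delta;
        out[lane] = (src < kWarpSize) ? reg[src] : reg[lane];
    }
    return out;
}

// Tree reduction built on shfl_down: after log2(32) = 5 steps,
// lane 0 holds the sum over the whole warp (other lanes hold partial garbage).
int warp_reduce_sum(Warp reg) {
    for (int delta = kWarpSize / 2; delta > 0; delta /= 2) {
        Warp shifted = shfl_down(reg, delta);
        for (int lane = 0; lane < kWarpSize; ++lane) {
            reg[lane] += shifted[lane];
        }
    }
    return reg[0];
}
```

If lane `i` starts with the value `i`, `warp_reduce_sum` returns 0 + 1 + ... + 31 = 496. On the device, the body of the reduction loop would be a single `val += __shfl_down_sync(0xffffffff, val, delta);` per lane. The shared-memory version would need a store, a synchronization, and a load at each step.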
What happens under the hood when Shuffle instructions are invoked? Through which kind of memory do the threads exchange data?
Thank you very much in advance.