Loading global memory values into shared memory

mjmawson · April 19, 2013, 8:51am

Is a load from global memory to shared memory decomposed into a load from global memory to a temporary register and a write from that register to shared memory? Or does data move from global memory directly to shared memory without ever “reaching” a processing core? My guess would be the first, seeing as you have direct control of shared memory, but I want to rule out any architectural trickery that could get around this. Thanks.

tera · April 19, 2013, 11:49am

Yes, it is the first: loads from global to shared memory are always performed through registers.
You can check this yourself by looking at the output of cuobjdump -sass for an example program.
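
For anyone who wants to try this, here is a minimal sketch of such an example program. The kernel name stage_through_shared, the helper run_stage_check, the TILE size, and the file name stage.cu are all made up for illustration; nothing here comes from a particular SDK sample.

```cuda
#include <cuda_runtime.h>

#define TILE 256  // threads per block (illustrative choice)

// Hypothetical kernel: stages one element per thread through shared memory.
__global__ void stage_through_shared(const float* in, float* out, int n)
{
    __shared__ float tile[TILE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // A single source line, but in the SASS this normally shows up as a
    // global load into a register followed by a shared store from it.
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Read a neighbouring thread's element so the compiler cannot drop
    // the shared-memory store.
    int j = (threadIdx.x + 1) % blockDim.x;
    if (i < n) out[i] = tile[j];
}

// Hypothetical host helper: runs the kernel once and checks the result.
// Returns false on any CUDA error or mismatch.
bool run_stage_check()
{
    const int n = 4 * TILE;
    const size_t bytes = n * sizeof(float);
    float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    stage_through_shared<<<n / TILE, TILE>>>(d_in, d_out, n);
    cudaError_t launch_err = cudaGetLastError();
    // This blocking copy also waits for the kernel to finish.
    cudaError_t copy_err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (launch_err != cudaSuccess || copy_err != cudaSuccess) return false;

    // Each thread should have received its right-hand neighbour's value,
    // wrapping around within its own block.
    for (int i = 0; i < n; ++i) {
        int base = (i / TILE) * TILE;
        if (h_out[i] != h_in[base + (i % TILE + 1) % TILE]) return false;
    }
    return true;
}
```

Assuming the file is saved as stage.cu, building it with nvcc -cubin -o stage.cubin stage.cu and then running cuobjdump -sass stage.cubin should list the kernel's instructions. For the staging line you would expect a global load (LD or LDG, depending on the architecture) into a register, followed by an STS from that same register. Note that this sketch only covers ordinary loads and stores; later architectures added asynchronous copy instructions that it does not use.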

mjmawson · April 19, 2013, 12:32pm

I thought as much. Profiling a sample kernel showed that using shared memory increased the register count, and now that you mention SASS, I looked at the SASS code in the Nsight profiler to confirm it. Thanks.
