Device memory vs. shared memory

Which one is faster to access within the kernel?

Thanks a lot. :D

“Device” memory is the same as “global” memory. It’s unfortunate that there’s this discrepancy in terminology. I’m sure you know global memory is much slower than shared.

I haven’t figured out how to fit my data into shared memory, so I use device memory as the input and output of the kernel. Comparing the runtime of the CUDA code against CPU code running the same algorithm, performance improved about 4x. That is what prompted my question: what is the performance difference between shared memory and global memory?

Thanks a lot for the answer.

All input to a kernel goes through global memory. Shared memory is filled inside a kernel, and access to it is much faster (almost as fast as accessing a register). Using shared memory is only beneficial, however, when threads have to work together or access the same values from global memory.
You can find plenty of uses for shared memory in the SDK examples.
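
To make that pattern concrete, here is a minimal sketch (illustrative only, not taken from the SDK; the names blockSum, d_in, and d_out are made up): each thread stages one value from global memory into a shared-memory tile, the block synchronizes, and the threads then cooperate entirely in shared memory to compute a block-wise sum.

#include <cuda_runtime.h>

// Each block of 256 threads sums 256 elements. Every thread copies one
// value from (slow) global memory into the (fast) on-chip shared tile;
// after __syncthreads(), the tree reduction touches only shared memory.
__global__ void blockSum(const float *in, float *out, int n)
{
    __shared__ float tile[256];

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (idx < n) ? in[idx] : 0.0f;   // one global read per thread
    __syncthreads();                                  // tile is now complete

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];                    // one global write per block
}

int main()
{
    const int n = 1 << 20;
    const int blocks = (n + 255) / 256;
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    // ... fill d_in with cudaMemcpy ...
    blockSum<<<blocks, 256>>>(d_in, d_out, n);
    cudaDeviceSynchronize();
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

Each input element is read from global memory exactly once, while the reduction reuses values many times at shared-memory speed; that reuse is what makes the staging step worthwhile.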

If the memory you need to access is read-only on the GPU side, you may also want to look at constant memory. It is a region of global memory, and there is more of it than shared memory (64 KB vs. 16 KB), but it is cached; it can be read and written from the host, and read from the device. In one case I used it to great benefit: stuffing the data into constant memory raised my speedup over the CPU from roughly 20x to 680x!

Beyond that, global memory is VERY slow, but whether that really hurts performance depends on your memory access patterns. The idea of having many threads running in parallel on the GPU is to hide memory latencies. In another example, where my amount of “constant” data is so large that it barely fits in global memory, I can still achieve speedups of up to 250x (vs. CPU), because the latencies can be hidden well enough. The CPU codes are optimized as much as possible, so the speedups are quite reasonable (GPU: 9800 GT).
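
For illustration, a minimal sketch of that constant-memory pattern (the coefficient table and the kernel scaleByBlockCoeff are made up for the example): the host fills a __constant__ array with cudaMemcpyToSymbol, and the kernel only reads it. When all threads of a warp read the same entry, the constant cache broadcasts the value, which is the access pattern constant memory handles fastest.

#include <cuda_runtime.h>

// Hypothetical read-only table. __constant__ arrays live in device
// memory but are served through the cached constant memory space
// (64 KB total on these GPUs).
__constant__ float coeffs[256];

__global__ void scaleByBlockCoeff(const float *in, float *out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        // Every thread in the warp reads the same coeffs entry, so the
        // constant cache can broadcast it in a single transaction.
        out[idx] = in[idx] * coeffs[blockIdx.x % 256];
}

int main()
{
    float hostCoeffs[256];
    for (int i = 0; i < 256; ++i)
        hostCoeffs[i] = 1.0f / (i + 1);

    // The host writes constant memory; the device can only read it.
    cudaMemcpyToSymbol(coeffs, hostCoeffs, sizeof(hostCoeffs));

    const int n = 1 << 20;
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    scaleByBlockCoeff<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaDeviceSynchronize();
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

If different threads in a warp need different entries, the reads serialize through the constant cache, and a lookup table in shared memory may be the better fit.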