Why does the shared vs. global memory speedup degrade?

yk_cadcg · August 19, 2008, 3:16am

Hi, this is a general question. Shared memory certainly gives a speedup over global memory. Assume the two memories are used alternately and each pass is timed: what could explain the shared memory (sm) vs. global memory (gm) speedup decreasing as the amount of memory used gets very large?
Thanks for any quick replies!

VrahoK · August 19, 2008, 7:26am

Smem should have higher bandwidth than gmem (how much higher depends on how wide your gmem interface is). In addition, gmem has a lot of latency (200-300 core cycles, if I remember correctly; it is stated somewhere in the CUDA Programming Guide). If you only fetch a small amount of data from global memory, this latency is exposed and strongly influences your timings.
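
To make the latency point concrete, here is a minimal back-of-the-envelope sketch (not a benchmark). It assumes a fixed startup latency of 250 cycles (a midpoint of the 200-300 quoted above) and, once the fetches are pipelined, one element per cycle; both constants and all names are hypothetical. It prints what fraction of the total time is pure latency for a few data sizes.

```python
# Hypothetical cost model for one global-memory stream.
# Both constants are illustrative assumptions, not measured values.
GMEM_LATENCY_CYCLES = 250     # assumption: midpoint of the 200-300 quoted above
GMEM_ELEMS_PER_CYCLE = 1.0    # assumption: one chunk per cycle once pipelined


def gmem_cycles(n_elems):
    """Total cycles to fetch n_elems: pay the latency once, then stream."""
    return GMEM_LATENCY_CYCLES + n_elems / GMEM_ELEMS_PER_CYCLE


def latency_fraction(n_elems):
    """Fraction of the measured time spent waiting on the initial latency."""
    return GMEM_LATENCY_CYCLES / gmem_cycles(n_elems)


for n in (16, 256, 4096, 65536):
    print(f"{n:6d} elems: {gmem_cycles(n):8.0f} cycles, "
          f"{latency_fraction(n):5.1%} latency")
```

With these numbers, a 16-element fetch spends over 90% of its time waiting, while a 65536-element fetch spends well under 1%.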

If you do have a lot of data to fetch (ideally with coalesced accesses), the GPU can hide that latency by doing other calculations or by issuing the next fetches while earlier ones are still in flight. You still pay the 200-300 cycle latency somewhere, because you have to wait for the first fetch to complete, but that cost stays almost constant as the amount of data grows: after the first transfer is done, you can get a chunk of data every core cycle. The fixed latency is therefore amortized over more and more data, so the smem-over-gmem speedup shrinks toward the plain ratio of their bandwidths.
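
The amortization argument can be sketched as a simple "latency + size / throughput" model for each memory, with the speedup being the ratio of the two times. All four constants are hypothetical: 250 cycles and 1 element/cycle for gmem follow the figures above, while the smem latency of 4 cycles and 4x throughput are illustrative assumptions only. Under these assumptions the speedup starts high (gmem latency dominates) and falls toward the bandwidth ratio as the data grows.

```python
# Hypothetical model of the smem-vs-gmem speedup as the data size grows.
# All constants are illustrative assumptions, not measured values.
GMEM_LATENCY = 250.0   # cycles (midpoint of the 200-300 quoted above)
GMEM_RATE = 1.0        # elements per cycle once the gmem pipeline is full
SMEM_LATENCY = 4.0     # cycles (assumption: smem latency is a few cycles)
SMEM_RATE = 4.0        # elements per cycle (assumption: ~4x gmem throughput)


def cycles(n, latency, rate):
    """Fixed startup latency plus pipelined streaming of n elements."""
    return latency + n / rate


def speedup(n):
    """How many times faster the smem pass is than the gmem pass."""
    return cycles(n, GMEM_LATENCY, GMEM_RATE) / cycles(n, SMEM_LATENCY, SMEM_RATE)


for n in (16, 256, 4096, 65536, 1 << 20):
    print(f"{n:8d} elems: speedup {speedup(n):6.2f}x")

# As n grows, both latency terms are amortized and the ratio tends to
# SMEM_RATE / GMEM_RATE, i.e. the bandwidth ratio alone.
print(f"limit: {SMEM_RATE / GMEM_RATE:.2f}x")
```

The exact numbers depend entirely on the assumed constants; the point is the shape of the curve, which matches the observation in the question that the speedup degrades as the space used grows.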

Related topics (all in CUDA Programming and Performance):
- How much faster is shared memory vs global memory? has anyone run some tests? (4 replies, 8943 views, December 11, 2007)
- Device memory VS Shared memory (4 replies, 4087 views, September 22, 2008)
- Why shared memory has lower bandwidth/multiprocessor than global memory? (2 replies, 1125 views, December 6, 2009)
- comparision: shared mem <=> global mem actually no difference (6 replies, 7551 views, July 21, 2008)
- Why the transpose speed is much quicker using shared memory inside (3 replies, 5926 views, July 17, 2008)
- Correct Use of Shared Memory? (1 reply, 712 views, January 6, 2010)
- Why Reg->shared->global is faster than Reg->global? (7 replies, 1223 views, June 11, 2022)
- shared memory latency (7 replies, 5881 views, May 18, 2011)
- Reduction: shared VS global memory (4 replies, 7715 views, June 1, 2008)
- Global vs Shared Memory Grayscale performance same for both the codes. (1 reply, 2449 views, June 4, 2011)
