What's the difference between L1 cache and the shared memory

bit_mapper · October 26, 2011, 9:30pm

In Fermi, it seems L1 and shared memory share the same on-chip storage and the split is configurable, i.e. given 64K, the programmer can specify either 16K as shared memory with the remaining 48K as L1, or 48K as shared memory with the remaining 16K as L1.

My questions are:

What is the difference between L1 and shared memory? It sounds like we can manage shared memory but not the L1 cache, yet it would make no difference if we just used the whole 64K as cache rather than as shared memory. Is that true?
Can we dedicate the whole 64K to L1 only, or to shared memory only?

seibert · October 27, 2011, 12:54am

The main difference between shared memory and the L1 is that the contents of shared memory are managed by your code explicitly, whereas the L1 cache is automatically managed. Shared memory is also a better way to exchange data between threads in a block with predictable timing. My rule of thumb is: predictable reads and writes that your code can stage itself => prefer shared memory; unpredictable reads and writes => prefer L1.
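To make the explicit/automatic distinction concrete, here is a minimal sketch that is not from the original post. The kernel names, `BLOCK`, and `run_demo` are made up for illustration. The first kernel stages data in `__shared__` storage and exchanges it between threads of a block; the second does data-dependent global reads and leaves any caching to the hardware.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK 256  // threads per block (illustrative choice)

// Explicitly managed: each block copies its slice into shared memory,
// synchronizes, then reads elements that other threads wrote.
__global__ void reverse_in_block(const int *in, int *out)
{
    __shared__ int tile[BLOCK];
    int t = threadIdx.x;
    int base = blockIdx.x * BLOCK;
    tile[t] = in[base + t];
    __syncthreads();  // every write to tile must land before any read
    out[base + t] = tile[BLOCK - 1 - t];
}

// Automatically managed: data-dependent reads go to global memory and
// the code has no say in what, if anything, ends up in L1.
__global__ void gather(const int *table, const int *idx, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = table[idx[i]];
}

// Returns 0 if both kernels ran and produced the expected values, 1 otherwise.
int run_demo()
{
    const int n = 4 * BLOCK;
    const size_t bytes = n * sizeof(int);
    std::vector<int> h_in(n), h_idx(n), h_rev(n), h_gat(n);
    for (int i = 0; i < n; ++i) {
        h_in[i] = i;
        h_idx[i] = (i * 7919) % n;  // scattered indices
    }

    int *d_in = nullptr, *d_idx = nullptr, *d_rev = nullptr, *d_gat = nullptr;
    cudaError_t err = cudaMalloc(&d_in, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_idx, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_rev, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_gat, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_idx, h_idx.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        reverse_in_block<<<n / BLOCK, BLOCK>>>(d_in, d_rev);
        gather<<<n / BLOCK, BLOCK>>>(d_in, d_idx, d_gat, n);
        err = cudaGetLastError();
    }
    // The blocking copies below also wait for both kernels to finish.
    if (err == cudaSuccess) err = cudaMemcpy(h_rev.data(), d_rev, bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) err = cudaMemcpy(h_gat.data(), d_gat, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in); cudaFree(d_idx); cudaFree(d_rev); cudaFree(d_gat);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < n; ++i) {
        int base = (i / BLOCK) * BLOCK;
        int expected_rev = base + (BLOCK - 1 - i % BLOCK);
        if (h_rev[i] != expected_rev || h_gat[i] != h_in[h_idx[i]]) return 1;
    }
    return 0;
}

int main()
{
    int rc = run_demo();
    printf(rc == 0 ? "results match\n" : "mismatch or CUDA error\n");
    return rc;
}
```

The reversal needs `__syncthreads()` because threads read what other threads wrote, which is exactly the kind of in-block exchange shared memory is good for. The gather has no such structure, so there is nothing useful for the code to stage.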

There is no setting to configure all 64 kB for shared memory or L1 cache.
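For reference, a minimal sketch of how the split is requested through the runtime API. The kernel `touch` and the helper `run_config_demo` are hypothetical names used only for illustration. `cudaFuncSetCacheConfig` sets a per-kernel preference: on Fermi, `cudaFuncCachePreferL1` asks for 48 kB L1 / 16 kB shared and `cudaFuncCachePreferShared` asks for 16 kB L1 / 48 kB shared. Neither value requests all 64 kB for one side.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel; it exists only so there is something to configure.
__global__ void touch(int *out)
{
    out[blockIdx.x * blockDim.x + threadIdx.x] = threadIdx.x;
}

// Launches the kernel once under each preference.
// Returns 0 if every call succeeded, 1 otherwise.
int run_config_demo()
{
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, 256 * sizeof(int)) != cudaSuccess) return 1;

    const cudaFuncCache prefs[] = { cudaFuncCachePreferL1, cudaFuncCachePreferShared };
    const char *names[] = { "prefer L1", "prefer shared" };
    int rc = 0;
    for (int k = 0; k < 2; ++k) {
        // This is a preference, not a guarantee: the runtime may pick a
        // different split if the kernel cannot run with the requested one,
        // and devices without a configurable split ignore it.
        cudaError_t err = cudaFuncSetCacheConfig(touch, prefs[k]);
        if (err == cudaSuccess) {
            touch<<<1, 256>>>(d_out);
            err = cudaGetLastError();
        }
        if (err == cudaSuccess) err = cudaDeviceSynchronize();
        printf("%s: %s\n", names[k], cudaGetErrorString(err));
        if (err != cudaSuccess) rc = 1;
    }
    cudaFree(d_out);
    return rc;
}

int main()
{
    return run_config_demo();
}
```

`cudaDeviceSetCacheConfig` takes the same enumerators and sets a device-wide preference instead of a per-kernel one.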

bit_mapper · October 27, 2011, 7:12pm

Thanks for the reply, seibert!

So you mean that if the reads and writes follow a regular pattern, we can use shared memory, and if not, we can use L1. But is there actually no performance difference between the two cases, since the underlying memory is almost identical? Is that correct, especially when the kernel is doing random reads and writes?

Are “16K L1 + 48K shared” and “48K L1 + 16K shared” the only two ways to divide this on-chip memory? Is no other configuration allowed?

benetion · October 28, 2011, 6:18pm

I am also wondering why it is not allowed to configure all 64 kB for shared memory or L1 cache…
Some applications may prefer that.

seibert · October 29, 2011, 12:37pm

There might be some additional latency when using L1, because the global address you are accessing has to be translated to a cache location; however, I haven’t run any microbenchmarks to check that.

As to why the 64 kB can’t be assigned to all shared memory or all L1, I’m also not sure. I was genuinely surprised that NVIDIA gave us the option to configure the L1/shared memory ratio at all, given that every feature adds complexity and engineering time.
