Fermi: Cache configuration default at compile time From shared to L1

Tigga · April 16, 2010, 2:19pm

My application has many kernels, but none will benefit from over 16k of shared memory. For this reason I want the 48k to be assigned to the L1 cache (which will benefit me). Is there any way to set this as the default at compile time, to save me cluttering up my code with hundreds (literally) of configuration calls? I had a look around but couldn’t seem to find anything.

Also, there is mention of the ability to disable the L1 cache for global memory. Why would one want to do this? Maybe if you have many kernels with massive local memory spilling? Interestingly, this option seems to be available only at compile time.

seibert · April 16, 2010, 3:25pm

If your kernel only makes single-use coalesced reads, then the L1 cache will provide no benefit. Those reads might push local memory out to the L2 or global memory, as you mention. I also wonder if bypassing the L1 improves latency on global memory reads. This isn’t mentioned anywhere, but that seems like another good reason to be able to turn the cache off.

I assume the compile time option is a temporary hack to globally control the PTX generation. What we really need is some way inside the kernel to specify individual memory reads as being uncached so the compiler can generate PTX load instructions with the appropriate cache modifiers.
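For readers who want to try the per-load control described above, here is a minimal sketch using inline PTX. It assumes a toolkit that supports inline `asm` in device code and a 64-bit build (hence the `"l"` pointer constraint). `loadBypassL1`, `copyOnce`, and `runCopyCheck` are hypothetical names for illustration. The global compile-time switch is, to my knowledge, `-Xptxas -dlcm=cg`, noted in the comment.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Global alternative (compile time): nvcc -arch=sm_20 -Xptxas -dlcm=cg file.cu
// The helper below instead marks a single load as "cache global" (.cg),
// which caches in L2 only and bypasses L1.
// Assumption: 64-bit pointers, so the address uses the "l" constraint.
__device__ __forceinline__ float loadBypassL1(const float *p)
{
    float v;
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(v) : "l"(p));
    return v;
}

// Hypothetical kernel: each element is read exactly once, coalesced,
// so L1 caching of the input would give no reuse benefit.
__global__ void copyOnce(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = loadBypassL1(&in[i]);
}

// Runs the copy and returns the number of mismatching elements
// (or -1 if a CUDA call failed).
int runCopyCheck()
{
    const int n = 1024;
    static float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_in = NULL, *d_out = NULL;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) { cudaFree(d_in); return -1; }

    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    copyOnce<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != h_in[i]) ++mismatches;
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", runCopyCheck());
    return 0;
}
```

Whether this actually lowers latency or reduces eviction of spilled local memory would need to be measured for each kernel. The sketch only shows how the modifier is attached to one load.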

tmurray · April 16, 2010, 7:04pm

If you want every kernel in your app to use the same caching mode, set it for the first kernel you launch. The default mode is 48k shared/16k L1, but the default setting for individual kernels is “don’t care.” If one kernel sets the caching mode to prefer L1 and you make no other calls to set the caching mode, later kernels will keep running with 16k shared/48k L1, unless you try to launch a kernel that cannot launch with 16k of shared memory.
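A minimal sketch of that advice, assuming the runtime API's `cudaFuncSetCacheConfig` with `cudaFuncCachePreferL1`. `firstKernel` and `launchWithPreferL1` are hypothetical names standing in for whatever your application launches first.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the first kernel the application launches.
__global__ void firstKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Sets the L1 preference on the first kernel, launches it, and returns
// the first CUDA error encountered (cudaSuccess if everything worked).
cudaError_t launchWithPreferL1()
{
    const int n = 1 << 20;
    float *d_data = NULL;

    cudaError_t err = cudaMalloc(&d_data, n * sizeof(float));
    if (err != cudaSuccess) return err;
    cudaMemset(d_data, 0, n * sizeof(float));

    // Ask for 16 KB shared / 48 KB L1 when this kernel runs. Kernels left
    // at "don't care" are described above as keeping whatever is in effect.
    err = cudaFuncSetCacheConfig(firstKernel, cudaFuncCachePreferL1);
    if (err == cudaSuccess) {
        firstKernel<<<(n + 255) / 256, 256>>>(d_data, n);
        err = cudaGetLastError();               // launch errors
    }
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();          // execution errors

    cudaFree(d_data);
    return err;
}

int main()
{
    cudaError_t err = launchWithPreferL1();
    printf("status: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

Later toolkits also offer a device-wide preference through `cudaDeviceSetCacheConfig`, which may be closer to the single default asked about in the original post. `cudaDeviceSynchronize` is likewise the newer name; toolkits from 2010 used `cudaThreadSynchronize`.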
