Fermi: Cache configuration default at compile time From shared to L1

My application has many kernels, but none will benefit from more than 16k of shared memory. For this reason I want the 48k to be assigned to the L1 cache (which will benefit me). Is there any way to set this as the default at compile time, to save me cluttering up my code with hundreds (literally) of configuration calls? I had a look around but couldn’t seem to find anything.

Also, there is mention of the ability to disable the L1 cache for global memory. Why would one want to do this? Maybe if you have many kernels with massive local memory spilling? Interestingly, this option seems to be available only at compile time.

If your kernel only makes single-use coalesced reads, then the L1 cache will provide no benefit. Those reads might push local memory out to the L2 or global memory, as you mention. I also wonder if bypassing the L1 improves latency on global memory reads. This isn’t mentioned anywhere, but that seems like another good reason to be able to turn the cache off.

I assume the compile time option is a temporary hack to globally control the PTX generation. What we really need is some way inside the kernel to specify individual memory reads as being uncached so the compiler can generate PTX load instructions with the appropriate cache modifiers.
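For what it's worth, here is a rough sketch of how a per-load version of this could look using inline PTX, assuming your toolkit supports inline `asm` in device code. The helper name `load_cg` and the kernel `copy_once` are made up for illustration; only the `ld.global.cg` instruction (cache in L2, bypass L1) comes from the PTX ISA. The global compile-time switch mentioned above is, as far as I know, `nvcc -Xptxas -dlcm=cg`. Treat this as a sketch, not a tested recipe.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): load one float with the .cg cache
// operator, which caches in L2 only and bypasses L1.
// The "l" constraint assumes 64-bit pointers; a 32-bit build would need "r".
__device__ __forceinline__ float load_cg(const float* p)
{
    float v;
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(v) : "l"(p));
    return v;
}

// Single-use coalesced read: each element is loaded exactly once,
// so keeping it in L1 would not help.
__global__ void copy_once(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = load_cg(in + i);
}

int main()
{
    const int n = 1 << 20;
    float *in = 0, *out = 0;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    copy_once<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaError_t err = cudaDeviceSynchronize();
    printf("copy_once: %s\n", cudaGetErrorString(err));

    cudaFree(in);
    cudaFree(out);
    return err == cudaSuccess ? 0 : 1;
}
```

Loads that go through `load_cg` skip L1 while ordinary loads in the same kernel stay cached, which is the per-read control the compile-time flag cannot give you. Whether it actually helps latency would need measuring.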

If you want every kernel in your app to use the same caching mode, set it for the first kernel you launch. The default mode is 48k shared/16k L1, but the default setting for individual kernels is “don’t care.” If one kernel switches the mode to 16k shared/48k L1 and you make no other calls to set the caching mode, it will stay at 16k shared/48k L1 for later kernels, unless you try to launch a kernel that cannot launch with 16k shmem.
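A minimal sketch of what that looks like with the runtime API, assuming a placeholder kernel called `scale` (the name and the kernel body are only for illustration). `cudaFuncSetCacheConfig` and `cudaFuncCachePreferL1` are the real runtime names; whether the setting then sticks for every later "don't care" kernel is the behaviour described above, not something this code checks.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the first kernel the app launches.
__global__ void scale(float* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    float* d = 0;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    // Request 16k shared / 48k L1 for this kernel, once, before its first launch.
    cudaError_t err = cudaFuncSetCacheConfig(scale, cudaFuncCachePreferL1);
    if (err != cudaSuccess)
        printf("cudaFuncSetCacheConfig: %s\n", cudaGetErrorString(err));

    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);
    err = cudaDeviceSynchronize();
    printf("scale: %s\n", cudaGetErrorString(err));

    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}
```

Newer toolkits also have a device-wide call, `cudaDeviceSetCacheConfig(cudaFuncCachePreferL1)` (`cudaThreadSetCacheConfig` in the 3.x runtime), which may be a tidier way to state the preference once at startup if your toolkit has it.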
