If I have a kernel that consumes more than the available L1 cache, is there an advantage to switching off the L1 cache and using only the L2 cache? If so, how do I switch the L1 cache off (and back on) on the fly?
The underlying PTX instruction set has modifiers for the load instruction (and perhaps other instructions as well) to change caching behaviour and potentially bypass the cache.
I think this feature is also exposed in the high-level language (CUDA C), perhaps via the keyword “volatile” and perhaps others, such as __restrict__ on const pointers… it has something to do with how loads are generated, so consult the manuals… it might be of some use.
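A minimal sketch of the kind of hint I mean, assuming a device/compiler combination that is allowed to route const __restrict__ loads through a read-only path (the kernel name and shapes below are made up for illustration):

```
// Marking the input as const and __restrict__ tells the compiler the data is
// read-only and not aliased, which lets it choose a more favourable load path.
// This is only a hint, not a guaranteed cache bypass.
__global__ void scale(const float* __restrict__ in,
                      float* __restrict__ out,
                      float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * factor;   // eligible for read-only caching
}
```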
The CUDA API also has some functionality to specify a cache preference… that might help as well to reduce or increase the L1 cache size…
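For example, a rough sketch using the runtime calls cudaFuncSetCacheConfig / cudaDeviceSetCacheConfig, which trade L1 size against shared memory on devices where the two share on-chip storage (the kernel name here is hypothetical):

```
#include <cuda_runtime.h>

// Hypothetical kernel; body omitted.
__global__ void myKernel(float *data) { }

int main()
{
    // Ask for more shared memory (and therefore a smaller L1) for this kernel...
    cudaFuncSetCacheConfig(myKernel, cudaFuncCachePreferShared);

    // ...or set a device-wide default that favours a larger L1 instead.
    cudaDeviceSetCacheConfig(cudaFuncCachePreferL1);

    myKernel<<<1, 32>>>(nullptr);
    cudaDeviceSynchronize();
    return 0;
}
```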
Concerning the L2 cache… that remains a mystery to me ;) Perhaps there is something new for it in the API or documentation? ;)
If an instruction's memory accesses are highly divergent and the address ranges are accessed only once, there can be bandwidth savings from performing uncached global loads. Caching can be controlled on a per-instruction basis using inline PTX. L1 caching of global loads can also be disabled for the whole compilation unit with the compiler option -Xptxas -dlcm=cg.
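A minimal sketch of a per-instruction L1 bypass with inline PTX, assuming a 64-bit build (so the pointer uses the "l" constraint); the helper and kernel names are my own:

```
// ld.global.cg caches the load in L2 only; the default ld.global.ca also
// caches it in L1.
__device__ __forceinline__ float load_cg(const float* ptr)
{
    float value;
    asm volatile("ld.global.cg.f32 %0, [%1];"
                 : "=f"(value)
                 : "l"(ptr));
    return value;
}

__global__ void copy_uncached(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = load_cg(in + i);   // bypasses L1, still goes through L2
}
```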
For more information see the Global Memory section in the CUDA Programming Guide for the compute capability of your GPU.
If you are developing for a compute capability 3.5 device, you may also want to investigate the LDG instruction, which performs read-only global loads through the texture cache. The texture cache can give better performance for highly divergent memory accesses, and when the application is heavily using shared, local, or global memory.
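A minimal sketch of that read-only path via the __ldg() intrinsic, which compiles to LDG when built for compute capability 3.5 or higher (the gather kernel below is just an illustration):

```
// Requires compilation with -arch=sm_35 or newer for __ldg() to be available.
__global__ void gather(const float* __restrict__ table,
                       const int* __restrict__ idx,
                       float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __ldg(&table[idx[i]]);   // routed through the read-only (texture) cache
}
```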
All device memory accesses always go through L2.
System memory accesses are currently not cached in L2.