L2 cache allocation

Hello everyone,

My goal is to fill the GPU’s L2 cache with an array.

Is there a way to create a variable in a specific hardware memory zone of my GPU with CUDA?

Does anyone have any suggestions, or documents where I can find more information?

You can be somewhat certain of values residing in:

  • shared memory (__shared__)
  • constant memory (__constant__)
  • registers (if nothing spills to local memory)
  • texture memory (will depend on the cache size of the specific hardware)
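
For illustration, a minimal sketch of where these qualifiers place data (the kernel, names, and sizes here are made up for the example):

```
__constant__ float lut[256];        // constant memory, served through the constant cache

__global__ void example(const float* in, float* out)
{
    __shared__ float tile[256];     // shared memory, on-chip, per block (launch with <= 256 threads)

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float acc;                      // local scalar, normally held in a register

    tile[threadIdx.x] = in[i];      // global load; goes through L2 (and possibly L1)
    __syncthreads();

    acc = tile[threadIdx.x] + lut[threadIdx.x];
    out[i] = acc;
}
```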

In the case of L2, you can expect data that you have accessed in a certain way to reside there, but I don’t think you will actually know without extensive profiling.

Use the PG for reference: Programming Guide :: CUDA Toolkit Documentation

Thank you for your response.

I forgot to mention that I’ve read the PG, and that I know I can allocate data in the different memory zones (global, constant, shared…).

From what I understood of the PG, if I want to fill the L2 cache, I need to allocate my array in global memory.
I also know the exact size of the L2 memory on my GPU.
My problem is that when I try to determine whether an array is large enough to fill the L2, by timing a simple program that writes/reads each cell of the array, I find no significant difference in execution time between array sizes (smaller than, equal to, or bigger than my cache size).
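
A stripped-down sketch of the kind of measurement I mean (the kernel and sizes here are simplified placeholders, not my actual code):

```
#include <cstdio>
#include <cuda_runtime.h>

// Touch every element once, sequentially across threads.
__global__ void touch(float* a, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) a[i] += 1.0f;
}

int main()
{
    const size_t n = 1 << 22;                 // placeholder; I vary this around the L2 size
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int rep = 0; rep < 100; ++rep)       // repeat so timer resolution matters less
        touch<<<(n + 255) / 256, 256>>>(d, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("%zu floats: %.3f ms\n", n, ms);

    cudaFree(d);
    return 0;
}
```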

Do you have any ideas that could help me?

Hi @bibsys

You understood the allocation part well: the array goes in global memory. The idea of global memory is to let your SMs (or blocks) communicate through a common memory space. In principle, global memory is the main memory of the GPU, but to give fast access to the data, accesses to main memory are cached in the L2.

Lower bandwidth than expected may be caused by:

  • Non-sequential memory accesses
  • Strided memory accesses
  • Resource starvation

In the first case: if you have contiguous data, let’s say an array of doubles, and you access it sequentially, that is the ideal case. You can also have sequential but strided accesses, which reduce the number of useful elements held in the cache; see the sketch below.
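
To make the contrast concrete, here is a hypothetical pair of kernels (illustrative only, not from any real code):

```
// Coalesced: consecutive threads read consecutive doubles, so every
// byte of each cache line fetched into L2 is actually used.
__global__ void sequential_copy(const double* in, double* out, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: consecutive threads are `stride` elements apart, so each
// cache line fetched contributes only one useful double and the L2
// fills up with data that is never touched.
__global__ void strided_copy(const double* in, double* out, size_t n, size_t stride)
{
    size_t i = (blockIdx.x * (size_t)blockDim.x + threadIdx.x) * stride;
    if (i < n) out[i] = in[i];
}
```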

Considering the case where your whole array fits in the cache, it can happen that many SMs query the L2 at once, leading to a resource starvation condition. So basically, if they are all trying to access the same resource, some SMs need to wait to get access to the L2.

This part of the documentation is fundamental: Programming Guide :: CUDA Toolkit Documentation

You also need to take the cache line into account, which will guide you toward exploiting the cache in a better way, by also making use of the L1. Now, if your goal is to store an array in the L2 purely from code, as far as I know there is no way to do it, other than by tweaking sizes and access patterns.
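
Since you mentioned knowing your exact L2 size: the runtime reports it directly, which helps when tweaking sizes. A minimal query (assuming device 0):

```
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // device 0 assumed

    printf("L2 cache size: %d bytes\n", prop.l2CacheSize);

    // Rough count of doubles that would nominally fill the L2
    // (ignores other traffic and cache-line granularity).
    printf("doubles to fill L2: %d\n", prop.l2CacheSize / (int)sizeof(double));
    return 0;
}
```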

Regards,
Leon


Hi Leon,

Thank you for this information, it really helped!
I’ve managed to find the results I was looking for.
