I am currently studying CUDA.
According to the 2022 CUDA C Programming Guide, “A cache line is 128 bytes and maps to a 128 byte aligned segment in device memory. Memory accesses that are cached in both L1 and L2 are serviced with 128-byte memory transactions, whereas memory accesses that are cached in L2 only are serviced with 32-byte memory transactions. Caching in L2 only can therefore reduce over-fetch, for example, in the case of scattered memory accesses.”
Based on this, I’ve encountered some questions regarding the granularity of L1 and L2 caches related to global memory access.
If both L1 and L2 cache lines are 128 bytes wide, then for an access cached in L2 only, is the amount of data transferred from global memory into L2 (the L2 fetch granularity) the full 128-byte cache line (four 32-byte sectors), or just 32 bytes (one sector)?
Additionally, when an access is cached in both L1 and L2, is the transfer from global memory to L2 (L2 granularity) 128 bytes, and the transfer from L2 to L1 (L1 granularity) also 128 bytes? And for each warp, is the transfer from L1 into the registers also 128 bytes?
Lastly, is it possible to adjust the L2 fetch granularity using the cudaDeviceSetLimit function? If it isn't explicitly set, do transfers always happen at the default granularity?
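For reference, this is the kind of call I have in mind; a minimal sketch, assuming cudaLimitMaxL2FetchGranularity is the relevant limit (the documentation describes it as a performance hint, so I'm not sure it guarantees the transfer size):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Hint the driver to fetch at most 32 bytes per L2 miss.
    // Note: documented as a hint only, so the hardware may not
    // honor it exactly.
    cudaError_t err = cudaDeviceSetLimit(cudaLimitMaxL2FetchGranularity, 32);
    if (err != cudaSuccess) {
        std::printf("cudaDeviceSetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Read the limit back to see what was actually applied.
    size_t granularity = 0;
    cudaDeviceGetLimit(&granularity, cudaLimitMaxL2FetchGranularity);
    std::printf("L2 fetch granularity hint: %zu bytes\n", granularity);
    return 0;
}
```

Is this the intended use, and does the hint actually change how much data moves from global memory into L2?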
Best regards,
Rawin