Behavior of L1/L2 caches

anthonyJK1 · June 1, 2023, 11:20pm

Hello,

I was reading about L1 and L2 cache loads and stores, and I found that on an L1 miss for a load instruction, L1 gets only the 32-byte sector it needs from L2, not the whole 128-byte cache line. So why do we say the fetch granularity is 128 bytes? In which cases do we fetch 128 bytes? And what is the advantage of fetching only one sector on a cache miss?

Thank you

Robert_Crovella · June 2, 2023, 1:54am

In Fermi/Kepler days, a miss in L1 triggered a 128-byte request to L2. Somewhere between Maxwell and Pascal this changed to a 32-byte granularity.

You’ll fetch 128 bytes if the request actually needs 128 bytes. For example, a warp-wide load where each thread reads one float or int and the addresses are adjacent needs 32 threads × 4 bytes = 128 bytes. The advantage of fetching only one sector on a miss shows up when a warp request needs 32 bytes or less: in that case it is better to request 32 bytes than 128.
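To make the arithmetic concrete, here is a rough Python model, not NVIDIA's documented hardware behavior. It assumes that an L1 miss fetches exactly the distinct 32-byte sectors (or, Fermi/Kepler style, the distinct 128-byte lines) touched by the warp's requested bytes, that the base address is 128-byte aligned, and it ignores L1 hits, replacement, and everything else. The helper names `bytes_fetched` and `warp_addresses` are hypothetical, made up for illustration.

```python
# Simplified model (an assumption, not documented hardware behavior):
# on an L1 miss, every distinct `granularity`-byte chunk touched by the warp's
# request is fetched from L2. granularity=128 approximates the Fermi/Kepler
# full-line fetch; granularity=32 approximates per-sector fetch on newer GPUs.


def bytes_fetched(addresses, access_size, granularity):
    """Bytes moved from L2 for one warp request under the model above (hypothetical helper)."""
    chunks = set()
    for addr in addresses:
        first = addr // granularity
        last = (addr + access_size - 1) // granularity  # an access may straddle a boundary
        chunks.update(range(first, last + 1))
    return len(chunks) * granularity


def warp_addresses(base, stride, active_threads=32):
    """Byte address read by each active thread: thread t reads base + t * stride (hypothetical helper)."""
    return [base + t * stride for t in range(active_threads)]


FLOAT_SIZE = 4  # a float or int is 4 bytes

cases = {
    # 32 threads, one adjacent float each: 32 * 4 = 128 bytes requested
    "coalesced": warp_addresses(0, FLOAT_SIZE),
    # only 8 threads need data: 8 * 4 = 32 bytes requested
    "small": warp_addresses(0, FLOAT_SIZE, active_threads=8),
    # 128-byte stride: every thread touches a different 128-byte line
    "strided": warp_addresses(0, 128),
}

for name, addrs in cases.items():
    sector = bytes_fetched(addrs, FLOAT_SIZE, 32)
    line = bytes_fetched(addrs, FLOAT_SIZE, 128)
    print(f"{name:10s} 32-byte sectors: {sector:5d} B   128-byte lines: {line:5d} B")
```

Under this model the coalesced case moves 128 bytes either way, the 32-byte request moves 32 bytes with sector granularity versus 128 with full-line granularity, and the strided case moves 1024 versus 4096 bytes. That is the saving sector-sized fetches buy when a warp does not need the whole line.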
