Does L2 cache hit ratio have nothing to do with L2 cache persistence?

namch0101 · April 18, 2025, 7:10am

Hello, I’ve recently been studying the L2 cache and trying to use it to improve performance.
AFAIK, if we set the window’s base pointer and size (base_ptr and num_bytes) and set hitRatio to 1, then that device memory region can only use a limited area of the L2 cache (referred to as the “set-aside” area in the NVIDIA documentation).
Below is the relevant statement from the NVIDIA CUDA Programming Guide:

With a hitRatio of 1.0, the hardware will attempt to cache the whole 32KB window in the set-aside L2 cache area. Since the set-aside area is smaller than the window, cache lines will be evicted to keep the most recently used 16KB of the 32KB data in the set-aside portion of the L2 cache.
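The arithmetic behind the quoted example can be written out as a small host-side sketch. The helper names are hypothetical, and the model (hitRatio × num_bytes of the window is tagged persisting, and at most the set-aside size of it stays resident) is a simplification read off the quoted text, not NVIDIA’s actual replacement policy. The hitRatioToFit rule of thumb is likewise an assumption, not a documented API.

```cuda
#include <algorithm>
#include <cstddef>

// Hypothetical helper: bytes of the window that receive hitProp
// (persisting) on average, i.e. hitRatio * num_bytes.
double persistingBytes(double hitRatio, std::size_t windowBytes) {
    return hitRatio * static_cast<double>(windowBytes);
}

// Hypothetical helper: how many of those persisting bytes fit in the
// set-aside at once. Per the quoted text, anything beyond this is evicted
// so that only the most recently used data stays in the set-aside.
double residentPersistingBytes(double hitRatio, std::size_t windowBytes,
                               std::size_t setAsideBytes) {
    return std::min(persistingBytes(hitRatio, windowBytes),
                    static_cast<double>(setAsideBytes));
}

// Assumed rule of thumb (hypothetical helper): the hitRatio at which the
// persisting portion just fills the set-aside, capped at 1.0.
float hitRatioToFit(std::size_t windowBytes, std::size_t setAsideBytes) {
    if (windowBytes == 0) return 1.0f;
    double ratio = static_cast<double>(setAsideBytes) / static_cast<double>(windowBytes);
    return static_cast<float>(std::min(1.0, ratio));
}
```

With the quoted numbers, residentPersistingBytes(1.0, 32 * 1024, 16 * 1024) gives 16384 bytes, i.e. half of the 32KB window, and hitRatioToFit(32 * 1024, 16 * 1024) gives 0.5.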

However, when I profile the kernel “implicit_convolve_sgemm”, I get the same L2 cache hit ratio even when I limit the set-aside L2 area to 1MB (I’m using an RTX 3090, which has 6MB of L2 cache) and put the entire workload (input, weight and output) in the window. I expected a lower L2 hit ratio when L2 is limited.
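For reference, a setup like the one described (a set-aside size plus a window given by base_ptr, num_bytes and hitRatio) looks roughly like the following with the CUDA runtime API. This is a minimal sketch modeled on the access-policy-window example in the CUDA Programming Guide; the helper names enablePersistingWindow, disablePersistingWindow and clampWindowBytes are hypothetical, and error handling is reduced to returning the first failing cudaError_t.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cstddef>

// Hypothetical helper: clamp the requested window to the device limit
// reported in cudaDeviceProp::accessPolicyMaxWindowSize.
std::size_t clampWindowBytes(std::size_t requested, int maxWindowBytes) {
    return std::min(requested, static_cast<std::size_t>(maxWindowBytes));
}

// Hypothetical helper: reserve setAsideBytes of L2 for persisting accesses
// and attach an access policy window covering [base, base + bytes) to
// stream. Only kernels launched into stream afterwards use this policy.
cudaError_t enablePersistingWindow(cudaStream_t stream, void* base, std::size_t bytes,
                                   std::size_t setAsideBytes, float hitRatio) {
    int device = 0;
    cudaError_t err = cudaGetDevice(&device);
    if (err != cudaSuccess) return err;

    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) return err;

    // The set-aside cannot exceed persistingL2CacheMaxSize.
    std::size_t setAside = std::min(setAsideBytes,
                                    static_cast<std::size_t>(prop.persistingL2CacheMaxSize));
    err = cudaDeviceSetLimit(cudaLimitPersistingL2CacheSize, setAside);
    if (err != cudaSuccess) return err;

    cudaStreamAttrValue attr = {};
    attr.accessPolicyWindow.base_ptr  = base;
    attr.accessPolicyWindow.num_bytes = clampWindowBytes(bytes, prop.accessPolicyMaxWindowSize);
    attr.accessPolicyWindow.hitRatio  = hitRatio;  // fraction of the window given hitProp
    attr.accessPolicyWindow.hitProp   = cudaAccessPropertyPersisting;
    attr.accessPolicyWindow.missProp  = cudaAccessPropertyStreaming;
    return cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow, &attr);
}

// Hypothetical helper: remove the window (num_bytes = 0) and reset
// persisting lines in L2 back to normal.
cudaError_t disablePersistingWindow(cudaStream_t stream) {
    cudaStreamAttrValue attr = {};
    attr.accessPolicyWindow.num_bytes = 0;
    cudaError_t err = cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow, &attr);
    if (err != cudaSuccess) return err;
    return cudaCtxResetPersistingL2Cache();
}
```

For the setup described above, the call would be something like enablePersistingWindow(stream, base, totalBytes, 1 << 20, 1.0f) before launching on stream (base and totalBytes are hypothetical names). A single window is one contiguous address range, so it only covers input, weight and output if the three buffers are allocated back to back; in this sketch the window is also silently clamped to accessPolicyMaxWindowSize.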

Below are the results:
When L2 cache is not limited, L2 hit ratio: 97.77%
When L2 cache is limited, L2 hit ratio: 97.78%

It seems as if the data in the window can still use the entire L2 cache.
Can someone please explain?
Thank you in advance!

Curefab · April 18, 2025, 10:26am

How much L2 is actually used probably depends on the kernel and its algorithm. The hit ratio is nearly 100%, so 1MB seems to be more than enough to achieve that.

Also, if the blocks distribute the work well, the combined size of the L1 caches across all SMs often compensates.
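To put the L1 argument into numbers: an RTX 3090 has 82 SMs, and each GA10x SM has 128 KB of unified L1/shared memory, so the combined on-chip capacity is 82 × 128 KB = 10496 KB (about 10.25 MB), compared with 6 MB of L2. This is only an upper bound, since kernels that use shared memory take part of each SM’s 128 KB away from L1. The sketch below does the same arithmetic from cudaGetDeviceProperties; cudaDeviceProp does not report the L1 size, so the per-SM figure is an assumed input, and combinedL1Bytes and queryCacheBudget are hypothetical helpers.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: upper bound on combined L1/shared capacity across
// all SMs (the L1 vs. shared split depends on the kernel's carveout).
std::size_t combinedL1Bytes(int smCount, std::size_t unifiedBytesPerSm) {
    return static_cast<std::size_t>(smCount) * unifiedBytesPerSm;
}

// Hypothetical helper: query SM count and L2 size for a device. The per-SM
// unified L1/shared size is an assumption passed in by the caller
// (128 KB per SM on GA10x).
cudaError_t queryCacheBudget(int device, std::size_t assumedPerSmBytes,
                             std::size_t* l1Total, std::size_t* l2Bytes) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) return err;
    *l1Total = combinedL1Bytes(prop.multiProcessorCount, assumedPerSmBytes);
    *l2Bytes = static_cast<std::size_t>(prop.l2CacheSize);
    return cudaSuccess;
}
```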
