Does L2 cache hit ratio have nothing to do with L2 cache persistence?

Hello, I’ve recently been studying the L2 cache and trying to use it to improve performance.
AFAIK, if we set the window's base pointer and size, and set hitRatio to 1, accesses to that region of device memory can only use a limited part of the L2 cache (referred to as the set-aside area in the NVIDIA documentation).
Below is the statement from the NVIDIA CUDA programming documentation:

  • With a hitRatio of 1.0, the hardware will attempt to cache the whole 32KB window in the set-aside L2 cache area. Since the set-aside area is smaller than the window, cache lines will be evicted to keep the most recently used 16KB of the 32KB data in the set-aside portion of the L2 cache.

However, when I profile the kernel “implicit_convolve_sgemm”, I get the same L2 cache hit ratio whether or not the L2 cache is limited, although I expected a lower hit ratio in the limited case. For the limited run I set the set-aside area to 1MB (I’m currently using an RTX 3090, which has 6MB of L2 cache) and put the entire workload (input, weight, output) in the window.
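
A minimal sketch of this kind of setup, assuming a single buffer and a placeholder kernel (`touch`, the `CHECK` macro, and the 4MB window size are hypothetical and only for illustration; the actual cuDNN kernel and its buffers are not shown):

```cuda
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                            \
    do {                                                       \
        cudaError_t err_ = (call);                             \
        if (err_ != cudaSuccess) {                             \
            fprintf(stderr, "%s failed: %s\n", #call,          \
                    cudaGetErrorString(err_));                 \
            exit(EXIT_FAILURE);                                \
        }                                                      \
    } while (0)

// Hypothetical stand-in for the real workload.
__global__ void touch(float* data, size_t n) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));

    // Request a 1MB set-aside, clamped to what the device allows.
    size_t setAsideBytes =
        std::min<size_t>(1 << 20, prop.persistingL2CacheMaxSize);
    CHECK(cudaDeviceSetLimit(cudaLimitPersistingL2CacheSize, setAsideBytes));

    // Assumed window size (4MB), clamped to the device maximum.
    size_t windowBytes =
        std::min<size_t>(4 << 20, prop.accessPolicyMaxWindowSize);
    size_t n = windowBytes / sizeof(float);

    float* data = nullptr;
    CHECK(cudaMalloc(&data, windowBytes));
    CHECK(cudaMemset(data, 0, windowBytes));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Mark accesses to [data, data + windowBytes) as persisting.
    cudaStreamAttrValue attr = {};
    attr.accessPolicyWindow.base_ptr  = data;
    attr.accessPolicyWindow.num_bytes = windowBytes;
    attr.accessPolicyWindow.hitRatio  = 1.0f;
    attr.accessPolicyWindow.hitProp   = cudaAccessPropertyPersisting;
    attr.accessPolicyWindow.missProp  = cudaAccessPropertyStreaming;
    CHECK(cudaStreamSetAttribute(
        stream, cudaStreamAttributeAccessPolicyWindow, &attr));

    // The kernel must be launched into the same stream as the window.
    unsigned int blocks = static_cast<unsigned int>((n + 255) / 256);
    touch<<<blocks, 256, 0, stream>>>(data, n);
    CHECK(cudaGetLastError());
    CHECK(cudaStreamSynchronize(stream));

    // Remove the window and reset persisting lines to normal.
    attr.accessPolicyWindow.num_bytes = 0;
    CHECK(cudaStreamSetAttribute(
        stream, cudaStreamAttributeAccessPolicyWindow, &attr));
    CHECK(cudaCtxResetPersistingL2Cache());

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(data));
    return 0;
}
```

Note that the access policy window is a per-stream attribute, so it only affects kernels launched into that stream. Whether a kernel launched by a library picks it up depends on which stream the library uses.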

Below are the results:
When L2 cache is not limited, L2 hit ratio: 97.77%
When L2 cache is limited, L2 hit ratio: 97.78%

It seems like accesses to the data in the window are still using the entire L2 cache.
Can someone please explain?
Thank you in advance!

How much L2 a kernel uses probably depends on the kernel and its algorithm. The hit ratio is nearly 100%, so 1MB seems to be more than enough to achieve that.

Also, if the work is distributed well across the blocks, the combined capacity of the L1 caches often compensates.
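
One way to sanity-check this is to compare the working set against the L2 and set-aside sizes that the device reports. A minimal sketch, assuming device 0 and a hypothetical working-set size (replace `workingSetBytes` with the actual total of the input, weight, and output buffers):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int device = 0;  // assumption: the GPU of interest is device 0

    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    cudaSetDevice(device);

    // Current size of the set-aside (persisting) portion of L2.
    size_t setAsideBytes = 0;
    err = cudaDeviceGetLimit(&setAsideBytes, cudaLimitPersistingL2CacheSize);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetLimit: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Hypothetical value; replace with the real buffer sizes.
    const size_t workingSetBytes = 768 * 1024;

    printf("Total L2 cache:         %d bytes\n", prop.l2CacheSize);
    printf("Max set-aside size:     %d bytes\n", prop.persistingL2CacheMaxSize);
    printf("Max access window size: %d bytes\n", prop.accessPolicyMaxWindowSize);
    printf("Current set-aside size: %zu bytes\n", setAsideBytes);
    printf("Number of SMs:          %d\n", prop.multiProcessorCount);
    printf("Working set %s the current set-aside area\n",
           workingSetBytes <= setAsideBytes ? "fits in" : "exceeds");
    return 0;
}
```

If the working set fits in the set-aside area, an unchanged hit ratio would be expected; if it clearly exceeds it, the access policy window may not be applied to the profiled kernel.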