L2 hit rate always at 100%

anthonyJK1 · June 21, 2023, 11:00pm

Hello,

I am profiling a simple application that stores a constant value to every element of a matrix. I am running it on a Jetson AGX Xavier and profiling with NCU 2022.2.1.0. The memory accesses are coalesced (each thread uses its global index, not just threadIdx). Since the L2 cache on the Xavier is 512 KB, I chose array sizes larger than that (1 MB, 10 MB…). However, ncu always reports an L2 cache hit rate of 100%.

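// RW and ELEMENTS_PER_STRIDE are compile-time macros defined elsewhere.
// RW==0 selects the strided load path, RW==1 the coalesced store path.
__global__ void
memoryKernelSingleSM(volatile unsigned int* d_matrix, int *d_result) {

	// Global (absolute) thread index across the whole grid.
	int ind = blockIdx.x * blockDim.x + threadIdx.x;
	// volatile keeps the compiler from optimizing the load away.
	volatile unsigned int r_sum;
	r_sum = 0;

	#if RW==0
	// Read path: one load per thread, strided by ELEMENTS_PER_STRIDE.
	r_sum = d_matrix[ind*ELEMENTS_PER_STRIDE];
	#elif RW==1
	// Write path: one 4-byte store per thread, no bounds check.
	d_matrix[ind] = 7;
	#endif

}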

This example is with 1 MB: [ncu screenshot not preserved]

With 2 MB: [ncu screenshot not preserved]

Why am I seeing this even though the matrix is larger than the cache?

N.B.: there are enough threads to access every element of the matrix (4 B per element), and the SASS shows the store instruction.

Thank you for your support
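
For reference, the sizing described in the post above can be checked with a little host-side arithmetic. This is only a minimal sketch and not code from the original application: the helper names and the 256-thread block size used in the note below are assumptions, while the 4 B element size and the 512 KB L2 capacity are the figures stated in the post.

```cpp
#include <cassert>
#include <cstddef>

// Figures taken from the post.
constexpr std::size_t kBytesPerElement = 4;       // one unsigned int
constexpr std::size_t kL2Bytes = 512 * 1024;      // L2 size as stated for the Xavier

// Number of 4-byte elements in an array of the given size (hypothetical helper).
constexpr std::size_t elementCount(std::size_t arrayBytes) {
    return arrayBytes / kBytesPerElement;
}

// Ceiling division: blocks needed so that every element gets its own thread.
inline std::size_t blocksNeeded(std::size_t elements, std::size_t threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (elements + threadsPerBlock - 1) / threadsPerBlock;
}

// True when the array cannot fit in the L2 cache all at once.
constexpr bool exceedsL2(std::size_t arrayBytes) {
    return arrayBytes > kL2Bytes;
}
```

With a 1 MB array this gives 262144 elements and, for an assumed 256 threads per block, 1024 blocks; the array is twice the stated L2 capacity. If the element count were not a multiple of the block size, the kernel as posted would need a bounds check, because the rounded-up grid would launch threads past the end of the array.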

jmarusarz · June 22, 2023, 8:17pm

In L2, the cache policy is set so that all stores are considered hits. Only loads will cause misses. From the chart, it looks like only stores are occurring. Does that answer your question?

anthonyJK1 · June 22, 2023, 8:23pm

Yes, thank you. But what is the L2 cache policy? Is it write-through with no allocation on a write miss?

mahmood.nt · June 23, 2023, 6:55pm

You may also want to check this thread:

https://forums.developer.nvidia.com/t/l2-cache-in-a100-provides-179-hit-rate

jmarusarz · June 29, 2023, 8:35pm

The cache policy for writes in L2 is write-back by default.

anthonyJK1 · June 29, 2023, 9:05pm

In that case, why does a write-back policy give a 100% cache hit rate? With a write-back policy, a cache line is written back to memory only when it is evicted. So if the hit rate really were 100%, I would expect no data to be written back to memory at all.

jmarusarz · July 6, 2023, 8:27pm

For stores, the metrics are defined as always counting a hit because (I’m generalizing here) from the instruction’s perspective, it writes to L2 and returns. Whether that causes an eviction or not, the store is unaware, and it does not impact the store instruction. For that reason, it was chosen to count the stores this way. It will always count the stores as a 100% hit rate, but that does not imply that no data is written back to memory.
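
The distinction above can be illustrated with a toy model. This is a sketch under loud assumptions, not a description of the real Xavier L2: it assumes a direct-mapped, write-back, write-allocate cache with a 64 B line (the line size and all names here are hypothetical). Every store is counted as a hit, as the metric definition describes, while dirty evictions are counted separately.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ToyL2Stats {
    std::uint64_t stores = 0;           // store instructions issued
    std::uint64_t reportedHits = 0;     // what the hit metric would count
    std::uint64_t dirtyWritebacks = 0;  // dirty lines evicted to memory
};

// Streams consecutive 4-byte stores over arrayBytes through a toy
// direct-mapped write-back cache (assumed geometry, not the real hardware).
ToyL2Stats streamStores(std::size_t arrayBytes, std::size_t cacheBytes,
                        std::size_t lineBytes) {
    assert(lineBytes > 0 && cacheBytes >= lineBytes && cacheBytes % lineBytes == 0);
    const std::size_t numLines = cacheBytes / lineBytes;
    std::vector<std::int64_t> tag(numLines, -1);  // -1 marks an empty slot
    ToyL2Stats s;
    for (std::size_t addr = 0; addr + 4 <= arrayBytes; addr += 4) {
        const std::int64_t line = static_cast<std::int64_t>(addr / lineBytes);
        const std::size_t slot = static_cast<std::size_t>(line) % numLines;
        if (tag[slot] != line) {
            // In a store-only stream every resident line is dirty,
            // so replacing it forces a write-back to memory.
            if (tag[slot] != -1) ++s.dirtyWritebacks;
            tag[slot] = line;
        }
        ++s.stores;
        ++s.reportedHits;  // the store "hits" regardless of any eviction
    }
    return s;
}

double reportedHitRatePercent(const ToyL2Stats& s) {
    return s.stores == 0 ? 0.0 : 100.0 * s.reportedHits / s.stores;
}
```

Calling `streamStores(1 << 20, 512 * 1024, 64)` in this model reports a 100% hit rate while still counting 8192 dirty write-backs, which matches the point that a 100% store hit rate does not imply that nothing is written back to memory. Lines still resident at the end are not counted; they would be written back later.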

anthonyJK1 · July 7, 2023, 7:23am

Great, thank you.

Is there any documentation that explains how the metrics are defined or calculated?

One last thing: what does %peak actually represent? I am asking because, for example, I sometimes get a throughput of 7 GB/s (from the SoC memory to the L2 cache) while %peak is 60%, which is not 60% of 136.5 GB/s, the maximum bandwidth of the memory bus.
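
The mismatch in the question can be made concrete with a short calculation. This is only an arithmetic sketch with hypothetical helper names; it uses the numbers quoted in the post and does not claim to know which peak value ncu actually uses as the denominator.

```cpp
#include <cassert>

// Measured throughput expressed as a percentage of a given peak.
inline double percentOfPeak(double measuredGBs, double peakGBs) {
    assert(peakGBs > 0.0);
    return 100.0 * measuredGBs / peakGBs;
}

// Peak throughput that a measured value and a reported %peak would imply.
inline double impliedPeakGBs(double measuredGBs, double reportedPercent) {
    assert(reportedPercent > 0.0);
    return measuredGBs * 100.0 / reportedPercent;
}
```

Here `percentOfPeak(7.0, 136.5)` is roughly 5.1%, far from the reported 60%, and `impliedPeakGBs(7.0, 60.0)` is roughly 11.7 GB/s. So whatever the 60% is measured against, it is evidently a much smaller figure than the 136.5 GB/s bus bandwidth.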

jmarusarz · July 13, 2023, 8:37pm

The best place to look for metric definitions is the Kernel Profiling Guide, although it may not have all the information you’re looking for.

With respect to the throughput, can you share what specific metric you’re looking at to get the 7GB/s and 60% of peak, and also where the 136.5GB/s number is coming from? In general, the percentage of peak is defined as utilized percentage of the hardware’s peak performance so your assumption seems correct, but we would need to dig deeper.

anthonyJK1 · July 17, 2023, 8:08am

Hello,

Here is an example:

As we can see, a throughput of 6.48 GB/s is reported as 60% of peak.

For the maximum bandwidth, I am referring to this manual (page 11, table 3):

Thank you.
