L2 Cache mechanism for streaming data?

full015 · August 25, 2022, 3:16am

GPU: Quadro RTX 4000
CUDA: 11.7

For a simple test case like this:

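// Element-wise vector add: each thread handles one element, so every
// element of a, b and c is touched exactly once (pure streaming access).
__global__ void test(float *a, float *b, float *c, const int n) {
	// Signed index so the bounds check below compares int with int.
	int idx = blockIdx.x * blockDim.x + threadIdx.x;

	if (idx < n) {
		c[idx] = a[idx] + b[idx];
	}
}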

Memory Chart:

Why is the L2 hit rate not 0%? What kind of data was cached, and what decides which part of the data gets cached?

Thank you.

Robert_Crovella · August 25, 2022, 5:20am

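The Quadro RTX 4000 seems to have a 4MB L2 cache. It may have been populated by the cudaMemcpy operations you ran prior to this kernel call. 4MB also seems to be about the size of each of your vectors. The kernel touches three vectors (it reads a and b and writes c), so if roughly one vector's worth of data were already resident in L2, about one third of the accesses would hit, which would give a 33% hit rate.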
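As a rough way to check that arithmetic on your own card, here is a minimal sketch that reads the L2 size from the device properties and estimates the fraction of the kernel's traffic that could already be sitting in L2. The helper name estimateL2HitRate and the element count n are made up for illustration, and the model assumes that whatever is in L2 belongs to the vectors the kernel touches, which is a guess rather than something the profiler confirms.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: rough upper bound on the L2 hit rate for c = a + b,
// assuming the L2 is already full of data from a, b or c.
static double estimateL2HitRate(size_t l2Bytes, size_t bytesPerVector) {
    const size_t totalBytes = 3 * bytesPerVector;  // a and b are read, c is written
    if (totalBytes == 0) return 0.0;
    // At most one L2's worth of the traffic can already be resident.
    const size_t resident = l2Bytes < totalBytes ? l2Bytes : totalBytes;
    return static_cast<double>(resident) / static_cast<double>(totalBytes);
}

int main() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  // device 0
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // Assumption: 1M floats per vector, i.e. about 4MB each as guessed above.
    const size_t n = 1 << 20;
    const size_t bytesPerVector = n * sizeof(float);
    const size_t l2Bytes = static_cast<size_t>(prop.l2CacheSize);

    std::printf("L2 size: %zu bytes, vector size: %zu bytes\n",
                l2Bytes, bytesPerVector);
    std::printf("Estimated L2 hit rate if L2 was pre-populated: %.1f%%\n",
                100.0 * estimateL2HitRate(l2Bytes, bytesPerVector));
    return 0;
}
```

If the device reports a 4MB L2 and each vector is 4MB, the estimate comes out at one third, which matches the 33% reasoning above. If your vectors are a different size, change n to match your test.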
