Cache size effect

bgrlrnt · March 14, 2025, 4:15pm

Hi,
I wrote a simple program to check the effect of cache size:


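__global__ void addTestLoop(double* tabA, double* tabB, double* tabC, int nbElt, int nbTest)
{
	// Repeat the same element-wise add nbTest times so that the arrays are
	// re-read on every pass after the first one.
	for (long long int idxTest = 0; idxTest < nbTest; idxTest++)
		for (int i = 0; i < nbElt; i++)
			tabC[i] = tabA[i] + tabB[i];
}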

I get good results for some cards:

the speed is a function of the data size.
I found the cache sizes on the TechPowerUp website.

But for some cards the results are not consistent with the cache size:

I do not understand why.
Is the L1 cache size of the GTX 1050 49152 bytes and the L2 size 1048576 bytes?
Or maybe something is wrong in my understanding of how caches behave on CUDA cards?

Robert_Crovella · March 14, 2025, 4:32pm

The L2 cache sizes for the cards that matter most for compute work are generally documented in the architecture whitepapers. However, many lower-end GPUs don't have that level of documentation available. Nevertheless, the GTX 1050 appears to have 1024 KB, i.e. 1 MB, of L2 cache.

For L1 cache, the L1 is generally part of the SM design, and so should be the same across designs of the same compute capability. GTX 1050 is a Pascal device, and so details of the L1 cache behavior are available in the Pascal tuning guide. The GTX 1050 will be similar in behavior to the references there to GP104. Note the mention there of when global loads are cached in L1.
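As a minimal sketch of how one might check this on a given card: the CUDA runtime reports the L2 size and whether global loads can be cached in L1 through `cudaDeviceProp` (`l2CacheSize`, `globalL1CacheSupported`). The kernel name `addTestLoopLdg`, the array size, and the single-thread launch are assumptions made for illustration, not taken from the original program. The `__ldg()` variant asks for the read-only data cache path, which needs compute capability 3.5 or higher. This is not a validated benchmark, and it is worth checking that the compiler keeps the repeated loop.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical variant of addTestLoop: inputs are read through __ldg(),
// i.e. the read-only data cache path (compute capability >= 3.5).
__global__ void addTestLoopLdg(const double* __restrict__ tabA,
                               const double* __restrict__ tabB,
                               double* __restrict__ tabC,
                               int nbElt, int nbTest)
{
    for (long long idxTest = 0; idxTest < nbTest; idxTest++)
        for (int i = 0; i < nbElt; i++)
            tabC[i] = __ldg(&tabA[i]) + __ldg(&tabB[i]);
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no CUDA device found\n");
        return 1;
    }
    // Values reported by the runtime for device 0.
    std::printf("%s (compute capability %d.%d)\n", prop.name, prop.major, prop.minor);
    std::printf("L2 cache size: %d bytes\n", prop.l2CacheSize);
    std::printf("global loads cacheable in L1: %s\n",
                prop.globalL1CacheSupported ? "yes" : "no");

    // Illustrative size only: 1024 doubles per array.
    const int nbElt = 1024;
    double *a, *b, *c;
    cudaMalloc(&a, nbElt * sizeof(double));
    cudaMalloc(&b, nbElt * sizeof(double));
    cudaMalloc(&c, nbElt * sizeof(double));
    cudaMemset(a, 0, nbElt * sizeof(double));
    cudaMemset(b, 0, nbElt * sizeof(double));

    // Single-thread launch (an assumption about how the test is run).
    addTestLoopLdg<<<1, 1>>>(a, b, c, nbElt, 100);
    cudaError_t err = cudaDeviceSynchronize();
    std::printf("kernel status: %s\n", cudaGetErrorString(err));

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return err == cudaSuccess ? 0 : 1;
}
```

Building with something like `nvcc -arch=sm_61` would target the GTX 1050; the architecture flag has to match the card being tested.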

bgrlrnt · March 17, 2025, 2:35pm

Thanks

I made some changes to my programs and installed Ubuntu 20 or 24 to work on Linux. The results now look good for the RTX 3090 and the RTX Ada 500.

For the old GT 730, the L2 cache shows up, but where is the L1 cache (16 KB or 48 KB)?

For the GTX 1050 on Windows something is wrong: no L1 cache shows up, the L2 cache size is wrong, and I cannot find out why. Maybe Windows is using the card, so the L1 and L2 caches cannot be detected?

Why is there an improvement in processing speed between 2^5 and 2^10 bytes for all cards?

Curefab · March 17, 2025, 5:12pm

The relative overhead of running kernels is greatest for the smallest data sizes. Your formula is time / data. If the time includes some small fixed contributions, their share of the result grows as the data size shrinks.
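To make that concrete, here is a simple model, assumed for illustration and not fitted to the measurements in this thread: each run costs a fixed overhead $t_0$ (launch, synchronization, timer resolution) plus a term proportional to the data size $n$ at throughput $B$.

$$
t(n) = t_0 + \frac{n}{B}
\qquad\Longrightarrow\qquad
\frac{t(n)}{n} = \frac{t_0}{n} + \frac{1}{B}
$$

For small $n$ the $t_0 / n$ term dominates, so the time per byte falls as $n$ grows and flattens toward $1/B$ once $n \gg t_0 B$. Under this model, an apparent speed-up at the smallest sizes would show up on every card, independent of cache size.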

Robert_Crovella · March 17, 2025, 6:45pm

As indicated in a link I already provided, some GPUs including the GP10x Pascal series do not cache global loads in L1, at least by default.
