understanding L2 requests

bowuwm · May 2, 2012, 3:23pm

Hi,

 I used the Visual Profiler on a Mac to profile a sparse matrix-vector multiplication kernel. I found that the number of L1 misses * 15 (the number of SMs) is not equal to the number of L2 requests. In fact, even (number of L1 misses + number of L1 hits) * 15 < L2 requests. Can someone explain this?

L1 hits: 677342
L1 misses: 2.07111e+06
L2 requests: 1.23936e+08

vvolkov · May 2, 2012, 9:01pm

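You have about 60x more L2 requests than L1 misses (1.23936e+08 / 2.07111e+06 ≈ 59.8), and 60 is 15 times 4. 15 is the number of SMs, as you noted, and I’d speculate that the 4 comes from the 4x difference between L1 and L2 cache line sizes: each L1 miss would then turn into four L2 requests.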
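
To check that arithmetic against the posted counters, here is a minimal Python sketch. The counter values and the SM count come from the question. The factor of 4 is only the speculation above, taken here to mean a 128-byte L1 line fetched as 32-byte L2 requests, which is an assumption and not something the profiler output confirms. The variable names are made up for illustration.

```python
# Counter values copied from the original post.
l1_hits = 677342
l1_misses = 2.07111e6
l2_requests = 1.23936e8

NUM_SMS = 15         # SM count stated in the question
LINE_SIZE_RATIO = 4  # ASSUMPTION: 128-byte L1 line / 32-byte L2 request

# How many L2 requests were counted per reported L1 miss.
ratio = l2_requests / l1_misses

# Remove the SM factor; what is left should be close to LINE_SIZE_RATIO
# if the speculation holds.
per_sm_ratio = ratio / NUM_SMS

# L2 request count predicted by "misses * SMs * line-size ratio".
predicted_l2 = l1_misses * NUM_SMS * LINE_SIZE_RATIO
rel_error = abs(predicted_l2 - l2_requests) / l2_requests

print(f"L2 requests per L1 miss: {ratio:.2f}")
print(f"After dividing by {NUM_SMS} SMs: {per_sm_ratio:.2f}")
print(f"Predicted L2 requests: {predicted_l2:.5e}")
print(f"Relative error vs. measured: {rel_error:.2%}")
```

With these numbers the per-SM ratio comes out just under 4 and the prediction lands within 1% of the measured L2 requests. That is consistent with the speculation but does not prove it.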

bowuwm · May 3, 2012, 3:03am

Your speculation is quite reasonable. I think that is the reason. Thanks!
