Understanding L2 requests


I used the Visual Profiler on a Mac to profile a sparse matrix-vector multiplication kernel. I found that the number of L1 misses * 15 (the number of SMs) is not equal to the number of L2 requests. In fact, even (L1 misses + L1 hits) * 15 is less than the number of L2 requests. Can someone explain this?

L1 hits: 677342
L1 misses: 2.07111e+06
L2 requests: 1.23936e+08

You have roughly 60x more L2 requests than L1 misses. 60 is 15 times 4. 15 is the number of SMs, as you noted, and I’d speculate that the 4 comes from the 4x difference between the L1 and L2 cache line sizes, so that each L1 miss turns into four L2 requests.
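
To check the arithmetic behind that guess, here is a quick sketch in Python using the counter values posted above. The factor of 4 is the speculated line-size ratio, not something the profiler reports, and the variable names are only for illustration.

```python
# Counter values copied from the profiler output in the question.
l1_hits = 677342
l1_misses = 2.07111e6
l2_requests = 1.23936e8

NUM_SMS = 15          # number of SMs, as stated in the question
LINE_SIZE_RATIO = 4   # ASSUMPTION: speculated L1-to-L2 line-size ratio

# Observed ratio of L2 requests to (per-SM) L1 misses.
ratio = l2_requests / l1_misses

# L2 requests predicted if every L1 miss on every SM becomes
# LINE_SIZE_RATIO separate L2 requests.
predicted = l1_misses * NUM_SMS * LINE_SIZE_RATIO
rel_diff = abs(predicted - l2_requests) / l2_requests

print(f"L2 requests / L1 misses = {ratio:.2f}")
print(f"predicted L2 requests   = {predicted:.5e}")
print(f"relative difference     = {rel_diff:.2%}")
```

The observed ratio comes out just under 60, and the prediction lands within one percent of the measured L2 request count, which is consistent with the 15 * 4 explanation but does not prove it.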

Your speculation is quite reasonable. I think that is the reason. Thanks!