Can't profile L1 and L2 hit ratios on K40 and Titan Z

bowuwm · February 18, 2016, 5:14am

I tried to profile L1 and L2 cache hit ratios on K40 and Titan Z cards with the following command:

nvprof --metrics l1_cache_global_hit_rate ./vecadd

vecadd is just a simple vector addition CUDA program. Though I’m sure the kernel finished successfully, the reported min, max, and avg for the metrics l1_cache_global_hit_rate and l2_cache_global_hit_rate are all 0.00%. Does that mean the K40 and Titan Z do not support profiling L1 and L2 cache hit ratios?

Robert_Crovella · February 18, 2016, 8:08am

Well, it only looks like you are asking for l1 cache hit rate, right?

K40 is a Kepler device. It has L1 turned off for ordinary global load caching.

Titan Z is also a Kepler device. It also has L1 turned off for the same scenario.

The above items are covered in the documentation. Take a look at the Kepler Tuning Guide, for example.

Regarding L2, for a very simple program (say, that reads a vector exactly once) it’s possible that it never hits in L2.

Greg · February 20, 2016, 1:38am

K40 and Titan Z are both based upon GK110B. You should be able to enable L1 caching of global loads in both chips. See the Programming Guide in the CUDA Toolkit Documentation.

In the vector add sample each warp reads 32 consecutive 32-bit values from unique addresses. No address is read or written multiple times, so as expected the cache hit rate is 0%. If you change the sample such that, for one of the vectors, every thread reads from B[0] (same address for all threads), then you should see a read hit rate of ~50%, because all A accesses miss and all B accesses hit (except for the first access on every SM). If you see this in L2 but not L1, follow the directions in the link above to try to enable L1 caching. You may also have to look at the assembly code (SASS, not PTX), as the compiler will likely access the A and B vectors using the LDG instruction, which uses the texture cache, not the L1 cache.
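The experiment described above could be sketched roughly as follows. This is only an illustration, not the original poster's vecadd: the kernel names, the `CHECK` macro, and the `run_both` helper are hypothetical. It launches a baseline kernel where no address is reused, then a variant where every thread reads B[0], so the two can be compared under the profiler.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking macro, for illustration only.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "CUDA error: %s (line %d)\n",        \
                    cudaGetErrorString(err_), __LINE__);         \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

// Baseline: each thread reads its own A[i] and B[i]; no address is reused.
__global__ void vecadd(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = A[i] + B[i];
}

// Variant: every thread reads the same address, B[0].
__global__ void vecadd_broadcast(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = A[i] + B[0];
}

// Hypothetical helper: runs both kernels and returns the number of wrong results.
int run_both(int n) {
    size_t bytes = (size_t)n * sizeof(float);
    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);
    // B is constant, so both kernels should produce A[i] + 2.
    for (int i = 0; i < n; ++i) { hA[i] = (float)i; hB[i] = 2.0f; }

    float *dA, *dB, *dC;
    CHECK(cudaMalloc(&dA, bytes));
    CHECK(cudaMalloc(&dB, bytes));
    CHECK(cudaMalloc(&dC, bytes));
    CHECK(cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    int mismatches = 0;

    vecadd<<<blocks, threads>>>(dA, dB, dC, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost));
    for (int i = 0; i < n; ++i) if (hC[i] != hA[i] + 2.0f) ++mismatches;

    vecadd_broadcast<<<blocks, threads>>>(dA, dB, dC, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost));
    for (int i = 0; i < n; ++i) if (hC[i] != hA[i] + 2.0f) ++mismatches;

    CHECK(cudaFree(dA)); CHECK(cudaFree(dB)); CHECK(cudaFree(dC));
    free(hA); free(hB); free(hC);
    return mismatches;
}

int main() {
    int bad = run_both(1 << 20);
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Assuming the file is saved as vecadd.cu (a made-up name), building with something like `nvcc -arch=sm_35 -Xptxas -dlcm=ca vecadd.cu -o vecadd` requests L1 caching of global loads, which the Kepler Tuning Guide describes as an opt-in on GK110B. Running `nvprof --metrics l1_cache_global_hit_rate ./vecadd` should then report the two kernels separately, and `nvprof --query-metrics` lists the metrics the device actually exposes. If the hit rate is still 0%, `cuobjdump -sass vecadd` shows whether the loads were compiled to LDG rather than an ordinary global load.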
