L2 cache in A100 provides 179% hit rate!

user40368 · January 1, 2023, 1:34am

Hello everyone, I’m building a model for the A100 GPU, and to do that, I needed to demystify the caches.
While doing that, I found that the L2 cache sometimes (more than once) reports a hit rate above 100%,
for example 179%, 130%, and 102%.
The project details are as follows:
GPU: A100
Driver Version: 515.48.07
CUDA Version: 11.3.1
This is the benchmark:

To be specific, the app path is: main/Benchmarks/PolyBench/linear-algebra/gramschmidt

I’m using 108 blocks (1 block per SM)
and 256 threads per block.
I have also attached the full report of the Nsight Compute metrics:

final1.txt (81.9 MB)

ramschmidt_kernel3(int, int, float*, float*, float*, int), 2022-Dec-14 23:30:37, Context 1, Stream 7
Section: Memory Workload Analysis

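---------------------------------------------------------------------- --------------- ------------------------------
Memory Throughput                                                         Mbyte/second                          34.48
Mem Busy                                                                             %                           0.67
Max Bandwidth                                                                        %                           0.42
L1/TEX Hit Rate                                                                      %                              0
L2 Compression Success Rate                                                          %                              0
L2 Compression Ratio                                                                                                0
L2 Hit Rate                                                                          %                         179.38
Mem Pipes Busy                                                                       %                           0.01
---------------------------------------------------------------------- --------------- ------------------------------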

Section: Occupancy
---------------------------------------------------------------------- --------------- ------------------------------
Block Limit SM                                                                   block                             32
Block Limit Registers                                                            block                              8
Block Limit Shared Mem                                                           block                            164
Block Limit Warps                                                                block                              8
Theoretical Active Warps per SM                                                   warp                             64
Theoretical Occupancy                                                                %                            100
Achieved Occupancy                                                                   %                          12.36
Achieved Active Warps Per SM                                                      warp                           7.91
---------------------------------------------------------------------- --------------- ------------------------------
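
As a quick sanity check on the Occupancy section, the achieved figure can be reproduced from the numbers already in the post. This is only a rough sketch in Python: the variable names are made up for illustration, and it assumes the 108 blocks are spread evenly, one per SM, as described above.

```python
# Figures copied from the post and from the Occupancy section.
THREADS_PER_BLOCK = 256
BLOCKS_PER_SM = 1                # assumption: 108 blocks land one per SM
WARP_SIZE = 32                   # threads per warp on NVIDIA GPUs
THEORETICAL_WARPS_PER_SM = 64    # "Theoretical Active Warps per SM"
ACHIEVED_WARPS_PER_SM = 7.91     # "Achieved Active Warps Per SM"


def occupancy_percent(active_warps, max_warps):
    """Active warps as a percentage of the per-SM maximum."""
    return 100.0 * active_warps / max_warps


# Warps this launch can place on one SM: 256 / 32 * 1 = 8.
launched_warps_per_sm = (THREADS_PER_BLOCK // WARP_SIZE) * BLOCKS_PER_SM

# Upper bound on occupancy imposed by the launch configuration itself.
launch_ceiling = occupancy_percent(launched_warps_per_sm, THEORETICAL_WARPS_PER_SM)

# Occupancy recomputed from the achieved warp count in the report.
achieved = occupancy_percent(ACHIEVED_WARPS_PER_SM, THEORETICAL_WARPS_PER_SM)

print(f"warps launched per SM:    {launched_warps_per_sm}")
print(f"ceiling for this launch:  {launch_ceiling:.2f}%")
print(f"achieved occupancy:       {achieved:.2f}%")
```

Under that assumption the launch can keep at most 8 of 64 warps resident per SM, so the reported 12.36% sits right at the limit of this configuration even though the theoretical occupancy of the kernel is 100%.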

felix_dt · January 4, 2023, 8:10am

Values exceeding 100% by a small amount are expected for metrics collected across multiple passes. A value of 179% is certainly not expected. Since it’s not stated explicitly in your description, please make sure to collect a report using the latest version of Nsight Compute. It is backwards compatible with older driver and toolkit versions. You should also ensure that you did not disable clock control in your profiling run by passing --clock-control none.
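
To illustrate the multi-pass point, here is a toy Python sketch with made-up counter values. It is not the formula Nsight Compute uses for this metric, and the counter names are hypothetical; it only shows how a ratio can drift slightly above 100% when its numerator and denominator come from different replays of the same kernel.

```python
def hit_rate_percent(hit_count, lookup_count):
    """Ratio of two counters, expressed as a percentage."""
    return 100.0 * hit_count / lookup_count


# Hypothetical counter values, NOT taken from the attached report.
# Pass A replays the kernel and records hits.
hits_pass_a = 1_015
# Lookups that actually happened during pass A (not recorded in that pass).
lookups_pass_a = 1_030
# Pass B replays the kernel again and records lookups; the run differs slightly.
lookups_pass_b = 1_000

# Numerator and denominator from the same replay: always at most 100%.
same_pass = hit_rate_percent(hits_pass_a, lookups_pass_a)

# Numerator from pass A, denominator from pass B: can exceed 100%.
cross_pass = hit_rate_percent(hits_pass_a, lookups_pass_b)

print(f"same-pass ratio:  {same_pass:.1f}%")
print(f"cross-pass ratio: {cross_pass:.1f}%")
```

A drift of a percent or two is the kind of effect this can explain. It would take a very large difference between passes to reach 179%, which is consistent with that value being treated as unexpected.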
