L2 cache in A100 provides 179% hit rate!

Hello everyone, I’m building a model of the A100 GPU, and to do that I needed to demystify the caches.
While doing that, I found that sometimes (more than once) the L2 cache reports a hit rate above 100%,
for example 179%, 130%, and 102%.
The project details are as follows:
GPU: A100
Driver Version: 515.48.07
CUDA Version: 11.3.1
This is the benchmark:

To be specific, the app path is: main/Benchmarks/PolyBench/linear-algebra/gramschmidt

I’m using 108 blocks (1 block per SM)
and 256 threads per block.
I also attached the full report of the Nsight Compute metrics:

final1.txt (81.9 MB)

gramschmidt_kernel3(int, int, float*, float*, float*, int), 2022-Dec-14 23:30:37, Context 1, Stream 7
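Section: Memory Workload Analysis
---------------------------------------------------------------------- --------------- ------------------------------
Memory Throughput                                                         Mbyte/second                          34.48
Mem Busy                                                                             %                           0.67
Max Bandwidth                                                                        %                           0.42
L1/TEX Hit Rate                                                                      %                              0
L2 Compression Success Rate                                                          %                              0
L2 Compression Ratio                                                                                                0
L2 Hit Rate                                                                          %                         179.38
Mem Pipes Busy                                                                       %                           0.01
---------------------------------------------------------------------- --------------- ------------------------------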

Section: Occupancy
---------------------------------------------------------------------- --------------- ------------------------------
Block Limit SM                                                                   block                             32
Block Limit Registers                                                            block                              8
Block Limit Shared Mem                                                           block                            164
Block Limit Warps                                                                block                              8
Theoretical Active Warps per SM                                                   warp                             64
Theoretical Occupancy                                                                %                            100
Achieved Occupancy                                                                   %                          12.36
Achieved Active Warps Per SM                                                      warp                           7.91
---------------------------------------------------------------------- --------------- ------------------------------
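
As a quick sanity check on the occupancy section, the numbers in the report can be reproduced from the launch configuration. The sketch below is plain Python arithmetic, not anything Nsight Compute runs itself. The variable names are made up for illustration, the SM count of 108 is taken from the "1 block per SM" statement, and a warp size of 32 is assumed.

```python
# Launch configuration stated in the post.
blocks = 108
threads_per_block = 256

# Assumptions for this sketch: 32 threads per warp, 108 SMs on the A100,
# and the 64-warp limit reported as "Theoretical Active Warps per SM".
WARP_SIZE = 32
sm_count = 108
max_warps_per_sm = 64

warps_per_block = threads_per_block // WARP_SIZE   # 8 warps in each block
blocks_per_sm = blocks / sm_count                  # 1 block on each SM

# Upper bound on occupancy that this launch can reach, in percent.
launch_limited_occupancy = 100 * warps_per_block * blocks_per_sm / max_warps_per_sm

# Achieved occupancy recomputed from "Achieved Active Warps Per SM" (7.91).
achieved_active_warps = 7.91
achieved_occupancy = 100 * achieved_active_warps / max_warps_per_sm

print(f"launch-limited occupancy: {launch_limited_occupancy:.2f}%")
print(f"achieved occupancy:       {achieved_occupancy:.2f}%")
```

With one 8-warp block per SM, the launch itself caps occupancy at 12.5%, so the reported 12.36% is close to the maximum this configuration allows even though the theoretical occupancy is 100%.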

Values exceeding 100% by a small amount are expected for metrics collected across multiple passes. A value of 179% is certainly not expected. Since it’s not stated explicitly in your description, please make sure to collect a report using the latest version of Nsight Compute. It is backwards compatible with older driver and toolkit versions. You should also ensure that you did not disable clock control in your profiling run by passing --clock-control none.
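
To illustrate the multi-pass point, here is a minimal sketch in Python of how a ratio metric can drift slightly above 100%. It assumes the hit rate is computed as hit sectors divided by total sectors and that the two counters may come from different replay passes of the kernel. The function name and all counter values are hypothetical and are not taken from the attached report.

```python
def hit_rate_percent(hit_sectors, total_sectors):
    """Return hits as a percentage of lookups (0.0 if there were no lookups)."""
    if total_sectors == 0:
        return 0.0
    return 100.0 * hit_sectors / total_sectors


# Hypothetical counters, both read in the same pass: hits can never
# exceed lookups, so the ratio stays at or below 100%.
same_pass = hit_rate_percent(hit_sectors=980, total_sectors=1000)

# Hypothetical counters read in two different passes: the pass that
# counted hits happened to see slightly more traffic than the pass
# that counted total lookups, so the ratio ends up above 100%.
cross_pass = hit_rate_percent(hit_sectors=1020, total_sectors=1000)

print(f"same pass:  {same_pass:.1f}%")
print(f"cross pass: {cross_pass:.1f}%")
```

Small run-to-run differences like this would account for a value such as 102%, but not for 179%, which is why checking the tool version and the clock control setting is the next step.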