Hello everyone, I’m building a model of the A100 GPU, and to do that I need to demystify its caches.
While doing that, I noticed that the profiler sometimes (more than once) reports an L2 cache hit rate above 100%, for example 179%, 130%, and 102%.
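For reference, a hit rate computed from one consistent pair of counters cannot exceed 100%, because every hit is also a lookup. Under that definition, a reported 179.38% means the numerator was about 1.79 times the denominator, so the two counts cannot have covered the same set of L2 requests. A minimal Python sketch of that arithmetic follows; `hits` and `lookups` are hypothetical stand-ins, not actual profiler metric names.

```python
def l2_hit_rate_pct(hits: int, lookups: int) -> float:
    """Hit rate in percent, assuming `hits` and `lookups` (hypothetical
    names) are counted over the same set of L2 requests."""
    if lookups == 0:
        return 0.0
    if hits > lookups:
        # Impossible if both counts cover the same requests.
        raise ValueError(f"inconsistent counters: {hits} hits > {lookups} lookups")
    return 100.0 * hits / lookups


def implied_hits_per_lookup(reported_pct: float) -> float:
    """Ratio of numerator to denominator implied by a reported percentage."""
    return reported_pct / 100.0


print(l2_hit_rate_pct(3, 4))  # 75.0
for pct in (179.38, 130.0, 102.0):
    # Any ratio above 1.0 means more hits than lookups were counted.
    print(f"{pct:6.2f}% -> {implied_hits_per_lookup(pct):.4f} hits per lookup")
```

A model that validates its own counters this way would reject all three reported values as inconsistent rather than treat them as real hit rates.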
The benchmark I’m running is the PolyBench gramschmidt app (polybench->linear_algebra->gramschmidt). Here is the output for one of its kernels:
gramschmidt_kernel3(int, int, float*, float*, float*, int), 2022-Dec-14 23:30:37, Context 1, Stream 7
Section: Memory Workload Analysis
---------------------------------------------------------------------- --------------- ------------------------------
Memory Throughput                                                         Mbyte/second                          34.48
Mem Busy                                                                             %                           0.67
Max Bandwidth                                                                        %                           0.42
L1/TEX Hit Rate                                                                      %                              0
L2 Compression Success Rate                                                          %                              0
L2 Compression Ratio                                                                                                0
L2 Hit Rate                                                                          %                         179.38
Mem Pipes Busy                                                                       %                           0.01
---------------------------------------------------------------------- --------------- ------------------------------
Section: Occupancy
---------------------------------------------------------------------- --------------- ------------------------------
Block Limit SM                                                                   block                             32
Block Limit Registers                                                            block                              8
Block Limit Shared Mem                                                           block                            164
Block Limit Warps                                                                block                              8
Theoretical Active Warps per SM                                                   warp                             64
Theoretical Occupancy                                                                %                            100
Achieved Occupancy                                                                   %                          12.36
Achieved Active Warps Per SM                                                      warp                           7.91
---------------------------------------------------------------------- --------------- ------------------------------
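Unlike the L2 hit rate, the occupancy figures in this report are internally consistent, which makes them a useful sanity check for a model. A minimal Python sketch of the arithmetic, assuming the A100 limit of 64 resident warps per SM and assuming 8 warps (256 threads) per block; the block size is inferred from the warp-based block limit of 8 (64 / 8) and is not shown in the report itself.

```python
# Numbers copied from the Occupancy section of the report above.
MAX_WARPS_PER_SM = 64  # A100 (compute capability 8.0): 2048 threads / 32 per warp
WARPS_PER_BLOCK = 8    # assumption: inferred from "Block Limit Warps" = 64 // 8

BLOCK_LIMITS = {  # max resident blocks per SM imposed by each resource
    "SM": 32,
    "Registers": 8,
    "Shared Mem": 164,
    "Warps": 8,
}


def theoretical_occupancy(limits: dict, warps_per_block: int,
                          max_warps: int = MAX_WARPS_PER_SM) -> tuple:
    """Return (theoretical active warps per SM, occupancy in %)."""
    blocks = min(limits.values())  # the tightest resource limit wins
    active_warps = min(blocks * warps_per_block, max_warps)
    return active_warps, 100.0 * active_warps / max_warps


def achieved_occupancy(achieved_active_warps: float,
                       max_warps: int = MAX_WARPS_PER_SM) -> float:
    """Achieved occupancy in % from the average resident warps per SM."""
    return 100.0 * achieved_active_warps / max_warps


print(theoretical_occupancy(BLOCK_LIMITS, WARPS_PER_BLOCK))  # (64, 100.0)
print(round(achieved_occupancy(7.91), 2))                     # 12.36
```

The tightest limits (registers and warps, both 8 blocks) allow the full 64 warps, but only about 7.91 warps were resident on average, so the kernel reached roughly one eighth of its theoretical occupancy.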