Weird Number for L2 Cache Hitrate

I’m trying to get L2 Cache Hit Rate using Nsight Compute for a simple CUDA code and I’m using the following Section file:

Metrics {
    Label: "l2_hit_rate"
    Name: "lts__t_sector_hit_rate.pct"
}
Metrics {
    Label: "l2_tex_read_hit_rate"
    Name: "lts__t_sector_op_read_hit_rate.pct"
}
Metrics {
    Label: "l2_tex_read_transactions"
    Name: "lts__t_sectors_srcunit_tex_op_read.sum"
}
Metrics {
    Label: "l2_tex_write_hit_rate"
    Name: "lts__t_sector_op_write_hit_rate.pct"
}
Metrics {
    Label: "l2_tex_write_transactions"
    Name: "lts__t_sectors_srcunit_tex_op_write.sum"
}

However, the output of Nsight Compute changes depending on which of these metrics I include. In each result below, the metrics that do not appear were commented out of the section file:

l2_hit_rate                  %                         171.46


l2_hit_rate                  %                          97.93
l2_tex_read_hit_rate         %                           4.75
l2_tex_write_hit_rate        %                            100
l2_tex_read_transactions     sector                         0
l2_tex_write_transactions    sector                         4


l2_hit_rate                   %                         179.27
l2_tex_read_hit_rate          %                       2,418.75
l2_tex_write_hit_rate         %                            100

What is going on here? More importantly, what does a value larger than 100% mean? I’ve seen this behavior before for utilization as well.

I have tried Nsight Compute 2019.5 and 2019.1 on two separate machines, both running Ubuntu 18.04:

Driver Version: 430.50
CUDA Version: 10.1


GPU: Quadro RTX 8000
Driver Version: 440.64
CUDA Version: 10.2

The L2 cache is a shared resource in the NVIDIA GPU that is accessed by many different units. A value outside the 0-100% range implies that the metric could not be collected accurately. Out-of-range values generally occur when the submitted workload has one or more of the following properties:

  1. Launched kernel is too small to saturate the GPU.
  2. Launched kernel has very different work per CTA.

The example above appears to issue very little work (property 1). Out-of-range metrics often occur when the profiler replays the kernel launch and the work distribution differs significantly between replays. A metric such as hit rate (hits / queries) can have significant error if hits and queries are collected on different replays and the kernel does not run long enough to saturate the GPU and reach a steady state (generally > 20 µs). The other cause of significant error is another GPU engine (display, copy engine, video encoder, video decoder, etc.) accessing shared memory during the profiling session. If the kernel is small, the other engine can significantly distort the L2 results. The l2_hit_rate metric includes all clients. The l2_tex metrics are limited to the target kernel, because the kernel will be the only work using the L1/TEX unit.
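To make the hits / queries argument concrete, here is a minimal Python sketch of how a ratio can exceed 100% when the numerator and denominator come from different replay passes. The counter values and the names `pass_a`, `pass_b`, and `hit_rate_pct` are made up for illustration; they are not taken from the profiler output above and do not describe how Nsight Compute schedules its passes internally.

```python
def hit_rate_pct(hits, queries):
    """Hit rate as a percentage: hits / queries * 100."""
    return 100.0 * hits / queries

# Hypothetical L2 counter values for two replay passes of the same tiny kernel.
# The work distribution differs between passes, so the traffic differs too.
pass_a = {"hits": 98, "queries": 100}
pass_b = {"hits": 50, "queries": 57}

# Consistent case: both counters are read in the same pass.
same_pass = hit_rate_pct(pass_a["hits"], pass_a["queries"])

# Inconsistent case: hits come from pass A, queries from pass B.
mixed = hit_rate_pct(pass_a["hits"], pass_b["queries"])

print(f"same pass: {same_pass:.2f}%")  # 98.00%
print(f"mixed:     {mixed:.2f}%")      # 171.93%
```

With a long-running kernel the per-pass counts are large and nearly identical, so the mixed ratio stays close to the true one; with only a handful of sectors, a small absolute difference between passes is enough to push the result far out of range.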

Please increase the size of the workload such that it saturates the GPU. This should result in correct metrics.
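As a rough guide to what "saturates the GPU" means in terms of launch size, the sketch below estimates how many blocks are needed to fill every SM with resident threads. The helper `blocks_for_full_waves` is hypothetical, and the device values (72 SMs, 1024 resident threads per SM) are assumptions for a Quadro RTX 8000; in practice they should be read from the device properties. The estimate ignores register and shared-memory limits, which can lower the real occupancy.

```python
def blocks_for_full_waves(sm_count, max_threads_per_sm, threads_per_block, waves=1):
    """Blocks needed to fill every SM with resident threads for `waves` waves."""
    blocks_per_sm = max_threads_per_sm // threads_per_block
    return sm_count * blocks_per_sm * waves

# Assumed device values; query the real ones from the device properties.
sm_count = 72              # assumption: Quadro RTX 8000
max_threads_per_sm = 1024  # assumption: Turing limit on resident threads per SM
threads_per_block = 256    # illustrative launch configuration

one_wave = blocks_for_full_waves(sm_count, max_threads_per_sm, threads_per_block)
ten_waves = blocks_for_full_waves(sm_count, max_threads_per_sm, threads_per_block, waves=10)

print(one_wave, one_wave * threads_per_block)  # 288 blocks, 73728 threads
print(ten_waves)                               # 2880 blocks
```

One full wave is only a lower bound. The kernel also needs to run long enough (generally > 20 µs, per the answer above) to reach a steady state, so launching several waves or doing more work per thread is usually the safer choice.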
