How to profile L1 and L2 hit ratios on Tesla C2050 cards using the command-line profiler?

Hi,

I need to profile the cache hit ratios to see the effect of some optimizations. How can I do that using the command-line profiler? I prefer the command-line profiler because I need to profile a large number of runs. It seems the CUDA command-line profiler does not recognize “l1_cache_global_hit_rate” and “l2_l1_read_hit_rate” in the configuration file.

Bo

The CUDA command-line profiler only supports collection of raw counters. A hit rate is a metric, which is derived from counters. nvprof (ships with CUDA 5.0 and above) supports capture of metrics.

The following write-up is based on a quick glance at the output of nvprof --query-events; I have not tested the results. I recommend that you run nvprof or the Visual Profiler on one kernel and compare the results. These directions are for GF100 only.

For the CUDA Command Line Profiler

L1 Hit Rate

  • Add to the config file
    l1_global_load_hit
    l1_global_load_miss
  • This will not include uncached global loads, global stores, or atomics.

l1_cache_global_hit_rate = l1_global_load_hit / (l1_global_load_hit + l1_global_load_miss)
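Since you have a large number of runs, you will probably want to compute this from the profiler log with a script. Here is a minimal sketch in Python. It assumes each log line holds space-separated "name=[ value ]" fields, one kernel launch per line; please check that against your own log before relying on it. The kernel name and counter values in the sample line are made up for illustration.

```python
import re

# Assumption: the log uses "name=[ value ]" fields, one kernel launch per line.
FIELD_RE = re.compile(r"(\w+)=\[\s*(.*?)\s*\]")

def parse_log_line(line):
    """Return a dict of field name -> raw string value for one log line."""
    return dict(FIELD_RE.findall(line))

def l1_global_hit_rate(fields):
    """hit / (hit + miss), or None when no cached global loads were counted."""
    hit = int(fields["l1_global_load_hit"])
    miss = int(fields["l1_global_load_miss"])
    total = hit + miss
    return hit / total if total else None

# Hypothetical log line; the name and numbers are illustrative only.
sample = ("method=[ my_kernel ] gputime=[ 12.345 ] "
          "l1_global_load_hit=[ 300 ] l1_global_load_miss=[ 100 ]")
fields = parse_log_line(sample)
rate = l1_global_hit_rate(fields)
print(fields["method"], rate)
```

Returning None for a launch with no cached global loads avoids a division by zero and keeps those launches easy to filter out later.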

L2 L1 Read Hit Rate

  • Add to the config file
    l2_subp0_read_hit_sectors
    l2_subp0_read_sector_queries
  • This cannot be collected for both subp0 and subp1 in the same pass. The hit rate is usually very consistent across sub-partitions.

l2_l1_read_hit_rate = l2_subp0_read_hit_sectors / l2_subp0_read_sector_queries
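The same ratio can be scripted across runs. This is only a sketch: the function names and the counter values are hypothetical, and it assumes you have already extracted the two subp0 counters for each run. When combining runs, it sums the counters first rather than averaging the per-run rates, so that runs with more traffic carry more weight.

```python
def l2_l1_read_hit_rate(read_hit_sectors, read_sector_queries):
    """read_hit_sectors / read_sector_queries, or None when there were no queries."""
    if read_sector_queries == 0:
        return None
    return read_hit_sectors / read_sector_queries

def aggregate_hit_rate(runs):
    """Combined hit rate over (hit_sectors, sector_queries) pairs from several runs."""
    total_hits = sum(h for h, _ in runs)
    total_queries = sum(q for _, q in runs)
    return l2_l1_read_hit_rate(total_hits, total_queries)

# Hypothetical (hit sectors, sector queries) pairs, one per run.
runs = [(900, 1000), (450, 600), (0, 0)]
rates = [l2_l1_read_hit_rate(h, q) for h, q in runs]
overall = aggregate_hit_rate(runs)
print(rates, overall)
```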