I need to profile the cache hit ratios to evaluate the effect of some optimizations. How can I do that with the command-line profiler? I prefer it because I need to profile a large number of runs. However, the CUDA command-line profiler does not seem to recognize “l1_cache_global_hit_rate” and “l2_l1_read_hit_rate” in its configuration file.
The CUDA command-line profiler only supports collection of raw hardware counters (events). A hit rate is a metric, which is derived from several counters. nvprof (shipped with CUDA 5.0 and above) supports capturing metrics.
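Because nvprof can collect metrics directly, batching many runs mostly comes down to generating one nvprof invocation per run. Below is a minimal Python sketch of that. It assumes the metric names from the question are valid on your GPU (check with nvprof --query-metrics), and the application name ./my_app and the per-run log file naming are hypothetical placeholders.

```python
import shlex
from typing import List, Sequence

# Metric names taken from the question; verify them with
# `nvprof --query-metrics` on your GPU before relying on them.
METRICS = ["l1_cache_global_hit_rate", "l2_l1_read_hit_rate"]


def nvprof_command(app: Sequence[str], log_path: str,
                   metrics: Sequence[str] = METRICS) -> List[str]:
    """Build an nvprof command line that collects `metrics` for one run
    and writes CSV-formatted output to `log_path`."""
    return [
        "nvprof",
        "--metrics", ",".join(metrics),
        "--csv",
        "--log-file", log_path,
        *app,
    ]


if __name__ == "__main__":
    # Hypothetical batch: one log file per input size for "./my_app".
    for n in (1024, 4096, 16384):
        cmd = nvprof_command(["./my_app", str(n)], f"metrics_{n}.csv")
        print(shlex.join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment to actually profile
```

Writing one CSV file per run keeps the results easy to collect afterwards with any CSV reader.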
The following write-up is based on a quick look at nvprof --query-events; I have not tested the results. I recommend running nvprof or the Visual Profiler on one kernel and comparing its results with yours. These directions apply to GF100 only.
For the CUDA Command Line Profiler
L1 Hit Rate
Add the following counters to the config file:
l1_global_load_hit
l1_global_load_miss
These counters do not include uncached global loads, global stores, or atomics.
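The profiler only reports the raw counts, so the hit rate has to be computed afterwards. Here is a minimal Python sketch, assuming the L1 hit rate for cached global loads is defined as l1_global_load_hit / (l1_global_load_hit + l1_global_load_miss). The function names and counter values are hypothetical and only for illustration.

```python
from typing import Iterable, Tuple


def l1_hit_rate(hits: int, misses: int) -> float:
    """L1 hit rate in percent for cached global loads, assumed here to be
    hits / (hits + misses). Returns 0.0 when no loads were counted."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0


def aggregate_hit_rate(runs: Iterable[Tuple[int, int]]) -> float:
    """Combine (l1_global_load_hit, l1_global_load_miss) pairs from many runs
    by summing the raw counters first, then taking the ratio."""
    total_hits = total_misses = 0
    for hits, misses in runs:
        total_hits += hits
        total_misses += misses
    return l1_hit_rate(total_hits, total_misses)


if __name__ == "__main__":
    # Made-up counter values for illustration only.
    print(l1_hit_rate(750, 250))                             # 75.0
    print(aggregate_hit_rate([(750, 250), (300, 2700)]))     # 26.25
```

Summing the counters before dividing weights each run by its number of loads. Simply averaging the per-run percentages (75% and 10%) would give 42.5% instead, which overstates the run with far fewer loads.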