I tried to profile the L1 and L2 cache hit ratios on K40 and Titan Z cards with the following command:
nvprof --metrics l1_cache_global_hit_rate ./vecadd
vecadd is just a simple vector addition CUDA program. Although I’m sure the kernel finishes successfully, the reported min, max, and avg values for the metrics l1_cache_global_hit_rate and l2_cache_global_hit_rate are all 0.00%. Does that mean the K40 and Titan Z do not support profiling L1 and L2 cache hit ratios?