nvprof is not supported on GPUs starting with the Ampere architecture.
nvprof supported non-aggregated results via “--aggregate-mode off”, and I am looking for a new profiling tool that can show non-aggregated (per-instance) results for L2 cache or memory metrics/events.
For example, the output looks like this:
l2_subp0_write_sector_misses (32 instances) - - [ 0 0 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ] -
l2_subp1_write_sector_misses (32 instances) - - [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ] -
l2_subp0_read_sector_misses (32 instances) - - [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ] -
l2_subp1_read_sector_misses (32 instances) - - [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ] -
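To make the per-instance format concrete, here is a minimal Python sketch that parses one line shaped like the output above and reports the per-instance values, which a single aggregated number would hide. The helper name `parse_instances` and the regular expression are hypothetical, written only against the line layout shown here; this is not part of nvprof or Nsight Compute.

```python
import re

# Hypothetical pattern for lines shaped like the nvprof output above, e.g.
# l2_subp0_write_sector_misses (32 instances) - - [ 0 0 ... 8 ... ] -
LINE_RE = re.compile(r"^(\S+)\s+\((\d+) instances\).*?\[([\d\s]+)\]")


def parse_instances(line):
    """Hypothetical helper: return (event_name, per-instance values)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    name, count, body = m.group(1), int(m.group(2)), m.group(3)
    values = [int(v) for v in body.split()]
    # The header states how many instances to expect; check the list matches.
    if len(values) != count:
        raise ValueError(f"{name}: expected {count} values, got {len(values)}")
    return name, values


if __name__ == "__main__":
    # Build an illustrative 32-instance line with two non-zero slices.
    demo = [0] * 32
    demo[13], demo[17] = 8, 4
    sample = ("l2_subp0_write_sector_misses (32 instances) - - [ "
              + " ".join(map(str, demo)) + " ] -")
    name, values = parse_instances(sample)
    # Which instances recorded anything, and the total across all instances.
    nonzero = {i: v for i, v in enumerate(values) if v}
    print(name, "total =", sum(values), "non-zero instances =", nonzero)
```

The total printed here is simply the sum over instances; whether a given tool aggregates an event by summing is an assumption, not something this sketch verifies.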
Does Nsight Compute have similar functionality?