Non-aggregated profiling result support

nvprof is not supported on Ampere and newer GPUs.

nvprof used to support "--aggregate-mode off", and I am looking for a new profiling tool that shows non-aggregated (per-instance) results for L2 cache or memory metrics/events.

For example, the result looks like this:

 l2_subp0_write_sector_misses (32 instances)           -           -  [          0          0          0          0          0          0          0          0          0          0          0          0          0          8          0          0          0          4          0          0          0          0          0          0          0          0          0          0          0          0          0          0 ]           -
 l2_subp1_write_sector_misses (32 instances)           -           -  [          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          8          0          0          0          0          0          0          0          0          0          0          0          0          0          0 ]           -
l2_subp0_read_sector_misses (32 instances)           -           -  [          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          5          0          0          0          0          0          0          0          0          0          0          0          0          0          0 ]           -
 l2_subp1_read_sector_misses (32 instances)           -           -  [          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0 ]           -

Does CUPTI have similar functionality?

nvprof's event and metric profiling is based on the CUPTI Event and Metric APIs, which are supported only on Volta and older GPU architectures. These APIs have been replaced by the CUPTI Profiling API, which is supported on Volta and newer GPU architectures and does not support per-instance metrics. For the nvprof metrics you listed, the equivalent Profiling API metrics are lts__t_sectors_op_read_lookup_miss.sum and lts__t_sectors_op_write_lookup_miss.sum; these report values aggregated across all L2 instances.
Non-aggregated metric values for the L2 cache or memory are not supported on Ampere GPUs by any tool (neither CUPTI nor Nsight Compute).
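
To make concrete what is lost when only an aggregate is reported, here is a minimal Python sketch that parses one line of the old non-aggregated nvprof output and sums its per-instance values. The helper name parse_instances and the shortened 4-instance sample line are made up for illustration; the sum shows only the kind of collapse a ".sum" aggregate performs, and is not a claim about exactly how the Profiling API combines the subp0/subp1 counters.

```python
import re


def parse_instances(line):
    """Return (name, per-instance values) from one non-aggregated nvprof line.

    Hypothetical helper: assumes the per-instance values are the
    whitespace-separated integers between '[' and ']'.
    """
    name = line.split()[0]
    inside = re.search(r"\[(.*?)\]", line).group(1)
    return name, [int(v) for v in inside.split()]


# Shortened, made-up sample in the same layout as the nvprof output above.
sample = "l2_subp0_write_sector_misses (4 instances) - - [ 0 8 0 4 ] -"

name, values = parse_instances(sample)
total = sum(values)  # a single aggregate: which instance missed is no longer visible

print(name, values, total)
```

With only the total available, an imbalance such as all misses landing on one or two L2 instances cannot be seen, which is the information the original question was after.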