Profiling device memory bandwidth utilization

Hello. Is it possible to profile device memory bandwidth utilization (in percent) with Nsight Compute? If not, is there a way to calculate it from metrics such as lts__t_sectors_aperture_sysmem_op_write.sum.per_second and lts__t_sectors_aperture_sysmem_op_read.sum.per_second?

Yes, you can collect the Memory Workload Analysis sections (header, chart and tables) to get a comprehensive memory analysis, e.g. using

ncu --section "regex:MemoryWorkloadAnalysis(_Chart|_Tables)?" <other args>

These are also part of the ‘full’ section set, so you may want to just use

ncu --set full <other args>

Note that the metrics you referenced are for system memory, not device memory. For the latter, you would want to use dram__bytes_read|write.sum.per_second or dram__bytes_read|write.sum.pct_of_peak_sustained_elapsed (read|write is shorthand for two separate metrics, one for reads and one for writes). To get the overall max bandwidth, you can use gpu__compute_memory_request_throughput.avg.pct_of_peak_sustained_elapsed. For details, please refer to the memory workload analysis section file and the respective chart and table (tooltips) in the UI.
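For anyone who wants to collect only these device memory metrics instead of a whole section, here is a minimal sketch that assembles the command line using ncu's --metrics option. The helper name build_ncu_command and the application ./my_app are placeholders I made up for illustration; the script only prints the command, so it can be tried on a machine without a GPU.

```python
import shlex

# Device-memory (DRAM) metrics named in the reply above, with the
# "read|write" shorthand expanded into separate metric names.
DRAM_METRICS = [
    "dram__bytes_read.sum.per_second",
    "dram__bytes_write.sum.per_second",
    "dram__bytes_read.sum.pct_of_peak_sustained_elapsed",
    "dram__bytes_write.sum.pct_of_peak_sustained_elapsed",
]


def build_ncu_command(app_args):
    """Return the argv list for an ncu run that collects DRAM_METRICS.

    build_ncu_command is a hypothetical helper, not part of Nsight Compute.
    app_args is the application command line, e.g. ["./my_app"] (placeholder).
    """
    # --metrics takes a comma-separated list of metric names.
    return ["ncu", "--metrics", ",".join(DRAM_METRICS), *app_args]


if __name__ == "__main__":
    # Print the command instead of running it.
    print(shlex.join(build_ncu_command(["./my_app"])))
```

The printed command can be pasted into a shell as-is, or the argv list can be handed to subprocess.run if you prefer to launch the profiler from a script.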

@felix_dt Thanks for your reply. As one more quick check, could you confirm that the metrics I am using are appropriate? I am quite uncertain about the device L2 metric, since its unit is ‘sectors’. Ideally, I would like to extract the device L2 utilization as a percentage.

  • smsp__cycles_active.avg.pct_of_peak_sustained_elapsed: SM utilization
  • dram__throughput.avg.pct_of_peak_sustained_elapsed: Device memory utilization
  • dram__bytes_read|write.sum.pct_of_peak_sustained_elapsed: Device memory bandwidth utilization
  • lts__t_sectors.avg.pct_of_peak_sustained_elapsed: Device L2

For SM utilization, I recommend using sm__throughput.avg.pct_of_peak_sustained_elapsed (see the Speed Of Light section). The others look correct to me.
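On the ‘sectors’ unit: the .pct_of_peak_sustained_elapsed suffix already reports a percentage, so no conversion is needed for utilization. If you also want an absolute L2 figure from a raw sector count or rate, a rough sketch is below. It assumes a sector is 32 bytes, which is how the Nsight Compute documentation defines it as far as I know; the function names are made up for illustration.

```python
# Assumption: one sector is an aligned 32-byte chunk, as described in the
# Nsight Compute documentation. Verify this for your GPU and tool version.
SECTOR_BYTES = 32


def sectors_to_bytes(sectors):
    """Convert a sector count (or sectors per second) to bytes (or bytes/s)."""
    return sectors * SECTOR_BYTES


def sectors_per_second_to_gb_per_second(sectors_per_second):
    """Convert a sectors/s rate to GB/s, using 1 GB = 1e9 bytes."""
    return sectors_to_bytes(sectors_per_second) / 1e9


if __name__ == "__main__":
    # Example input value chosen for illustration, not a measurement.
    rate = 5_000_000_000  # sectors per second
    print(f"{sectors_per_second_to_gb_per_second(rate):.1f} GB/s")
```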

Thanks. Your reply was very helpful.

