Measuring peak read/write bandwidth across device memory

Hi all,

I want to measure the peak read and write (load/store) bandwidth to device memory separately during my kernel executions. Does Nsight have a metric that captures this?

If not, is there another tool I could use to obtain it? I know NVML provides this for data transferred across PCIe, but I need it for GPU memory.


I would suggest starting by collecting the MemoryWorkloadAnalysis* sections, either on their own or together with any other metrics and/or sections you are interested in. When you open the resulting report in the UI, this gives you several tables and charts.

nv-nsight-cu-cli --section "MemoryWorkloadAnalysis.*" (app)

If you only want to collect individual metrics, you can start with

nv-nsight-cu-cli --metrics dram__bytes_write.sum,dram__bytes_read.sum,dram__bytes_write.sum.pct_of_peak_sustained_elapsed,dram__bytes_read.sum.pct_of_peak_sustained_elapsed (app)
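The metrics above report total bytes moved and a percentage of peak rather than a rate. A minimal sketch, assuming you also collect the kernel duration (for example with gpu__time_duration.sum, converted to nanoseconds), of turning those values into read and write bandwidth. The function names are hypothetical helpers, and the numbers in the usage block are made up for illustration. Note that this gives the average over the kernel's duration, not an instantaneous peak, and the implied peak is only approximate because pct_of_peak_sustained_elapsed is measured against elapsed cycles.

```python
def dram_bandwidth_gbps(bytes_moved: float, duration_ns: float) -> float:
    """Average DRAM bandwidth in GB/s (1 GB = 1e9 bytes) over the kernel duration."""
    if duration_ns <= 0:
        raise ValueError("duration_ns must be positive")
    # 1 byte per nanosecond == 1e9 bytes per second == 1 GB/s
    return bytes_moved / duration_ns


def implied_peak_gbps(achieved_gbps: float, pct_of_peak: float) -> float:
    """Rough peak sustained rate backed out from a *.pct_of_peak_sustained_elapsed value."""
    if pct_of_peak <= 0:
        raise ValueError("pct_of_peak must be positive")
    return achieved_gbps / (pct_of_peak / 100.0)


if __name__ == "__main__":
    # Hypothetical values, not real measurements:
    read_bytes = 2e9      # value of dram__bytes_read.sum
    write_bytes = 1e9     # value of dram__bytes_write.sum
    duration_ns = 5e6     # kernel duration (5 ms) in nanoseconds
    read_pct = 50.0       # dram__bytes_read.sum.pct_of_peak_sustained_elapsed

    read_bw = dram_bandwidth_gbps(read_bytes, duration_ns)
    write_bw = dram_bandwidth_gbps(write_bytes, duration_ns)
    print(f"read:  {read_bw:.1f} GB/s")
    print(f"write: {write_bw:.1f} GB/s")
    print(f"implied peak (from read %): {implied_peak_gbps(read_bw, read_pct):.1f} GB/s")
```

Keeping read and write in separate calls mirrors the separate dram__bytes_read and dram__bytes_write metrics, which is what the original question asked for.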

See the documentation for the list of available sections. The current set of sections is also available via --list-sections or in the Sections/Rules Info window in the UI.

See the --query-metrics and --query-metrics-mode command-line options in the documentation for how to query individual metric names.
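Metric names like those in the --metrics command above are built from a base name (dram__bytes_read), a rollup (.sum) and optionally a submetric (.pct_of_peak_sustained_elapsed). Once you have looked up the base names and suffixes you need, a small helper can assemble the comma-separated --metrics value. This is a sketch only: build_metrics_arg is a hypothetical function that just builds a string and does not call Nsight, and "(app)" stays a placeholder for your application.

```python
from itertools import product


def build_metrics_arg(bases, suffixes):
    """Combine each base metric name with each suffix into a comma-separated --metrics value."""
    return ",".join(f"{base}{suffix}" for base, suffix in product(bases, suffixes))


if __name__ == "__main__":
    bases = ["dram__bytes_read", "dram__bytes_write"]
    suffixes = [".sum", ".sum.pct_of_peak_sustained_elapsed"]
    metrics = build_metrics_arg(bases, suffixes)
    # "(app)" is a placeholder for your application and its arguments
    print(f"nv-nsight-cu-cli --metrics {metrics} (app)")
```

With the bases and suffixes shown, this produces the same four metrics as the command above, just in a different order.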
