I want to measure the peak read and write (load/store) bandwidth separately for device memory during my kernel executions. Does Nsight have any metric that could capture that?
If not, is there another tool I could use to obtain it? I know NVML provides this functionality for data across PCIe, but I need it for GPU memory.
I would suggest starting by collecting the MemoryWorkloadAnalysis sections, either separately or together with any other metrics and/or sections you are interested in. This should give you several tables and charts in the UI when opening the result as a report.
nv-nsight-cu-cli --section "MemoryWorkloadAnalysis.*" (app)
If you only want to collect individual metrics, you can start with
nv-nsight-cu-cli --metrics dram__bytes_write.sum,dram__bytes_read.sum,dram__bytes_write.sum.pct_of_peak_sustained_elapsed,dram__bytes_read.sum.pct_of_peak_sustained_elapsed (app)
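Note that dram__bytes_read.sum and dram__bytes_write.sum report total bytes, not rates. To get an absolute bandwidth figure, you can additionally collect the kernel duration (e.g. the gpu__time_duration.sum metric) and divide. A minimal post-processing sketch in Python, assuming you exported the results as CSV with --csv; the column names and units below are illustrative and may differ from your actual output:

```python
import csv
import io

# Illustrative CSV export from the profiler (--csv); adjust the column
# names and units to match your actual output.
sample = """Metric Name,Metric Unit,Metric Value
dram__bytes_read.sum,Mbyte,512.00
dram__bytes_write.sum,Mbyte,256.00
gpu__time_duration.sum,msecond,2.00
"""

def bandwidth_gb_s(csv_text):
    """Compute read/write DRAM bandwidth in GB/s from byte totals and duration."""
    values = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        values[row["Metric Name"]] = float(row["Metric Value"])
    seconds = values["gpu__time_duration.sum"] / 1e3   # msecond -> second
    read_gb = values["dram__bytes_read.sum"] / 1e3     # Mbyte -> GB
    write_gb = values["dram__bytes_write.sum"] / 1e3
    return read_gb / seconds, write_gb / seconds

read_bw, write_bw = bandwidth_gb_s(sample)
print(f"read: {read_bw:.1f} GB/s, write: {write_bw:.1f} GB/s")
```

The .pct_of_peak_sustained_elapsed variants already give you the same information as a percentage of the device's sustained peak, so the division is only needed if you want the value in GB/s.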
See https://docs.nvidia.com/nsight-compute/NsightCompute/index.html#sections-and-rules for the list of available sections. The currently active set is also available via --list-sections, or in the Sections/Rules Info window in the UI.
See the --query-metrics and --query-metrics-mode command line options in https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#command-line-options-profile for how to query individual metric names.