I have a fairly old CUDA book from around 2014, "Professional CUDA C Programming", which focuses on Fermi and Kepler. A lot of its examples use nvprof, but on my system (RTX 2070, compute capability 7.5) nvprof no longer appears to be supported.
For example, I tried the branch_efficiency metric:
nvprof --metrics branch_efficiency ./a.out 256 33554432
======== Warning: Skipping profiling on device 0 since profiling is not supported on devices with compute capability 7.5 and higher.
Use NVIDIA Nsight Compute for GPU profiling and NVIDIA Nsight Systems for GPU tracing and CPU sampling.
Refer NVIDIA Developer Tools Overview | NVIDIA Developer for more details.
I have now installed Nsight Compute and tried the command-line version to look for a similar metric, but it does not appear to find anything. Any ideas?
root@nonroot-MS-7B22:/git.co/dev-learn/gpu/cuda/linux/cuda-c-programming# nv-nsight-cu-cli --list-metrics | grep -i branch
root@nonroot-MS-7B22:/git.co/dev-learn/gpu/cuda/linux/cuda-c-programming# nv-nsight-cu-cli --list-metrics
sm__warps_active.avg.per_cycle_active
sm__warps_active.avg.pct_of_peak_sustained_active
sm__throughput.avg.pct_of_peak_sustained_elapsed
sm__maximum_warps_per_active_cycle_pct
sm__maximum_warps_avg_per_active_cycle
sm__cycles_active.avg
lts__throughput.avg.pct_of_peak_sustained_elapsed
launch__waves_per_multiprocessor
launch__thread_count
launch__shared_mem_per_block_static
launch__shared_mem_per_block_dynamic
launch__shared_mem_per_block_driver
launch__shared_mem_per_block
launch__shared_mem_config_size
launch__registers_per_thread
launch__occupancy_per_shared_mem_size
launch__occupancy_per_register_count
launch__occupancy_per_block_size
launch__occupancy_limit_warps
launch__occupancy_limit_shared_mem
launch__occupancy_limit_registers
launch__occupancy_limit_blocks
launch__grid_size
launch__func_cache_config
launch__block_size
l1tex__throughput.avg.pct_of_peak_sustained_active
gpu__time_duration.sum
gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed
-arch:75:86:gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed
-arch:40:70:gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed
gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed
gpc__cycles_elapsed.max
gpc__cycles_elapsed.avg.per_second
dram__cycles_elapsed.avg.per_second
-arch:75:86:dram__cycles_elapsed.avg.per_second
-arch:40:70:dram__cycles_elapsed.avg.per_second
breakdown:sm__throughput.avg.pct_of_peak_sustained_elapsed
breakdown:gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed
root@nonroot-MS-7B22:/git.co/dev-learn/gpu/cuda/linux/cuda-c-programming#
I can get the print-summary output, but it contains far more than I need and still does not show the specific metric I am looking for (branch efficiency, as mentioned above).
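In case it helps, here is what I was planning to try next, based on my (possibly wrong) reading of the Nsight Compute docs: --list-metrics seems to only show the metrics collected by the currently selected sections, while --query-metrics should list everything the device actually supports, and the nvprof transition table appears to map branch_efficiency to smsp__sass_average_branch_targets_threads_uniform.pct. So something like:

nv-nsight-cu-cli --query-metrics | grep -i branch
nv-nsight-cu-cli --metrics smsp__sass_average_branch_targets_threads_uniform.pct ./a.out 256 33554432

I have not been able to confirm that this is the right metric name or the right way to query it, so corrections are welcome.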