Does a CLI interface (equivalent to nvprof metrics) exist for printing results?

1) Question about nvprof metrics

I want to run the equivalent of the following command:

nvprof --kernels "smooth_kernel" --metrics flop_count_dp --metrics dram_read_throughput --metrics dram_write_throughput --metrics dram_read_transactions --metrics dram_write_transactions ./build/bin/hpgmg-fv 6 8

What should I do with /usr/local/cuda/NsightCompute-1.0/nv-nsight-cu-cli ?
A GUI tool seems to exist, of course, but I want to do this on the CLI.

Ref.
Measuring Roofline Quantities on NVIDIA GPUs
https://performanceportability.org/perfport/measurements/gpu/

2) CLI tracer for Nsight?

From reading "5. Nvprof Transition Guide", tracing should be done with Nsight Systems.
Does a CLI command exist for this (not the GUI tool)?

Ref.
5. Nvprof Transition Guide
https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#nvprof-guide

Regarding point (2) - CLI tracer for Nsight

Yes, Nsight Systems has a CLI. Please refer to the document https://docs.nvidia.com/nsight-systems/index.html#nsight_systems/2019.3.1-x86/06-cli-profiling.htm%3FTocPath%3D_____6
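As a minimal sketch of what a CLI trace might look like (the nsys binary ships with Nsight Systems; the report name here is made up, and flag availability depends on your release, so check nsys profile --help):

```shell
# Trace CUDA API calls, kernels, and NVTX ranges from the CLI with
# Nsight Systems; the resulting report file can be opened in the GUI later.
nsys profile --trace=cuda,nvtx -o hpgmg-trace ./build/bin/hpgmg-fv 6 8
```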

I want to run the equivalent of the following command:

nvprof --kernels "smooth_kernel" --metrics flop_count_dp --metrics dram_read_throughput --metrics dram_write_throughput --metrics dram_read_transactions --metrics dram_write_transactions ./build/bin/hpgmg-fv 6 8 

What should I do with /usr/local/cuda/NsightCompute-1.0/nv-nsight-cu-cli ?

Can you please clarify what is not working for you here? You mentioned yourself the Nsight Compute/Nvprof transition guide, which should contain all the details on how to select kernels and metrics for profiling and how to configure output on the CLI. If there are additional questions, feel free to ask.

https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#nvprof-guide
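For reference, a sketch of what an equivalent invocation might look like, based on the metric comparison table in that guide. Treat this as an assumption to verify against your install: per the table, nvprof's flop_count_dp corresponds to the sum of the dadd and dmul metrics plus 2x the dfma metric (summed by hand), the dram throughput metrics map to dram__bytes_*.sum.per_second, and the dram transaction counts map to dram__sectors_*.sum. The kernel-filter flag spelling varies across Nsight Compute versions:

```shell
# Hedged sketch: collect the transition-guide equivalents of the nvprof
# metrics for kernels matching "smooth_kernel".
# flop_count_dp ~ dadd + dmul + 2*dfma from the three metrics below.
/usr/local/cuda/NsightCompute-1.0/nv-nsight-cu-cli \
  -k smooth_kernel \
  --metrics smsp__sass_thread_inst_executed_op_dadd_pred_on.sum,\
smsp__sass_thread_inst_executed_op_dmul_pred_on.sum,\
smsp__sass_thread_inst_executed_op_dfma_pred_on.sum,\
dram__bytes_read.sum.per_second,\
dram__bytes_write.sum.per_second,\
dram__sectors_read.sum,\
dram__sectors_write.sum \
  ./build/bin/hpgmg-fv 6 8
```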

Thank you, it seems I should use nsys, not nv-nsight-cu-cli.
I will dig into it.

On a Tesla T4, is there no nv-nsight-cu-cli equivalent of nvprof's flop_count_dp counter?
Specifically, the following metrics show n/a:

smsp__sass_thread_inst_executed_op_dadd_pred_on.sum
smsp__sass_thread_inst_executed_op_dfma_pred_on.sum
smsp__sass_thread_inst_executed_op_fadd_pred_on.sum

I am referring to this metric comparison:
https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#nvprof-metric-comparison

If you are on Google Colab, you can reproduce it with the following instructions.

!git clone https://github.com/pytorch/examples
!cd examples/cpp/mnist; cmake -DCMAKE_PREFIX_PATH=/usr/local/lib/python3.6/dist-packages/torch/lib/ -DTorch_DIR=/usr/local/lib/python3.6/dist-packages/torch/share/cmake/Torch/ . ; make
!cd examples/cpp/mnist;/usr/local/cuda/NsightCompute-1.0/nv-nsight-cu-cli --launch-count 30 --metrics dram__bytes_write.sum.per_second,smsp__sass_thread_inst_executed_op_dadd_pred_on.sum,smsp__sass_thread_inst_executed_op_dfma_pred_on.sum,smsp__sass_thread_inst_executed_op_fadd_pred_on.sum,dram__bytes_read.sum.per_second ./mnist

On a Tesla T4, is there no nv-nsight-cu-cli equivalent of nvprof's flop_count_dp counter?
Specifically, the following metrics show n/a:

smsp__sass_thread_inst_executed_op_dadd_pred_on.sum
smsp__sass_thread_inst_executed_op_dfma_pred_on.sum
smsp__sass_thread_inst_executed_op_fadd_pred_on.sum

Those metrics were enabled in our measurement library and correctly added to the documentation, but we missed actually enabling this feature of the measurement library in the tool. We will fix this soon in a future release.

In the meantime, you might be able to use the “Executed Instruction Mix” chart of the Instruction Statistics (InstructionStats) section as a workaround. You can collect this section either on the command line or in the UI, but the chart can only be viewed in the UI. When using the command line, the section should be collected by default, otherwise you can enable it using --section InstructionStats.
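For instance, a collection command might look like this (a sketch; the report name is made up, and the -o/--export flag spelling should be checked against your version's --help output):

```shell
# Collect the Instruction Statistics section on the CLI and export a
# report file that can then be opened in the Nsight Compute UI to view
# the "Executed Instruction Mix" chart.
/usr/local/cuda/NsightCompute-1.0/nv-nsight-cu-cli \
  --section InstructionStats -o mnist_instmix ./mnist
```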

Thanks for addressing this issue. Please add support for reading these statistics from the command line profiler ASAP, as I imagine they’re critical to a lot of standard workload analysis.