Does a CLI interface (equivalent to nvprof metrics) exist for printing results?

1) Question about nvprof metrics

I want to run the equivalent of the following command:

nvprof --kernels "smooth_kernel" --metrics flop_count_dp --metrics dram_read_throughput --metrics dram_write_throughput --metrics dram_read_transactions --metrics dram_write_transactions ./build/bin/hpgmg-fv 6 8

What should I do with /usr/local/cuda/NsightCompute-1.0/nv-nsight-cu-cli ?
A GUI tool seems to exist, of course, but I want to do this on the CLI.

Ref.
Measuring Roofline Quantities on NVIDIA GPUs
https://performanceportability.org/perfport/measurements/gpu/

2) CLI tracer for Nsight?

From reading "5. Nvprof Transition Guide", tracing should be done with Nsight Systems.
Does a CLI command exist for this (not the GUI tool)?

Ref.
5. Nvprof Transition Guide
https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#nvprof-guide

Regarding point (2) - CLI tracer for Nsight

Yes, Nsight Systems has a CLI. Please refer to the document https://docs.nvidia.com/nsight-systems/index.html#nsight_systems/2019.3.1-x86/06-cli-profiling.htm%3FTocPath%3D_____6
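As a minimal sketch of what a CLI trace might look like (the nsys binary ships with Nsight Systems; the report name here is made up, and flag availability depends on your release, so check nsys profile --help):

```shell
# Trace CUDA API calls, kernels, and NVTX ranges from the CLI with
# Nsight Systems; the resulting report file can be opened in the GUI later.
nsys profile --trace=cuda,nvtx -o hpgmg-trace ./build/bin/hpgmg-fv 6 8
```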

I want to run the equivalent of the following command:

nvprof --kernels "smooth_kernel" --metrics flop_count_dp --metrics dram_read_throughput --metrics dram_write_throughput --metrics dram_read_transactions --metrics dram_write_transactions ./build/bin/hpgmg-fv 6 8 

What should I do with /usr/local/cuda/NsightCompute-1.0/nv-nsight-cu-cli ?

Can you please clarify what is not working for you here? You mentioned yourself the Nsight Compute/Nvprof transition guide, which should contain all the details on how to select kernels and metrics for profiling and how to configure output on the CLI. If there are additional questions, feel free to ask.

https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#nvprof-guide
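For reference, a sketch of what an equivalent invocation might look like, based on the metric comparison table in that guide. Treat this as an assumption to verify against your install: per the table, nvprof's flop_count_dp corresponds to the sum of the dadd and dmul metrics plus 2x the dfma metric (summed by hand), the dram throughput metrics map to dram__bytes_*.sum.per_second, and the dram transaction counts map to dram__sectors_*.sum. The kernel-filter flag spelling varies across Nsight Compute versions:

```shell
# Hedged sketch: collect the transition-guide equivalents of the nvprof
# metrics for kernels matching "smooth_kernel".
# flop_count_dp ~ dadd + dmul + 2*dfma from the three metrics below.
/usr/local/cuda/NsightCompute-1.0/nv-nsight-cu-cli \
  -k smooth_kernel \
  --metrics smsp__sass_thread_inst_executed_op_dadd_pred_on.sum,\
smsp__sass_thread_inst_executed_op_dmul_pred_on.sum,\
smsp__sass_thread_inst_executed_op_dfma_pred_on.sum,\
dram__bytes_read.sum.per_second,\
dram__bytes_write.sum.per_second,\
dram__sectors_read.sum,\
dram__sectors_write.sum \
  ./build/bin/hpgmg-fv 6 8
```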

Thank you, it seems I should use nsys, not nv-nsight-cu-cli.
I will dig into it.

On a Tesla T4, is there no nv-nsight-cu-cli equivalent of nvprof's flop_count_dp counter?
Specifically, the following metrics show n/a:

smsp__sass_thread_inst_executed_op_dadd_pred_on.sum
smsp__sass_thread_inst_executed_op_dfma_pred_on.sum
smsp__sass_thread_inst_executed_op_fadd_pred_on.sum

I am referring to this metric comparison:
https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#nvprof-metric-comparison

If you are on Google Colab, you can reproduce it with the following instructions.

!git clone https://github.com/pytorch/examples
!cd examples/cpp/mnist; cmake -DCMAKE_PREFIX_PATH=/usr/local/lib/python3.6/dist-packages/torch/lib/ -DTorch_DIR=/usr/local/lib/python3.6/dist-packages/torch/share/cmake/Torch/ . ; make
!cd examples/cpp/mnist;/usr/local/cuda/NsightCompute-1.0/nv-nsight-cu-cli --launch-count 30 --metrics dram__bytes_write.sum.per_second,smsp__sass_thread_inst_executed_op_dadd_pred_on.sum,smsp__sass_thread_inst_executed_op_dfma_pred_on.sum,smsp__sass_thread_inst_executed_op_fadd_pred_on.sum,dram__bytes_read.sum.per_second ./mnist

On a Tesla T4, is there no nv-nsight-cu-cli equivalent of nvprof's flop_count_dp counter?
Specifically, the following metrics show n/a:

smsp__sass_thread_inst_executed_op_dadd_pred_on.sum
smsp__sass_thread_inst_executed_op_dfma_pred_on.sum
smsp__sass_thread_inst_executed_op_fadd_pred_on.sum

Those metrics were enabled in our measurement library and correctly added to the documentation, but we missed actually enabling this feature of the measurement library in the tool. We will fix this soon in a future release.

In the meantime, you might be able to use the “Executed Instruction Mix” chart of the Instruction Statistics (InstructionStats) section as a workaround. You can collect this section either on the command line or in the UI, but the chart can only be viewed in the UI. When using the command line, the section should be collected by default, otherwise you can enable it using --section InstructionStats.
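For instance, a collection command might look like this (a sketch; the report name is made up, and the -o/--export flag spelling should be checked against your version's --help output):

```shell
# Collect the Instruction Statistics section on the CLI and export a
# report file that can then be opened in the Nsight Compute UI to view
# the "Executed Instruction Mix" chart.
/usr/local/cuda/NsightCompute-1.0/nv-nsight-cu-cli \
  --section InstructionStats -o mnist_instmix ./mnist
```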

Thanks for addressing this issue. Please add support for reading these statistics from the command line profiler ASAP, as I imagine they’re critical to a lot of standard workload analysis.