Metrics smsp__sass_thread_inst_executed_op* return n/a

System: Ubuntu 16.04
Driver: 418.56 (installed through apt-get PPA)
CUDA toolkit: 10.1.105 with Nsight Compute 2019.3
Benchmark: VectorAdd in CUDA samples

When I ran nv-nsight-cu-cli --query-metrics, I could see metrics of the form smsp__sass_thread_inst_executed_op_*. However, when I tried capturing those metrics with nv-nsight-cu-cli --metrics <smsp...> ./vectorAdd, the profiler reported “(!)n/a”. When I ran without --metrics, or with the predefined section files shipped in the Nsight Compute package, other performance counters were printed with numerical results.
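For example, one concrete invocation that fails this way (the metric name is just one entry picked from the --query-metrics output):

nv-nsight-cu-cli --metrics smsp__sass_thread_inst_executed_op_fadd_pred_on.sum ./vectorAdd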

Are those metrics smsp__sass_thread_inst_executed_op* actually available in Nsight Compute?

Thanks!

Which GPU are you using?

Thanks for your prompt reply! It’s a GeForce RTX 2070.

This issue was already mentioned in another user’s post, so I’ll post the same answer here:

Those metrics were enabled in our measurement library and correctly added to the documentation, but we missed actually enabling this feature of the measurement library in the tool. We will fix this soon in a future release.

In the meantime, you might be able to use the “Executed Instruction Mix” chart of the Instruction Statistics (InstructionStats) section as a workaround. You can collect this section either on the command line or in the UI, but the chart can only be viewed in the UI. When using the command line, the section should be collected by default, otherwise you can enable it using --section InstructionStats.
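For example:

nv-nsight-cu-cli --section InstructionStats ./vectorAdd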

Thanks for your reply!

Could you possibly give a rough timeline for when these metrics will be integrated?

Regarding the Executed Instruction Mix chart: is sass__inst_executed_per_opcode some magic metric that exists only for drawing charts in the Nsight Compute UI? Is it possible to dump the raw data on the command line as well?

I’ll add that these metrics are very important, and I would also like to see them implemented in nv-nsight-cu-cli as soon as possible (especially since the nvprof equivalents don’t work on the latest GPUs).

I’ve found that, across a range of kernels, the sum of the instructions executed per pipe is very close to the total instructions executed. This is how I’ve been getting around not having the actual sass__inst_executed_per_opcode breakdown in the CLI:

  Metrics {
    Label: "Executed Instructions - Pipeline ADU"
    Name: "sm__inst_executed_pipe_adu.sum"
  }
  Metrics {
    Label: "Executed Instructions - Pipeline ALU"
    Name: "sm__inst_executed_pipe_alu.sum"
  }
  Metrics {
    Label: "Executed Instructions - Pipeline CBU"
    Name: "sm__inst_executed_pipe_cbu.sum"
  }
  Metrics {
    Label: "Executed Instructions - Pipeline FMA"
    Name: "sm__inst_executed_pipe_fma.sum"
  }
  Metrics {
    Label: "Executed Instructions - Pipeline FP16"
    Name: "sm__inst_executed_pipe_fp16.sum"
  }
  Metrics {
    Label: "Executed Instructions - Pipeline FP64"
    Name: "sm__inst_executed_pipe_fp64.sum"
  }
  Metrics {
    Label: "Executed Instructions - Pipeline IPA"
    Name: "sm__inst_executed_pipe_ipa.sum"
  }
  Metrics {
    Label: "Executed Instructions - Pipeline LSU"
    Name: "sm__inst_executed_pipe_lsu.sum"
  }
  Metrics {
    Label: "Executed Instructions - Pipeline Tensor"
    Name: "sm__inst_executed_pipe_tensor.sum"
  }
  Metrics {
    Label: "Executed Instructions - Pipeline TensorOpHMMA"
    Name: "sm__inst_executed_pipe_tensor_op_hmma.sum"
  }
  Metrics {
    Label: "Executed Instructions - Pipeline TensorOpIMMA"
    Name: "sm__inst_executed_pipe_tensor_op_imma.sum"
  }
  Metrics {
    Label: "Executed Instructions - Pipeline TEX"
    Name: "sm__inst_executed_pipe_tex.sum"
  }
  Metrics {
    Label: "Executed Instructions - Pipeline Uniform"
    Name: "sm__inst_executed_pipe_uniform.sum"
  }
  Metrics {
    Label: "Executed Instructions - Pipeline XU"
    Name: "sm__inst_executed_pipe_xu.sum"
  }
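For example, a single run that collects these alongside the total, so you can compare the sum yourself (a sketch; trim the list to the pipes that exist on your GPU):

nv-nsight-cu-cli --metrics sm__inst_executed.sum,sm__inst_executed_pipe_alu.sum,sm__inst_executed_pipe_fma.sum,sm__inst_executed_pipe_lsu.sum,sm__inst_executed_pipe_xu.sum ./vectorAdd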

Edit: It would be useful to know which instructions correspond to which pipelines. Some are clear, but some like ‘Uniform’ and ‘XU’ are not as clear.

An explanation of these metrics would also be helpful to me.

This source has a list of instructions per path/pipeline: https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#turing

It has a list for the uniform datapath, and many of the control instructions should map to the CBU (branch unit).

I can’t vouch for correctness, but https://arxiv.org/pdf/1903.07486.pdf (3.5.2) describes the uniform datapath.

I believe XU is a newer name for what used to be called the “special-function unit” (SFU); the corresponding SASS instruction is MUFU (“multi-function”).
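As a quick sanity check of that guess (a sketch, not authoritative — the kernel name and setup here are hypothetical, just for illustration): a kernel built around fast-math intrinsics should report most of its math instructions under sm__inst_executed_pipe_xu.sum, since __sinf and rsqrtf compile to MUFU.SIN / MUFU.RSQ.

__global__ void mufu_heavy(float* out, const float* in, int n)
{
    // __sinf and rsqrtf lower to MUFU.SIN / MUFU.RSQ, which (if the
    // guess above is right) execute on the XU/SFU pipe.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __sinf(in[i]) * rsqrtf(in[i] * in[i] + 1.0f);
}

Profiling a launch of this kernel with --metrics sm__inst_executed_pipe_xu.sum should then show nonzero counts on that pipe.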