Failed to access the following 9 metrics

I want to use ncu's range replay mode to profile a range of kernels with a custom section file. I use cudaProfilerStart and cudaProfilerStop to define the profiling range, and then start profiling with the following command:

ncu --section-folder /root/dissertation-project/sections/ --section BasicSection --replay-mode range vectorAdd

The contents of the section file are as follows:

Identifier: "BasicSection"
DisplayName: "Basic Section"
Description: "This section collects metrics related to hierarchical arithmetic intensity."

Header {
 Metrics {
   Label: "# of half-precision add instruction"
   Name: "sm__sass_thread_inst_executed_op_hadd_pred_on.sum"
 }
 Metrics {
   Label: "# of half-precision multiply-add instruction"
   Name: "sm__sass_thread_inst_executed_op_hfma_pred_on.sum"
 }
 Metrics {
   Label: "# of half-precision multiply instruction"
   Name: "sm__sass_thread_inst_executed_op_hmul_pred_on.sum"
 }

 Metrics {
   Label: "# of single-precision add instruction"
   Name: "sm__sass_thread_inst_executed_op_fadd_pred_on.sum"
 }
 Metrics {
   Label: "# of single-precision multiply-add instruction"
   Name: "sm__sass_thread_inst_executed_op_ffma_pred_on.sum"
 }
 Metrics {
   Label: "# of single-precision multiply instruction"
   Name: "sm__sass_thread_inst_executed_op_fmul_pred_on.sum"
 }
 Metrics {
   Label: "# of double-precision add instruction"
   Name: "sm__sass_thread_inst_executed_op_dadd_pred_on.sum"
 }
 Metrics {
   Label: "# of double-precision multiply-add instruction"
   Name: "sm__sass_thread_inst_executed_op_dfma_pred_on.sum"
 }
 Metrics {
   Label: "# of double-precision multiply instruction"
   Name: "sm__sass_thread_inst_executed_op_dmul_pred_on.sum"
 }

 Metrics {
   Label: "DRAM"
   Name: "dram__bytes.sum"
 }
 Metrics {
   Label: "L2"
   Name: "lts__t_bytes.sum"
 }
 Metrics {
   Label: "L1"
   Name: "l1tex__t_bytes.sum"
 }

}

When I run the ncu command, it prints:

==WARNING== Please consult the documentation for current range replay limitations and requirements.
==PROF== Connected to process 14053 (/root/vectorAdd)
==ERROR== Failed to access the following 9 metrics: sm__sass_thread_inst_executed_op_dadd_pred_on.sum, sm__sass_thread_inst_executed_op_dfma_pred_on.sum, sm__sass_thread_inst_executed_op_dmul_pred_on.sum, sm__sass_thread_inst_executed_op_fadd_pred_on.sum, sm__sass_thread_inst_executed_op_ffma_pred_on.sum, sm__sass_thread_inst_executed_op_fmul_pred_on.sum, sm__sass_thread_inst_executed_op_hadd_pred_on.sum, sm__sass_thread_inst_executed_op_hfma_pred_on.sum, sm__sass_thread_inst_executed_op_hmul_pred_on.sum

==ERROR== Failed to profile kernel "range" in process 14053
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No ranges were profiled.
==WARNING== Profiling ranges launched by child processes requires the --target-processes all option.

My goal is to collect the metrics defined in the custom section file for a range of kernels. How can I achieve this?

Hi, @viruxzheng

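This is expected. Range replay does not support every metric that kernel replay does. Note that the nine metrics in the error message are all sm__sass_* metrics, while the three memory metrics in your section (dram__bytes.sum, lts__t_bytes.sum, l1tex__t_bytes.sum) are not reported as failing. Please see the compatibility section of the Profiling Guide for the current limitations: https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html?highlight=range%20replay#compatibility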
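
One possible workaround, sketched here as an assumption and not something confirmed in this thread, is to split the collection into two runs: keep range replay for the three memory metrics that were not flagged in the error, and collect the sm__sass_* instruction counts with the default kernel replay, restricted to the cudaProfilerStart/cudaProfilerStop region via --profile-from-start off. Kernel replay reports one result per kernel, so the per-kernel .sum values would have to be added up afterwards to get a total for the range. The function names below are hypothetical helpers; the section folder path is the one from the question.

```shell
# Sketch only: helper names are hypothetical; paths are taken from the question.
SECTION_DIR="/root/dissertation-project/sections/"
HW_METRICS="dram__bytes.sum,lts__t_bytes.sum,l1tex__t_bytes.sum"

# Run 1: range replay, requesting only the memory metrics that the error
# message did not list as failing (assumed to be available in range replay).
profile_range_hw() {
    ncu --replay-mode range --metrics "$HW_METRICS" "$@"
}

# Run 2: kernel replay with the full custom section. With
# --profile-from-start off, profiling only begins at cudaProfilerStart,
# so only kernels inside the marked region are profiled (one result each).
profile_kernels_sass() {
    ncu --replay-mode kernel --profile-from-start off \
        --section-folder "$SECTION_DIR" --section BasicSection "$@"
}
```

For example, `profile_range_hw ./vectorAdd` followed by `profile_kernels_sass ./vectorAdd` would produce the two reports. Whether a given metric is accepted by range replay depends on the Nsight Compute version and GPU, so the compatibility table linked above remains the reference.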
