A couple basic questions about metric definitions/meaning

colinrei · February 1, 2021, 12:52am

Maybe I’m missing it in the docs somewhere, but can someone explain the meaning of the “pred_on” in instruction execution metrics such as:
smsp__sass_thread_inst_executed_op_fadd_pred_on.sum
smsp__sass_thread_inst_executed_op_integer_pred_on.sum
I see that --query-metrics describes it as “# of FADD thread instructions executed where all predicates were true”,
but what exactly counts as “all predicates”, and when might some of the predicates not be true?

Second metrics question:
What are the pcie_read/write_bytes.sum metrics? They don’t appear to reflect the amount of data transferred from host to device: I measured a simple vectorAdd kernel with known inputs (two float vectors of length N) and one output float vector of length N, and the PCIe metric values are totally different. What do they mean?

Thanks, C

felix_dt · February 1, 2021, 9:11am

CUDA assembly instructions (SASS) use per-thread predicate registers to determine if an instruction should be executed or not, if that instruction has an assigned predicate. Here is an example where the execution of the EXIT instruction is determined by predicate register P0, which itself is computed by the two preceding instructions:

ISETP.GT.AND P0, PT, R3, R2, PT
ISETP.GT.OR P0, PT, R0, R5, P0
@P0 EXIT

“All predicates are true” refers to the case when all active threads in the warp have their predicate set to true. Inspecting “Instructions Executed” and “Predicated-On Thread Instructions Executed” on the UI’s Source page can help to understand this concept; see the Nsight Compute documentation.
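To make the relationship between the Source page columns concrete, here is a minimal host-side C++ sketch that models a single warp. It is not how the hardware counters are implemented. The `Counters` struct, `execute_predicated`, and `run_example` are hypothetical names made up for illustration, and the model assumes per-thread counting: one count per active thread for thread instructions, and one count per active thread whose predicate is true for the predicated-on variant.

```cpp
#include <bitset>
#include <cstdint>

constexpr int kWarpSize = 32;
using Mask = std::bitset<kWarpSize>;

// Hypothetical counters, loosely mirroring three Source page columns.
struct Counters {
    std::uint64_t inst_executed = 0;         // warp level: one per issued instruction
    std::uint64_t thread_inst_executed = 0;  // one per active thread
    std::uint64_t thread_inst_pred_on = 0;   // one per active thread with a true predicate
};

// Model one warp issuing one instruction guarded by a per-thread predicate.
// An unpredicated instruction is modeled with an all-true predicate mask.
void execute_predicated(const Mask& active, const Mask& predicate, Counters& c) {
    c.inst_executed += 1;
    c.thread_inst_executed += active.count();
    c.thread_inst_pred_on += (active & predicate).count();
}

Counters run_example() {
    Counters c;

    Mask active;
    active.set();  // assumption: no divergence, all 32 lanes active

    // Assumed predicate pattern: P0 is true in every fourth lane (8 lanes).
    Mask p0;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        p0[lane] = (lane % 4 == 0);
    }

    Mask always_true;
    always_true.set();

    execute_predicated(active, p0, c);           // models "@P0 FADD ..."
    execute_predicated(active, always_true, c);  // models an unpredicated "FADD ..."
    return c;
}
```

In this model, `run_example()` ends with `inst_executed == 2`, `thread_inst_executed == 64`, and `thread_inst_pred_on == 40`: the predicated instruction contributes only the 8 lanes where P0 is true, while the unpredicated one contributes all 32 active lanes.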

felix_dt · February 1, 2021, 10:03am

With respect to the PCIe metrics, those are collected per kernel (as are all metrics in Nsight Compute). This means that you would only see non-zero values here if the kernel accesses pinned host memory mapped to the device while it runs. If the kernel accesses “regular” device memory that was transferred from the host to the device beforehand, e.g. with a cudaMemcpy call, that transfer is not measured by this metric.
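As a back-of-the-envelope illustration of the distinction above, the following C++ sketch estimates the payload a vectorAdd kernel (c[i] = a[i] + b[i] over N floats) would itself move across PCIe in the two setups. The `BufferPlacement` enum and `estimate_vector_add` function are hypothetical helpers. The numbers are payload only, under the assumption that every element is read or written exactly once, so real counter values may differ (for example due to protocol overhead or transaction granularity).

```cpp
#include <cstddef>
#include <cstdint>

// Where the kernel's buffers live while the kernel runs (hypothetical enum).
enum class BufferPlacement {
    DeviceMemory,     // copied to the GPU beforehand, e.g. with cudaMemcpy
    MappedPinnedHost  // pinned host memory mapped into the device address space
};

struct KernelPcieEstimate {
    std::uint64_t bytes_read;     // payload the kernel loads across PCIe
    std::uint64_t bytes_written;  // payload the kernel stores across PCIe
};

// Estimate for c[i] = a[i] + b[i] over n floats: two input vectors, one output.
KernelPcieEstimate estimate_vector_add(std::size_t n, BufferPlacement placement) {
    if (placement == BufferPlacement::DeviceMemory) {
        // The copies happened outside the kernel, so a per-kernel
        // measurement is not expected to see them.
        return {0, 0};
    }
    const std::uint64_t vec_bytes = static_cast<std::uint64_t>(n) * sizeof(float);
    return {2 * vec_bytes, vec_bytes};
}
```

For N = 1024, the mapped pinned case gives 8192 bytes read and 4096 bytes written, while the cudaMemcpy-beforehand case gives zero for both, since the host-to-device copy is not part of the profiled kernel.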

If you have further questions, it would be good to share the exact source code, Nsight Compute version and command, OS, GPU, and driver version, as those might be required to analyze the problem.

colinrei · February 1, 2021, 9:30pm

@felix_dt - great answers, thank you! Unfortunately I can’t seem to mark both answers as a Solution.

These are exactly what I needed to know.
