A couple of basic questions about metric definitions and meaning

Maybe I’m missing it in the docs somewhere, but can someone explain the meaning of the “pred_on” suffix in instruction execution metrics such as:
smsp__sass_thread_inst_executed_op_fadd_pred_on.sum
smsp__sass_thread_inst_executed_op_integer_pred_on.sum
I see that --query-metrics describes it as “# of FADD thread instructions executed where all predicates were true”,
but what exactly comprises “all predicates”, and when might some of the predicates not be true?

Second metrics question:
What are the pcie_read/write_bytes.sum metrics? They don't appear to reflect the amount of data transferred from host to device. I measured a simple vectorAdd kernel with known inputs (two float vectors of length N) and one output float vector of length N, and the PCIe metric values are totally different. What do they mean?

Thanks, C

CUDA assembly instructions (SASS) use per-thread predicate registers to determine whether an instruction takes effect, if that instruction has an assigned predicate. Here is an example where execution of the EXIT instruction is controlled by predicate register P0, which is itself computed by the two preceding instructions:

ISETP.GT.AND P0, PT, R3, R2, PT
ISETP.GT.OR P0, PT, R0, R5, P0
@P0 EXIT
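For readers more used to CUDA C++ than SASS, here is a minimal sketch of device code whose control flow resembles the snippet above: each thread evaluates (r3 > r2) || (r0 > r5) and exits early if it holds. The kernel, the helper count_survivors, and the input data are all hypothetical, and whether nvcc actually emits ISETP plus @P0 EXIT for this depends on the compiler version and target architecture; the generated SASS can be checked with cuobjdump -sass.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: threads for which (r3 > r2) || (r0 > r5) exit early.
// Roughly: P0 = (r3 > r2); P0 = (r0 > r5) || P0; @P0 EXIT
__global__ void early_exit(const int *r0, const int *r2, const int *r3,
                           const int *r5, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (r3[i] > r2[i] || r0[i] > r5[i]) return;  // per-thread predicate
    out[i] = 1;  // reached only by threads whose predicate was false
}

// Hypothetical helper: builds deterministic inputs, runs the kernel and
// returns how many threads did NOT take the early exit.
int count_survivors(int n) {
    std::vector<int> h_r0(n), h_r2(n, 0), h_r3(n), h_r5(n, 2), h_out(n);
    for (int i = 0; i < n; ++i) {
        h_r0[i] = i;      // r0 > r5 for i > 2
        h_r3[i] = i % 2;  // r3 > r2 for odd i
    }
    size_t bytes = n * sizeof(int);
    int *d[5];
    for (int k = 0; k < 5; ++k) cudaMalloc(&d[k], bytes);
    cudaMemcpy(d[0], h_r0.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d[1], h_r2.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d[2], h_r3.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d[3], h_r5.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(d[4], 0, bytes);
    early_exit<<<(n + 127) / 128, 128>>>(d[0], d[1], d[2], d[3], d[4], n);
    cudaMemcpy(h_out.data(), d[4], bytes, cudaMemcpyDeviceToHost);
    for (int k = 0; k < 5; ++k) cudaFree(d[k]);
    int survivors = 0;
    for (int v : h_out) survivors += v;
    return survivors;
}

int main() {
    printf("threads that did not exit early: %d\n", count_survivors(1000));
    return 0;
}
```

With these inputs only threads 0 and 2 have a false predicate, so two threads survive.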

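“All predicates were true” refers to the instruction's guard predicate (such as P0 above) evaluating to true for a given thread, so that the instruction actually takes effect for that thread. The pred_on metrics count only those threads; an instruction contributes its full active thread count only when all active threads in the warp have their predicate set to true. Some predicates are false, for example, when threads of a warp evaluate a condition differently and the compiler implements the resulting branch with predication. Inspecting “Instructions Executed” and “Predicated-On Thread Instructions Executed” on the UI’s Source page can help to understand this concept (see the Nsight Compute documentation).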
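To make the difference between active threads and predicated-on threads concrete, here is a hand-rolled analogue using warp vote functions. It is not how Nsight Compute collects its metrics; the kernel guarded_add, the helper count_predicates, and the counters are hypothetical, and the compiler may implement the `if (pred)` with a branch rather than a predicated FADD. The sketch assumes 1D blocks whose size is a multiple of 32.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: per warp, count threads that reach the guarded add
// ("active") and threads whose guard condition is true ("predicated on").
// Assumes blockDim.x is a multiple of 32 so every warp has all 32 lanes.
__global__ void guarded_add(const float *x, float *y, int n,
                            unsigned long long *active_count,
                            unsigned long long *pred_on_count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    bool in_range = i < n;
    bool pred = in_range && x[i] > 0.0f;  // per-thread guard condition
    unsigned active = __ballot_sync(0xffffffffu, in_range);
    unsigned on = __ballot_sync(0xffffffffu, pred);
    if (threadIdx.x % 32 == 0) {  // lane 0 records the whole warp's counts
        atomicAdd(active_count, (unsigned long long)__popc(active));
        atomicAdd(pred_on_count, (unsigned long long)__popc(on));
    }
    if (pred) y[i] = x[i] + 1.0f;  // may be compiled to a predicated FADD
}

// Hypothetical helper: one third of the inputs are positive.
void count_predicates(int n, unsigned long long *active,
                      unsigned long long *pred_on) {
    float *h_x = new float[n];
    for (int i = 0; i < n; ++i) h_x[i] = (i % 3 == 0) ? 1.0f : -1.0f;
    float *d_x, *d_y;
    unsigned long long *d_counts;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMalloc(&d_counts, 2 * sizeof(unsigned long long));
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_counts, 0, 2 * sizeof(unsigned long long));
    const int block = 128;  // multiple of 32, as the kernel assumes
    guarded_add<<<(n + block - 1) / block, block>>>(d_x, d_y, n, d_counts,
                                                    d_counts + 1);
    unsigned long long h_counts[2];
    cudaMemcpy(h_counts, d_counts, sizeof(h_counts), cudaMemcpyDeviceToHost);
    *active = h_counts[0];
    *pred_on = h_counts[1];
    cudaFree(d_x);
    cudaFree(d_y);
    cudaFree(d_counts);
    delete[] h_x;
}

int main() {
    unsigned long long active, on;
    count_predicates(1000, &active, &on);
    printf("threads active: %llu, predicate true: %llu\n", active, on);
    return 0;
}
```

For n = 1000 this reports 1000 active threads but only 334 with a true predicate, which is the kind of gap one would expect between the executed and pred_on thread counts for such a guarded instruction.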


With respect to the PCIe metrics, those are collected per kernel (as are all metrics in Nsight Compute). This means you would only see non-zero values here if the kernel itself accesses pinned host memory mapped into the device's address space while it runs. If the kernel accesses “regular” device memory whose contents were transferred from the host beforehand, e.g. with a cudaMemcpy call, that transfer happens outside the kernel and is not measured by these metrics.
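As an illustration, here is a minimal sketch of the vectorAdd case rewritten so that the kernel reads and writes mapped, pinned ("zero-copy") host memory directly, which is the situation in which PCIe traffic happens during the kernel. The helper name run_zero_copy and the input values are hypothetical; cudaHostAlloc with cudaHostAllocMapped and cudaHostGetDevicePointer are standard CUDA runtime calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vector_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Hypothetical helper: runs vector_add on mapped, pinned host memory,
// so the kernel's own loads and stores travel over PCIe.
bool run_zero_copy(int n) {
    // Needed before context creation on systems without unified addressing;
    // with unified addressing pinned memory is mapped anyway, so the return
    // value is not checked here.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    size_t bytes = n * sizeof(float);
    float *h_a, *h_b, *h_c;
    cudaHostAlloc((void **)&h_a, bytes, cudaHostAllocMapped);
    cudaHostAlloc((void **)&h_b, bytes, cudaHostAllocMapped);
    cudaHostAlloc((void **)&h_c, bytes, cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) {
        h_a[i] = (float)i;
        h_b[i] = 2.0f * i;
    }

    // Device-side aliases of the host buffers; no cudaMemcpy is involved.
    float *d_a, *d_b, *d_c;
    cudaHostGetDevicePointer((void **)&d_a, h_a, 0);
    cudaHostGetDevicePointer((void **)&d_b, h_b, 0);
    cudaHostGetDevicePointer((void **)&d_c, h_c, 0);

    vector_add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    for (int i = 0; ok && i < n; ++i) ok = (h_c[i] == 3.0f * i);

    cudaFreeHost(h_a);
    cudaFreeHost(h_b);
    cudaFreeHost(h_c);
    return ok;
}

int main() {
    printf("zero-copy vector_add %s\n", run_zero_copy(1 << 20) ? "ok" : "failed");
    return 0;
}
```

When this kernel is profiled, one would expect the PCIe metrics from the question to be non-zero and to grow with n, whereas the same kernel running on cudaMalloc buffers filled by cudaMemcpy beforehand would show little or no PCIe traffic of its own.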

If you have further questions, it would be good to share the exact source code, Nsight Compute version and command, OS, GPU, and driver version, as those might be required to analyze the problem.


@felix_dt - great answers, thank you! Unfortunately I can’t seem to mark both answers as a Solution.

These are exactly what I needed to know.