Maybe I’m missing it in the docs somewhere, but can someone explain the meaning of the “pred_on” in instruction execution metrics such as:
I see that --query-metrics describes it as “# of FADD thread instructions executed where all predicates were true”,
but what exactly comprises “all predicates” and when might some of the predicates not be true?
Second metrics question:
What are the pcie_read_bytes.sum and pcie_write_bytes.sum metrics? They don't appear to reflect the amount of data transferred between host and device. I measured a simple vectorAdd kernel with known sizes (two input float vectors of length N and one output float vector of length N), and the pcie metric values are totally different from what those sizes would suggest. What do they actually measure?
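For reference, this is the arithmetic I'm comparing the pcie values against, written as a rough sketch. The helper name and the value of N are only illustrative, and it assumes 4-byte floats and counts only the vector payload (no protocol overhead or other traffic), so treat it as my expectation rather than a statement of what the metrics should report.

```python
# Illustrative only: expected payload for a vectorAdd with two input
# vectors and one output vector, assuming 4-byte floats.
FLOAT_BYTES = 4  # assumption: single-precision float


def expected_vector_add_bytes(n):
    """Return (host_to_device, device_to_host) payload bytes for length n."""
    host_to_device = 2 * n * FLOAT_BYTES  # two input vectors copied to the GPU
    device_to_host = 1 * n * FLOAT_BYTES  # one output vector copied back
    return host_to_device, device_to_host


if __name__ == "__main__":
    n = 1 << 20  # hypothetical N; substitute the length actually used
    h2d, d2h = expected_vector_add_bytes(n)
    print(f"N={n}: host->device {h2d} bytes, device->host {d2h} bytes")
```

The numbers I get from pcie_read_bytes.sum and pcie_write_bytes.sum don't match either of these totals.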