Perfworks and Metrics Detail

I am trying to process an “nsight-cuprof-report” file and extract metrics from it directly, but I have run into some problems. As I understand it, Nsight Compute uses the PerfWorks API. If I pass the metrics with --metrics, then regardless of the order I pass them in, the output file lists them in a fixed order under the Section tag, and that order can be used to extract values from the Metrics tags (starting from 157).

However, if I use a “.section” file to specify the metrics, the order under the Section tag in the Nsight Compute output file follows the section file, which is inconsistent with the NameID values of the Metrics tags. Those start from 157 and follow some order that I have not been able to work out yet.

I was hoping to use the “--query-metrics” output order to solve this, but I ran into another problem: some metrics (such as “l1tex__t_sector_hit_rate”) that appear in the documentation and in the default section files are not listed by “--query-metrics” for my GPU (Titan RTX).

I would appreciate any hint or solution to these.

You can refer to the “Report File Format” section in the Nsight Compute Customization Guide:
https://docs.nvidia.com/nsight-compute/CustomizationGuide/index.html#report-file-format

You can also consider using the --csv option to get comma-separated output, in case that is easier for you to process.

Thanks! I can (almost) successfully decode the report file, but I cannot figure out the payload types, so I just decode everything as “ProfileResult”. That decodes all the metric data I want, but a large portion of the report file cannot be decoded this way. I tried all the other proto formats with no luck so far. For example, the first payload makes up around 90% of the report file and I could not decode it!

My guess is that the metrics are sorted alphabetically, and that seems to hold so far. However, if I pass an invalid metric, “nv-nsight-cu” (the GUI) can identify it and shows a yellow triangle, but I could not find where that is stored in the report file. So a single wrong metric will make my code produce wrong values. For now I can say that the IDs up to 156 and the last 10 are reserved, but this might be wrong!

Also, thanks for the CSV suggestion, but I need NVTX data, and that is not supported in CSV form.

The payloads in each block appear in the number and order described by the BlockHeader structure, i.e. a block whose BlockHeader contains

NumSources=1,NumResults=2,SessionDetails=null,StringTable=<data>,PayloadSize=<size>,Process=<data>

starts with one payload of type SourceData, followed by two payloads of type ProfileResult, all three having a combined size of <size>.
SourceData records are described in the ProfilerCommon.proto file. They contain the CUDA modules of your kernels, which is why they can make up considerable parts of the report.
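
To make the ordering rule concrete, here is a minimal Python sketch of how a reader could decide which message type to use for each payload in a block. It assumes the BlockHeader has already been decoded; BlockHeaderCounts and payload_types are hypothetical names used only for illustration and are not part of any Nsight Compute API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockHeaderCounts:
    # Hypothetical stand-in for the two count fields of a decoded BlockHeader.
    num_sources: int
    num_results: int


def payload_types(header: BlockHeaderCounts) -> List[str]:
    """Return the message type to use for each payload of a block, in order.

    All source payloads come first, followed by all result payloads.
    """
    return ["SourceData"] * header.num_sources + ["ProfileResult"] * header.num_results


# The header from the example above: NumSources=1, NumResults=2.
for index, type_name in enumerate(payload_types(BlockHeaderCounts(1, 2))):
    print(index, type_name)
```

Decoding every payload as ProfileResult fails on the leading source payloads, which would explain why the first (and largest) payload could not be decoded.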

Metrics are not necessarily stored alphabetically. Each metric is assigned a unique numerical ID based on the order in which metrics are encountered during processing. The IDs can be resolved to metric names using the StringTable entries in the BlockHeaders. StringTables can be split across multiple blocks, but the joined table up to block N is guaranteed to contain the IDs for all metrics encountered in this and all preceding blocks.
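
As a rough illustration of joining the StringTables, the Python sketch below accumulates the table fragments block by block and resolves each block's metric IDs against the table joined so far. It assumes the fragments and IDs have already been decoded, and it assumes that an ID is simply an index into the concatenated list of strings; that mapping is an assumption made for this example and should be checked against the proto files. The function name resolve_metric_names is hypothetical.

```python
from typing import List, Sequence, Tuple

# One entry per block: (StringTable fragment of that block, metric IDs used in that block).
Block = Tuple[Sequence[str], Sequence[int]]


def resolve_metric_names(blocks: Sequence[Block]) -> List[List[str]]:
    """Resolve metric IDs to names using the StringTable joined up to each block.

    Assumption: an ID is an index into the concatenation of all fragments
    seen so far. An unknown ID raises IndexError instead of silently
    producing a wrong name.
    """
    joined: List[str] = []
    resolved: List[List[str]] = []
    for fragment, metric_ids in blocks:
        joined.extend(fragment)  # table up to and including this block
        resolved.append([joined[metric_id] for metric_id in metric_ids])
    return resolved


# Made-up fragments and IDs, only to show the lookup across two blocks.
example_blocks = [
    (["metric_a", "metric_b"], [1, 0]),
    (["metric_c"], [2, 0]),
]
print(resolve_metric_names(example_blocks))
```

Looking names up this way avoids relying on alphabetical order or on reserved ID ranges.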