I have some questions about the PM sampling feature in the latest version of ncu and would like to understand it better. Could you please clarify the following?
1. I would like to confirm the valid range of pm-sampling-interval, as it is not explicitly mentioned on the official website. Could you please provide this information?
2. I am not clear about the meaning of ‘pass group’ and ‘pass group X active’ in PM sampling. Could you please provide an explanation?
I’m not sure about the interval. I’ll do some digging and get back to you.
With respect to the pass groups: not all metrics can be collected at the same time, so they are split across multiple passes. A pass group defines which metrics are collected together. If you hover over a row, the tooltip should tell you which group that metric belonged to during collection. At the end, all groups are composed onto the same timeline.
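To make the idea concrete, here is a toy model of pass groups in Python. It is only a sketch: the metric names, the `max_per_pass` limit, and the sample layout are all made up for illustration, and this is not how ncu actually assigns metrics to passes. It just shows metrics being split into groups, each group being sampled separately, and the results being merged onto one timeline.

```python
from collections import defaultdict

def group_into_passes(metrics, max_per_pass):
    """Split metrics into consecutive groups of at most max_per_pass (hypothetical limit)."""
    return [metrics[i:i + max_per_pass] for i in range(0, len(metrics), max_per_pass)]

def compose_timeline(samples_by_group):
    """Merge per-group samples into a single timeline keyed by timestamp.

    samples_by_group maps a group index to a list of (timestamp, {metric: value}) pairs.
    """
    timeline = defaultdict(dict)
    for samples in samples_by_group.values():
        for timestamp, values in samples:
            timeline[timestamp].update(values)
    return dict(sorted(timeline.items()))

# Hypothetical metric names; real PM sampling metrics have different names.
metrics = ["metric_a", "metric_b", "metric_c"]
groups = group_into_passes(metrics, max_per_pass=2)

# Each pass group is collected in its own pass, then composed afterwards.
samples_by_group = {
    0: [(0, {"metric_a": 1, "metric_b": 2}), (1000, {"metric_a": 3, "metric_b": 4})],
    1: [(0, {"metric_c": 5}), (1000, {"metric_c": 6})],
}
timeline = compose_timeline(samples_by_group)
```

In this model, `metric_c` ends up in group 1 only because the made-up limit is two metrics per pass; the composed `timeline` then holds all three metrics at each timestamp.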
Thank you very much. Your response helped me understand the concept of ‘pass group’.
Please allow me to ask two more questions.
3. I don’t quite understand the meaning of ‘context switch trace’ and how it helps us analyze specific issues in the pass group. Could you please provide an explanation?
4. Does PM sampling support collection under MIG and Virtual Function in ncu?
I don’t quite understand the meaning of ‘context switch trace’ and how it helps us analyze specific issues in the pass group.
Context switch trace is explained in the documentation. The purpose is to align the data sampled across multiple passes and to filter it to only the CUDA context that is being profiled.
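As a rough illustration of the filtering part, the Python sketch below keeps only the samples taken while the profiled context was active. The interval list and sample format are assumptions made for this example; they do not reflect ncu's internal data structures.

```python
def filter_to_context(samples, active_intervals):
    """Keep samples whose timestamp lies in a [start, end) interval
    during which the profiled CUDA context was active (toy model)."""
    return [
        (timestamp, value)
        for timestamp, value in samples
        if any(start <= timestamp < end for start, end in active_intervals)
    ]

# Hypothetical samples as (timestamp, value) pairs.
samples = [(0, 10), (1000, 12), (2000, 50), (3000, 11)]
# Hypothetical intervals in which the profiled context was running.
active_intervals = [(0, 1500), (2500, 4000)]

kept = filter_to_context(samples, active_intervals)
```

Here the sample at timestamp 2000 is dropped because another context was running at that point. Without this kind of information, as on MIG, such samples stay in the data and have to be interpreted by hand.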
Does PM sampling support collection under MIG and Virtual Function in ncu?
PM sampling is not supported on vGPU (assuming that’s what you mean by “virtual function”). It is supported on MIG (but context switch trace is not supported for it, making the collected data slightly harder to interpret).
I would like to confirm the valid range of the pm-sampling-interval
The minimum sampling interval depends on the GPU architecture. For Turing and GA100, it is 20000 cycles. For GA10x and newer, it is 1000 ns.
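If you want to check a value before passing it as the pm-sampling-interval, a small helper along these lines may be useful. This is a sketch only: the architecture keys are labels invented for this example, and the table contains just the minimums quoted in this reply. Note that the unit differs by architecture (cycles versus nanoseconds).

```python
# Minimum PM sampling interval per architecture, as quoted in this thread.
# The dictionary keys are hypothetical labels, not identifiers used by ncu.
MIN_INTERVAL = {
    "turing": (20000, "cycles"),
    "ga100": (20000, "cycles"),
    "ga10x_or_newer": (1000, "ns"),
}

def check_interval(arch, interval):
    """Return (interval, unit) if interval meets the minimum for arch, else raise ValueError."""
    minimum, unit = MIN_INTERVAL[arch]
    if interval < minimum:
        raise ValueError(
            f"interval {interval} {unit} is below the minimum of {minimum} {unit} for {arch}"
        )
    return interval, unit

result = check_interval("ga10x_or_newer", 2000)
```

No upper bound is checked here because none was stated above.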
Thank you for your response, it has been very helpful for me to understand PM sampling.