Which foundational libraries do the high-frequency GPU metrics in Nsight Systems come from?

slei.chu · March 13, 2024, 6:59am

Regarding the collection of high-frequency GPU metrics in Nsight Systems (such as GPC clock frequency, SM active, etc.): which interfaces do these metrics originate from? To my knowledge, the Profiling API in CUPTI recommends dumping data at kernel granularity, and the data available from NVML seems insufficient to cover the metrics shown in Nsight Systems. Is it possible that Nsight Systems triggers CUPTI dumps through timers? I doubt such a high sampling frequency can be achieved that way. If the answer involves proprietary design details, please say so. Additionally, if you have any suggestions, I would appreciate your response. Thank you.
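For context on the timer concern, here is a minimal Python sketch of a timer-driven polling loop. It is only an illustration of the general approach, not how Nsight Systems works. `poll_metric`, `achieved_rate_hz`, and the `read_metric` callable are hypothetical names; in practice `read_metric` might wrap something like NVML's `nvmlDeviceGetClockInfo`, but here it is left as a placeholder so the sketch runs without a GPU.

```python
import time


def poll_metric(read_metric, period_s, duration_s,
                clock=time.perf_counter, sleep=time.sleep):
    """Call read_metric() about every period_s seconds for duration_s seconds.

    read_metric is a placeholder (hypothetical) for any host-side query.
    Returns a list of (seconds_since_start, value) tuples.
    """
    samples = []
    start = clock()
    next_deadline = start
    while True:
        now = clock()
        if now - start >= duration_s:
            break
        if now < next_deadline:
            # Wait for the next tick; a real OS sleep adds jitter on top.
            sleep(next_deadline - now)
            continue
        samples.append((now - start, read_metric()))
        next_deadline += period_s
    return samples


def achieved_rate_hz(samples):
    """Average sampling rate actually reached, from the sample timestamps."""
    if len(samples) < 2:
        return 0.0
    span = samples[-1][0] - samples[0][0]
    return (len(samples) - 1) / span if span > 0 else 0.0


if __name__ == "__main__":
    # Dummy reader standing in for a real query (assumption for illustration).
    data = poll_metric(lambda: 0, period_s=0.001, duration_s=0.2)
    print(f"requested 1000 Hz, achieved about {achieved_rate_hz(data):.0f} Hz")
```

In a loop like this, the achieved rate cannot exceed one sample per `read_metric` call, however short the requested period is, and sleep jitter lowers it further. That is the limitation the question is pointing at.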

Sanjiv.Satoor · March 13, 2024, 7:30am

Your understanding is correct.

At this point CUPTI does not have support for metric collection equivalent to that available in Nsight Systems.

We are working on adding new CUPTI APIs for this feature and this will be supported in the future.

slei.chu · March 13, 2024, 7:49am

Thank you for your response. I am still curious about how Nsight Systems collects these metrics. Is it correct to understand that NVIDIA is using unreleased features, such as NVPW_GPU_PeriodicSampler_Get_CounterAvailabilityImage, which interact directly with the CUDA driver to obtain an image of some hardware counters and calculate these metrics from it, and that this functionality will be added to CUPTI in the future? Thank you once again for your response.
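To make "calculate these metrics from hardware counters" concrete, here is a small Python sketch that derives a ratio-style metric from two successive raw counter snapshots. It is purely illustrative. The `CounterSnapshot` type and the counter names `elapsed_cycles` and `sm_active_cycles` are hypothetical and are not taken from any NVIDIA API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSnapshot:
    # Hypothetical monotonically increasing raw counters (names are assumptions).
    elapsed_cycles: int
    sm_active_cycles: int


def sm_active_percent(prev: CounterSnapshot, curr: CounterSnapshot) -> float:
    """Percentage of cycles in the interval during which the SM counter advanced."""
    d_elapsed = curr.elapsed_cycles - prev.elapsed_cycles
    if d_elapsed <= 0:
        raise ValueError("snapshots must be ordered and span at least one cycle")
    d_active = curr.sm_active_cycles - prev.sm_active_cycles
    return 100.0 * d_active / d_elapsed


if __name__ == "__main__":
    a = CounterSnapshot(elapsed_cycles=1000, sm_active_cycles=200)
    b = CounterSnapshot(elapsed_cycles=2000, sm_active_cycles=950)
    print(sm_active_percent(a, b))  # 75.0
```

The point of the sketch is that a periodic sampler only needs counter deltas per interval. The derived metric is then plain arithmetic on the host side.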

Greg · March 13, 2024, 12:15pm

CUPTI, NCU, NSYS, and NGFX use Nsight PerfSDK. See https://developer.nvidia.com/nsight-perf-sdk.

slei.chu · March 14, 2024, 1:31am

Thank you very much, all my questions have been answered, and the discussion can be closed now.