Methodology for the choice of metrics for Nsight Compute Sections?

I’ve noticed that profiling is much faster when the metrics I collect all come from a single section. Is this due to hardware constraints, or was Nsight Compute programmed in a way that optimizes for the specific metrics grouped within a section? If I were able to profile a given set of metrics from the same section in Nsight Compute, and then collected the same metrics with CUPTI (assuming they are available there), would the number of kernel replays needed remain the same?

Yes, metrics come from different providers, and even within the same provider, not all metrics can be collected in the same pass due to hardware limitations. You can find more detail about this here and about overhead here.
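As an illustration, you can request an explicit metric list instead of a whole section and watch how it schedules (a sketch; `./my_app` is a placeholder and the metric names are just examples, so the reported pass count will depend on your concrete chip):

```
# Collect two explicitly chosen metrics instead of a full section.
# ncu prints the number of passes it replays for each profiled kernel,
# which is the simplest way to see how a given metric set schedules
# on your GPU. ./my_app is a placeholder application.
ncu --metrics sm__cycles_elapsed.avg,dram__bytes.sum ./my_app
```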

If I were able to profile a given set of metrics from the same section in Nsight Compute, and then collected the same metrics with CUPTI (assuming they are available there), would the number of kernel replays needed remain the same?

Yes, if you choose the same specific metrics, you should end up with the same number of passes. The pass count is not determined by the number of metrics in general, but by which ones are chosen. Unfortunately, there is no good rule of thumb for determining this offline for an arbitrary set of metrics; you only know the pass count once they are actually scheduled on the concrete chip (there are some groups of metrics which are known to be collectable in a single pass, though).
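You can check this empirically by comparing the pass counts ncu reports for different selections; requesting the same list through CUPTI’s profiling APIs should then schedule into the same number of passes (a sketch; the metric names are examples, and actual pass counts vary per architecture):

```
# A single hardware counter metric will often schedule in one pass,
# while a larger selection forces kernel replay; ncu reports the pass
# count per profiled kernel either way. ./my_app is a placeholder.
ncu --metrics sm__warps_active.avg.pct_of_peak_sustained_active ./my_app
ncu --set full ./my_app   # a large predefined set: expect many more passes
```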

For certain types of data collection, like PM sampling and warp state sampling, ncu dynamically determines the number of passes beyond the minimum required, in order to find optimal sampling parameters. This can differ from collecting the same data through CUPTI. If all sampling parameters are set explicitly by the user, the tool should replay only the minimal number of passes.
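For example, you can pin the warp state sampling parameters yourself rather than letting ncu search for them (a sketch; the section name and option values here are arbitrary examples, and the right values depend on your workload):

```
# Request the warp state sampling section with the sampling interval,
# buffer size, and maximum pass count all set by the user; with nothing
# left for the tool to tune, ncu should replay only the minimal number
# of passes. ./my_app is a placeholder.
ncu --section WarpStateStats \
    --sampling-interval 10 \
    --sampling-buffer-size 33554432 \
    --sampling-max-passes 5 \
    ./my_app
```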