When profiling with NVIDIA Nsight Compute (ncu):
- Does its kernel replay mechanism inherently handle warmup, or are explicit warmup iterations still required?
- ncu appears to determine the number of replay passes automatically. Is there a way to specify the replay count manually?
Answering the questions in the same order:
- ncu takes care of warmup passes, so there is no need for explicit warmup iterations.
- No. Could you clarify the use case for this? Limiting the number of replay passes would not help, because the pass count follows from the metrics being collected, and cutting it short would result in inaccurate metric data. If you want fewer replay passes, it is more effective to trim the list of metrics being collected. Additionally, you can use the application replay mode(s), which re-run the entire application instead of doing a per-kernel memory save-and-restore. Depending on the use case, this can be beneficial for data collection and profiling time. For more information on replay modes, refer to the Nsight Compute documentation.
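As a rough illustration of the two suggestions above (a trimmed metric list and application replay), here is a minimal Python sketch that only assembles an ncu command line. The helper name `build_ncu_command`, the application path `./my_app`, the report name `profile`, and the two example metrics are placeholders chosen for illustration; `--replay-mode`, `--metrics`, and `-o` are ncu command-line options, but check `ncu --help` for the exact set your version supports.

```python
import shlex

# Replay modes as listed in the ncu CLI documentation; verify against your version.
REPLAY_MODES = {"kernel", "application", "range", "app-range"}


def build_ncu_command(app_argv, metrics, replay_mode="application", output="profile"):
    """Build (but do not run) an ncu argv list.

    app_argv : the application and its arguments, e.g. ["./my_app"] (placeholder)
    metrics  : a short list of metric names; fewer metrics generally means
               fewer replay passes
    """
    if replay_mode not in REPLAY_MODES:
        raise ValueError(f"unknown replay mode: {replay_mode!r}")
    if not metrics:
        raise ValueError("pass at least one metric name")
    return [
        "ncu",
        "--replay-mode", replay_mode,      # re-run the whole app instead of save/restore
        "--metrics", ",".join(metrics),    # collect only what is needed
        "-o", output,                      # report file name (placeholder)
        *app_argv,
    ]


if __name__ == "__main__":
    cmd = build_ncu_command(
        ["./my_app"],
        ["gpu__time_duration.sum", "dram__bytes_read.sum"],  # example metrics
    )
    # Print a copy-pasteable command line; run it yourself where ncu is installed.
    print(shlex.join(cmd))
```

Note that application replay re-launches the program once per pass, so it is only a good fit when the application behaves the same way on every run.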