My CUDA version is 12.1 and my Nsight Compute version is 2023.1.0.0. So far I have only found ways to profile specific kernels within specific invocations. I was wondering how to profile the whole execution without collecting detailed profiling information for each kernel, which should be faster and use less disk space.
Nsight Compute is designed to profile individual instances of running kernels to identify performance issues within them. Due to limitations in how metrics are collected, this usually requires saving and restoring the state from before the kernel was launched so that the kernel can be replayed multiple times. For this reason, among others, profiling the entire application and aggregating the data is not a directly supported feature.
There is the option to use Range Replay, which can aggregate data for multiple kernels in a range. In theory, if your application were small enough to store all the state changes needed for replay, you might be able to create a range around the entire thing, but that is not explicitly what Range Replay was designed for.
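As a rough sketch of what that could look like, the snippet below wraps the region of interest in the CUDA runtime profiler start/stop markers (cudaProfilerStart/cudaProfilerStop), which is one of the ways ranges can be defined for Range Replay. The kernel and launch configuration are just placeholders, and whether the whole workload fits Range Replay's restrictions depends on your application. You would then profile with something like `ncu --replay-mode range -o report ./app`.

```cpp
// Minimal sketch: mark one range around the whole workload so Nsight Compute's
// Range Replay mode can aggregate metrics across the kernels inside it.
// Build with: nvcc -o app range_example.cu
#include <cuda_profiler_api.h>  // cudaProfilerStart / cudaProfilerStop
#include <cuda_runtime.h>
#include <cstdio>

__global__ void dummyKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;  // placeholder work
}

int main() {
    const int n = 1 << 20;
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // Everything launched between these two markers is treated as one range.
    cudaProfilerStart();
    for (int iter = 0; iter < 10; ++iter) {
        dummyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    }
    cudaDeviceSynchronize();
    cudaProfilerStop();

    cudaFree(d_data);
    std::printf("done\n");
    return 0;
}
```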
The results are also available from the CLI and in other export formats, so you could do some manual aggregation at the end using scripts, spreadsheets, etc.
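For instance, you could export the per-kernel results to CSV with something like `ncu --import report.ncu-rep --csv --page raw > metrics.csv` and then sum a metric of interest across all kernels. The helper below is only a sketch under stated assumptions: the report name, the metric column name, and the simplistic CSV handling are all illustrative, and real ncu CSV output quotes its fields, so a spreadsheet or a proper CSV parser may be the more practical route.

```cpp
// Hypothetical post-processing sketch: sum one metric column across all rows
// (kernels) of a CSV exported from an ncu report.
// Example usage (metric name is illustrative): ./aggregate metrics.csv dram__bytes.sum
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Split one CSV line on commas (no quoted-field handling; sketch only).
static std::vector<std::string> splitCsv(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) fields.push_back(field);
    return fields;
}

int main(int argc, char** argv) {
    if (argc < 3) {
        std::cerr << "usage: aggregate <metrics.csv> <column name>\n";
        return 1;
    }
    std::ifstream in(argv[1]);
    std::string header;
    std::getline(in, header);
    std::vector<std::string> cols = splitCsv(header);

    // Find the requested metric column in the header row.
    size_t idx = 0;
    for (; idx < cols.size(); ++idx)
        if (cols[idx] == argv[2]) break;
    if (idx == cols.size()) {
        std::cerr << "column not found\n";
        return 1;
    }

    // Accumulate that column over every kernel row.
    double total = 0.0;
    std::string line;
    while (std::getline(in, line)) {
        std::vector<std::string> fields = splitCsv(line);
        if (idx < fields.size()) total += std::atof(fields[idx].c_str());
    }
    std::cout << argv[2] << " summed over all kernels: " << total << "\n";
    return 0;
}
```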