How can I profile both kernel and CUDA API hardware usage, plus total application duration?

I want to measure the duration of an application using Nsight Systems or Nsight Compute, not just a single kernel but the total application workload. I already know that NCU profiles only single kernels and also serializes the kernels in the application. So how can I get the exact runtime of the whole application?
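For whole-application timing, Nsight Systems is the usual tool. A minimal sketch of the CLI workflow (the report name and application path are placeholders, not from the thread):

```shell
# Trace the whole application run with Nsight Systems
nsys profile -o myreport ./my_app

# Summarize the trace: CUDA API call durations, kernel times,
# memory transfer statistics, and overall timeline spans
nsys stats myreport.nsys-rep
```

Because Nsight Systems traces rather than replays, the reported durations reflect the application's real concurrent execution, unlike NCU's serialized kernel replay.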

I am also curious whether there is a way for Nsight Compute to profile the hardware usage of CUDA API calls, not just kernels.
For example, is there any way to profile hardware usage, such as DRAM reads or writes, caused by a cudaMemcpy API call?

I also want to know whether 'range replay' can measure total duration or the hardware usage of CUDA APIs.
If so, to profile the total performance or metrics of the application, do I just insert cudaProfilerStart() at the top of the code and cudaProfilerStop() at the bottom?
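For reference, the profiler API bracketing described above looks like the sketch below. The kernel and buffer sizes are placeholders; note that the correct end call is cudaProfilerStop(), as there is no cudaProfilerEnd() in the CUDA runtime:

```cuda
// Sketch: bracketing a region of interest with the CUDA profiler API.
#include <cuda_profiler_api.h>
#include <cuda_runtime.h>

__global__ void myKernel(float *data) {  // placeholder kernel
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] = data[i] * 2.0f;
}

int main() {
    float *d_data;
    cudaMalloc(&d_data, 1024 * sizeof(float));

    cudaProfilerStart();               // begin the profiled range
    myKernel<<<32, 32>>>(d_data);
    cudaDeviceSynchronize();
    cudaProfilerStop();                // end the profiled range

    cudaFree(d_data);
    return 0;
}
```

With range replay, such a range can then be profiled with something like `ncu --replay-mode range ./my_app`; with Nsight Systems, only the bracketed region is captured when the session is started with capture controlled by the profiler API.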

Hi, @gju06051

Thanks for using our tools! Please check whether the API Statistics view can meet your requirement: 3. Nsight Compute — NsightCompute 12.5 documentation

Thank you for your comment. But why is this capability not supported in the Nsight Compute CLI?

I am also curious whether this API duration value is the same as the one reported by Nsight Systems. The documentation you linked says these API Statistics results in Nsight Compute cannot replace Nsight Systems.

Hi, @gju06051

API Statistics is supported in the interactive profile activity. It is not supported in the Nsight Compute CLI.

Yes, as the documentation says: "Note that this view cannot be used as a replacement for Nsight Systems when trying to optimize CPU performance of your application".

For tool selection, you can refer to https://developer.nvidia.com/tools-overview
