CUDA Performance Profiling with Nvidia NSight in VS2010 - .nvreport report file

cuder_joe · April 29, 2013, 9:05pm

I ran a trace of my application.

In the report file, when I select “CUDA → CUDA Summary” in the drop-down, the table shows:

Runtime API Calls: % Time = 80.66

Launches: % Device Time = 15.46

All the other time percentages are nearly 0%.

So my first question is: where is the remaining 19.34% of Time and 84.54% of Device Time? Or are these percentages taken relative to completely different ‘Total Time’ values?

I used thrust vectors to copy my data back and forth. In the “Memory Copy” section of this report, all the % Time values for the memory copies in my run are negligible.

However, when I click the ‘summary’ link for Runtime API Calls (whose % Time is as high as 80.66), I immediately see the culprit: ‘cudaMemcpy’, with a ‘Capture Time %’ of 73.75 on the ‘Runtime API Calls Summary’ page.

So my questions are:

Does this mean that my bottleneck is still those calls to thrust::copy(), even though the “Memory Copy” section of the report doesn’t show it?
How can I find the exact function call that is the most expensive in general?
How does the timeline feature help with any of this?

Greg · April 30, 2013, 9:41pm

See the Stack Overflow question “CUDA Perfomance Profiling with Nvidia NSight in VS2010 - .nvreport report file” (tagged visual studio 2010) for a detailed answer.
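As a rough cross-check on the question of which call is the most expensive, one option is to time the suspect calls by hand on the host and compare the totals with what the report shows. The sketch below is only an illustration under assumptions: `CallTimer` and `run_demo` are hypothetical names, not part of Nsight, CUDA, or Thrust, and the sleeps are stand-ins for real work.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <map>
#include <string>
#include <thread>
#include <utility>

// Hypothetical helper (not part of Nsight, CUDA, or Thrust): accumulates
// host wall-clock time per label so the most expensive call stands out.
class CallTimer {
public:
    // Runs `call` and adds its elapsed host time to the total for `label`.
    template <typename F>
    void time(const std::string& label, F&& call) {
        const auto start = std::chrono::steady_clock::now();
        std::forward<F>(call)();
        const auto stop = std::chrono::steady_clock::now();
        totals_ms_[label] +=
            std::chrono::duration<double, std::milli>(stop - start).count();
    }

    // Accumulated milliseconds for `label` (0 if it was never timed).
    double total_ms(const std::string& label) const {
        const auto it = totals_ms_.find(label);
        return it == totals_ms_.end() ? 0.0 : it->second;
    }

    // Label with the largest accumulated time; needs at least one timed call.
    std::string slowest() const {
        assert(!totals_ms_.empty());
        auto best = totals_ms_.begin();
        for (auto it = totals_ms_.begin(); it != totals_ms_.end(); ++it) {
            if (it->second > best->second) best = it;
        }
        return best->first;
    }

    // Prints each label with its share of the total measured time.
    void report() const {
        double sum = 0.0;
        for (const auto& kv : totals_ms_) sum += kv.second;
        for (const auto& kv : totals_ms_) {
            std::printf("%-12s %8.3f ms  %5.1f%%\n", kv.first.c_str(), kv.second,
                        sum > 0.0 ? 100.0 * kv.second / sum : 0.0);
        }
    }

private:
    std::map<std::string, double> totals_ms_;
};

// Stand-in workloads (assumed); in a CUDA program these lambdas would wrap
// the real calls, such as the thrust::copy() transfers.
inline void run_demo() {
    CallTimer timer;
    timer.time("copy", [] {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
    });
    timer.time("compute", [] {
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    });
    timer.report();
}
```

In a real CUDA program the lambdas would wrap calls such as thrust::copy(). Kernel launches return before the GPU finishes, so a launch would need a cudaDeviceSynchronize() inside the same lambda for its time to be counted. This measures host-side time spent in each call, which is probably closer to the API-call view of the report than to device time.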
