How to continuously monitor cudaMemcpy throughput with CUPTI?

The visual profiler shows the memcpy throughput in the timeline under the counter rows. The throughput is lower at the beginning and end of the operation. I’d like to collect this data myself. CUPTI seems like the right tool, but I couldn’t find the right events/metrics to collect.

Can anyone suggest how to collect it?

[Screenshot: memcpy throughput counter graph in the profiler timeline]

uriv,

The screenshot above is from Nsight VSE CUDA Trace, not the Visual Profiler/CUPTI. I make the distinction because the two tools use different data collection mechanisms; Nsight generally collects more data and presents it differently than the Visual Profiler.

The Counter row area graphs for memory copy throughput and GPU utilization are generated by performing point interpolation every 3 pixels on the range data provided in the Compute and Memory rows. A memory copy range in the Memory row can actually bound multiple PCIe transfers, and the actual PCIe utilization/efficiency can vary during a range. The slight slope at the start and end of the range comes from the linear interpolation, not from actual performance counter data. Ideally, the area graphs would switch from fixed-interval to variable-interval interpolation when the data is sparse (zoomed in), so that the edges would exactly match the ranges in your screenshot.
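Just to illustrate the interpolation point, here is a toy sketch (not Nsight's actual rendering code, and all numbers are made up): sampling a constant-throughput range at fixed intervals and drawing straight lines between the samples produces sloped edges at the start and end of the range, even though the underlying range data is a flat step.

```cpp
// Toy illustration of fixed-interval point interpolation over range data.
// The range has constant throughput between start and end, and zero outside;
// the sloped segments appear only because of where the fixed samples fall.
#include <cstdio>

int main() {
  const double start = 10.0, end = 20.0;  // range boundaries (arbitrary units)
  const double throughput = 6.0;          // GB/s during the range, 0 outside
  const double step = 3.0;                // fixed sample interval ("every 3 pixels")

  double prevT = 0.0, prevV = 0.0;
  for (double t = step; t <= 30.0; t += step) {
    double v = (t >= start && t <= end) ? throughput : 0.0;
    // The plotted segment from (prevT, prevV) to (t, v) becomes a ramp
    // whenever consecutive samples straddle a range boundary.
    printf("segment (%.0f, %.1f) -> (%.0f, %.1f)%s\n", prevT, prevV, t, v,
           (prevV != v) ? "  <- sloped edge from interpolation" : "");
    prevT = t;
    prevV = v;
  }
  return 0;
}
```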

Nsight VSE, the Visual Profiler, and the CUPTI Events API do not expose a method for frequency-based sampling of the PCIe performance counters. The CUPTI Activity API can, however, be used to collect the range data displayed in the Memory, Compute, and Streams rows.
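To give an idea of what that looks like, here is a minimal sketch using the buffer-callback flavor of the CUPTI Activity API to collect one record per memory copy and derive an average throughput for each copy from its byte count and start/end timestamps. Error checking is omitted, and the CUpti_ActivityMemcpy record layout can differ between CUDA toolkit versions, so treat it as a starting point rather than drop-in code.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cupti.h>

#define BUF_SIZE (32 * 1024)

// CUPTI calls this when it needs a buffer to store activity records.
static void CUPTIAPI bufferRequested(uint8_t **buffer, size_t *size,
                                     size_t *maxNumRecords) {
  *buffer = (uint8_t *)malloc(BUF_SIZE);  // malloc is 8-byte aligned, as CUPTI requires
  *size = BUF_SIZE;
  *maxNumRecords = 0;                     // 0 = fill the buffer with as many records as fit
}

// CUPTI calls this when a buffer is full or flushed; walk the memcpy records.
static void CUPTIAPI bufferCompleted(CUcontext ctx, uint32_t streamId,
                                     uint8_t *buffer, size_t size,
                                     size_t validSize) {
  CUpti_Activity *record = NULL;
  while (cuptiActivityGetNextRecord(buffer, validSize, &record) == CUPTI_SUCCESS) {
    if (record->kind == CUPTI_ACTIVITY_KIND_MEMCPY) {
      const CUpti_ActivityMemcpy *m = (const CUpti_ActivityMemcpy *)record;
      double seconds = (double)(m->end - m->start) * 1e-9;  // timestamps are in ns
      double gbPerSec = ((double)m->bytes / seconds) * 1e-9;
      printf("memcpy: %llu bytes, %.1f us, avg %.2f GB/s\n",
             (unsigned long long)m->bytes, seconds * 1e6, gbPerSec);
    }
  }
  free(buffer);
}

int main() {
  // Enable memcpy activity records before the copies of interest are issued.
  cuptiActivityEnable(CUPTI_ACTIVITY_KIND_MEMCPY);
  cuptiActivityRegisterCallbacks(bufferRequested, bufferCompleted);

  // Example workload: a single 64 MB host-to-device copy.
  const size_t bytes = 64 * 1024 * 1024;
  void *dptr = NULL;
  void *hptr = malloc(bytes);
  cudaMalloc(&dptr, bytes);
  cudaMemcpy(dptr, hptr, bytes, cudaMemcpyHostToDevice);
  cudaDeviceSynchronize();

  // Flush buffered records so bufferCompleted runs before the program exits.
  cuptiActivityFlushAll(0);

  cudaFree(dptr);
  free(hptr);
  return 0;
}
```

Note that, as described above, this only yields one aggregate number per copy range; it does not sample the PCIe counters within a single transfer.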

Greg, thanks a lot for the explanation. It was very helpful.