Visual Profiler: tracking of concurrent data transfers and kernel executions

ElBifi · January 20, 2011, 7:49am

Hello,

I have written a CUDA application that overlaps host-to-device data transfers with kernel execution by using CUDA streams and asynchronous memory copies. I now want to check how much concurrency is actually achieved using the CUDA Visual Profiler, but when I start a new recording, only sequential behaviour is shown. Nevertheless, when I measure with CPU or GPU timers, the expected overlap is visible.

Now my question is whether the CUDA Visual Profiler supports the recording of concurrent data transfers and kernel executions.

Best regards,
ElBifi

fcs · January 20, 2011, 8:58am

As far as I can tell, the profiler makes kernel launches blocking.

You can test it with the simpleStreams sample of the SDK:

The standard behavior (without the profiler) is:

./simpleStreams
[ simpleStreams ]
> > Using CUDA device [0]: Tesla T10 Processor
> CUDA Capable SM 1.3 hardware with 30 multi-processors
> scale_factor = 1.0000
> array_size   = 16777216
memcopy:        13.63
kernel:         25.15
non-streamed:   37.05 (38.78 expected)
4 streams:      26.65 (28.56 expected with compute capability 1.1 or later)
-------------------------------
PASSED

And the output when run under computeprof is:

Start program './simpleStreams' run #5 ...
[ simpleStreams ]
> > Using CUDA device [0]: Tesla T10 Processor
> CUDA Capable SM 1.3 hardware with 30 multi-processors
> scale_factor = 1.0000
> array_size   = 16777216
memcopy:        13.45
kernel:         25.18
non-streamed:   36.92 (38.63 expected)
4 streams:      38.99 (28.54 expected with compute capability 1.1 or later)
-------------------------------
PASSED

avidday · January 20, 2011, 9:38am

The profiler “decorates” execution with a lot of additional events to enable data logging and instrumentation of a program on the device. This has the effect of serializing actions that would otherwise be asynchronous. There was another thread about the perils of judging latency and concurrency just using the profiler here, if you are interested.
