concurrent copy and execution not showing in visual profiler

Nico · July 22, 2009, 1:23pm

Hi all,

I’ve got some CUDA code that I wanted to speed up by overlapping asynchronous device-to-host copies with kernel execution, using page-locked host memory. When I inspect the GPU time width plot, I do not see the overlap (see attached figure).
When I put the copy in the same stream as the kernels, each iteration takes 4.2 ms. When I put it in a different stream, the time drops to 2.95 ms per iteration, so the overlap does work.
In both cases, however, the GPU time width plot looks exactly the same. Even when the copy is issued in the same stream as the kernels, the plot shows it in Stream_0.
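For reference, here is a minimal sketch of the setup described above. It is not the poster's code: the kernel `produce`, the function `run_overlapped` and the double-buffering scheme are hypothetical stand-ins. Kernels are launched on one stream and device-to-host copies on a second. Events enforce the ordering: each copy waits for its kernel, and a buffer is not overwritten until its previous copy has finished. `cudaMallocHost` provides the page-locked host memory that `cudaMemcpyAsync` needs for the copy to run asynchronously. Whether a copy really overlaps a kernel also depends on the device having a copy engine (see `asyncEngineCount` in `cudaDeviceProp`).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort on any CUDA error so failures are not silently ignored.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,       \
                    cudaGetErrorString(err_));                       \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

// Hypothetical kernel standing in for the real work: fills the buffer.
__global__ void produce(float* d_buf, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_buf[i] = value;
}

// Runs `iters` iterations with kernels on `compute` and D2H copies on `copy`.
// Returns true if every iteration's result reached host memory intact.
bool run_overlapped(int iters, int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* h_out = nullptr;  // page-locked host memory, one slot per iteration
    CHECK(cudaMallocHost(&h_out, bytes * iters));

    float* d_buf[2];                       // double buffer on the device
    cudaEvent_t produced[2], copied[2];
    for (int b = 0; b < 2; ++b) {
        CHECK(cudaMalloc(&d_buf[b], bytes));
        CHECK(cudaEventCreateWithFlags(&produced[b], cudaEventDisableTiming));
        CHECK(cudaEventCreateWithFlags(&copied[b], cudaEventDisableTiming));
    }
    cudaStream_t compute, copy;
    CHECK(cudaStreamCreate(&compute));
    CHECK(cudaStreamCreate(&copy));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    for (int it = 0; it < iters; ++it) {
        const int b = it % 2;
        // Do not overwrite buffer b until its copy from iteration it-2 is done.
        if (it >= 2) CHECK(cudaStreamWaitEvent(compute, copied[b], 0));
        produce<<<blocks, threads, 0, compute>>>(d_buf[b], n,
                                                 static_cast<float>(it));
        CHECK(cudaGetLastError());
        CHECK(cudaEventRecord(produced[b], compute));

        // The copy waits only for this kernel, so it can overlap the next one.
        CHECK(cudaStreamWaitEvent(copy, produced[b], 0));
        CHECK(cudaMemcpyAsync(h_out + static_cast<size_t>(it) * n, d_buf[b],
                              bytes, cudaMemcpyDeviceToHost, copy));
        CHECK(cudaEventRecord(copied[b], copy));
    }
    CHECK(cudaDeviceSynchronize());

    bool ok = true;
    for (int it = 0; it < iters && ok; ++it)
        for (int i = 0; i < n; ++i)
            if (h_out[static_cast<size_t>(it) * n + i] != static_cast<float>(it)) {
                ok = false;
                break;
            }

    CHECK(cudaStreamDestroy(compute));
    CHECK(cudaStreamDestroy(copy));
    for (int b = 0; b < 2; ++b) {
        CHECK(cudaFree(d_buf[b]));
        CHECK(cudaEventDestroy(produced[b]));
        CHECK(cudaEventDestroy(copied[b]));
    }
    CHECK(cudaFreeHost(h_out));
    return ok;
}
```

On a working device, `run_overlapped(100, 1 << 20)` should return `true`. Placing the copies on a separate stream is what allows them to overlap the next kernel. Issuing them on `compute` instead serializes everything, which corresponds to the slower configuration in the post. One way to measure per-iteration times like the ones quoted above is to bracket the loop with `cudaEventRecord` and `cudaEventElapsedTime`.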

Is this a bug? Or am I interpreting the profiler incorrectly?

N.
