Memcpy timing

sedona · May 4, 2014, 10:32pm

I thought I had the CUDA timing intricacies sorted, but I think I’m missing something. I was doing a pinned cudaMemcpy (vanilla, not async) and timing it with Windows' QueryPerformanceCounter. The measured times were far too long and varied widely from run to run. So I went back to the SDK sample and noticed that, when copying pinned memory, it uses cudaMemcpyAsync on the default stream and times that with cudaEvents. It only uses QueryPerformanceCounter for non-pinned transfers.
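For reference, this is roughly the pattern I mean, as I understand it. It is a minimal sketch and not the sample's actual code: the helper name `timePinnedCopyMs`, the `CHECK` macro, and the buffer handling are my own placeholders.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not from the SDK sample): abort on any CUDA error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (line %d)\n",             \
                    cudaGetErrorString(err_), __LINE__);              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical helper: time one host-to-device copy of `bytes` bytes
// from pinned host memory using CUDA events. Returns milliseconds.
float timePinnedCopyMs(size_t bytes)
{
    void *hostBuf = nullptr;
    void *devBuf = nullptr;
    CHECK(cudaMallocHost(&hostBuf, bytes));  // pinned (page-locked) host memory
    CHECK(cudaMalloc(&devBuf, bytes));
    memset(hostBuf, 0, bytes);               // touch the buffer before timing

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Both events and the copy are issued on the default stream (stream 0),
    // so the events bracket the copy in stream order.
    CHECK(cudaEventRecord(start, 0));
    CHECK(cudaMemcpyAsync(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice, 0));
    CHECK(cudaEventRecord(stop, 0));

    // Block the host until the stop event (and so the copy) has completed.
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(devBuf));
    CHECK(cudaFreeHost(hostBuf));
    return ms;
}
```

Calling something like `timePinnedCopyMs(32 << 20)` a few times and averaging is how I would compare this against the QueryPerformanceCounter numbers.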

So two questions from this:
1: My understanding is that a cudaMemcpyAsync on the default stream is functionally equivalent to a (synchronous) cudaMemcpy, so why does the sample use it at all?
2: Is this the memcpy timing rule: time synchronous transfers with the Windows timer, and use CUDA events otherwise?
