Question: time counting with/without memcpy

simonz1 · August 29, 2008, 6:30pm

Hi all,

I’m not sure whether anyone has asked this before; if so, sorry for the repost.

The problem is this: when timing matrixMul from the CUDA SDK examples, I moved the statement that copies device memory back to the host so that it comes after the CUT_SAFE_CALL(cutStopTimer(timer)) line. I then got two very different times (I use 512 by 512 for all matrices to make them bigger):

$ ./debug/matrixMul
Processing time: 3.235000 (ms)
$ ./release/matrixMul
Processing time: 0.031000 (ms)

Why does it run so fast in release mode?

But if I leave the code unchanged apart from the matrix size, the two builds give nearly the same time:
$ ./debug/matrixMul
Processing time: 5.632000 (ms)
$ ./release/matrixMul
Processing time: 5.035000 (ms)

I’m using nvcc 1.1 and SDK 1.1 btw.

tmurray · August 29, 2008, 6:35pm

first of all, why are you still using 1.1? 2.0 is pretty cool, we promise :(

the answer, if I’m understanding what you did correctly, is that you don’t have a cudaThreadSynchronize() before you call cutStopTimer. kernel launches are asynchronous, so you’re measuring just the launch, not the kernel’s execution time itself. cudaMemcpy forces a sync before it copies back and blocks until the memcpy is completed, so that should explain the differences.
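To make the point above concrete, here is a minimal sketch, not the SDK's matrixMul code. It assumes a hypothetical stand-in kernel `busyKernel` and a hypothetical helper `timeLaunchMs`, and it uses `std::chrono` in place of the cutil timer. It also uses `cudaDeviceSynchronize()`, which later CUDA releases provide as the replacement for the deprecated `cudaThreadSynchronize()`.

```cuda
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel (not the SDK's matrixMul): each thread
// runs a long dependent loop so the kernel takes a measurable time.
__global__ void busyKernel(float *out, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = static_cast<float>(i);
    for (int k = 0; k < iters; ++k)
        v = v * 1.000001f + 0.5f;
    out[i] = v;
}

// Hypothetical helper: returns host wall-clock milliseconds measured
// around a single kernel launch.
// If syncBeforeStop is false, the timer stops as soon as the
// asynchronous launch returns, so mostly launch overhead is measured.
double timeLaunchMs(bool syncBeforeStop)
{
    const int threads = 256, blocks = 1024, iters = 20000;
    float *d_out = nullptr;
    cudaMalloc(&d_out, threads * blocks * sizeof(float));

    // Warm-up launch so one-time setup cost is not part of the timing.
    busyKernel<<<blocks, threads>>>(d_out, 1);
    cudaDeviceSynchronize();

    auto start = std::chrono::steady_clock::now();
    busyKernel<<<blocks, threads>>>(d_out, iters);
    if (syncBeforeStop)
        cudaDeviceSynchronize();   // block until the kernel has finished
    auto stop = std::chrono::steady_clock::now();

    cudaDeviceSynchronize();       // drain the GPU before freeing memory
    cudaFree(d_out);
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `timeLaunchMs(false)` and `timeLaunchMs(true)` from `main` and printing both values should show the same kind of gap as in the question: the unsynchronized figure reflects little more than the launch, while the synchronized one includes the kernel's execution. The exact numbers depend on the GPU and driver.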

simonz1 · August 30, 2008, 2:41pm

Oh, OK, that makes sense. Thanks! Yep, I’ll try 2.0.
