Compare Execution Times CPU vs GPU the proper way?

neoideo · August 25, 2009, 7:14am

I'm measuring the time of some functions written in C++ with the clock() function, so I get the CPU clocks dedicated to that portion of code, like this (demonstrative code only, don't mind the syntax):

clock_t first = clock();

FUNCTION();

clock_t last = clock();

/* needs <stdio.h> and <time.h>; CPU seconds are a double, so print with %f */
printf("time = %f\n", (double)(last - first) / CLOCKS_PER_SEC);

But I'm wondering how I'm going to measure time when the functions get implemented in CUDA, because if I use the same method I should get 0 clocks, since there is no CPU work, right?

I could use time_t (date) differences converted to seconds instead of clocks, but something tells me that I should ask the experienced people just in case.

What is the best and recommended way to do it?

Thanks in advance

Cristobal

apaehler · August 25, 2009, 9:10am

Do the following (Python syntax/Linux, since this is how I do it):

t0 = time()
cudaThreadSynchronize()
# GPU work (kernels, I/O) goes here
cudaThreadSynchronize()
t1 = time()
print("elapsed time", t1 - t0)

You can also use CUDA events (cudaEvent) to record the GPU time. This usually gives more or less the same results. The cudaThreadSynchronize call is essential, since kernel launches are asynchronous. If you write data to the GPU prior to the kernel launch or read results back after it, that will do an implicit synchronization.

The time function in the time module in Python under Linux has a resolution of about one microsecond (essentially the resolution of the underlying Linux timers). If you do this in C, you might want to use gettimeofday rather than clock, because clock measures CPU time. gettimeofday gives you elapsed wall time, which is what matters for all practical estimates.
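To make the difference between CPU time and wall time concrete, here is a minimal Python sketch. It is an illustration only and makes no CUDA call: time.sleep stands in for the host waiting on asynchronous GPU work (an assumption), time.process_time plays the role of clock in C on Linux, and time.perf_counter plays the role of a wall clock such as gettimeofday. The names fake_gpu_wait and measure are hypothetical.

```python
import time


def fake_gpu_wait(seconds):
    """Hypothetical stand-in for the host waiting on GPU work.

    The host thread is idle while it sleeps. A real synchronizing call
    may sleep or spin depending on the driver settings, so the CPU time
    it reports may differ from what this sketch shows.
    """
    time.sleep(seconds)


def measure(work, *args):
    """Return (cpu_seconds, wall_seconds) spent in work(*args)."""
    cpu0 = time.process_time()   # CPU time of this process
    wall0 = time.perf_counter()  # elapsed wall time
    work(*args)
    wall1 = time.perf_counter()
    cpu1 = time.process_time()
    return cpu1 - cpu0, wall1 - wall0


cpu_s, wall_s = measure(fake_gpu_wait, 0.2)
print("CPU time : %.6f s" % cpu_s)
print("wall time: %.6f s" % wall_s)
```

The CPU figure stays far below the wall figure because the process does almost no work while it waits, which is why a CPU-time clock is a poor fit for timing work done elsewhere.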

Also, starting up the GPU takes some time, especially with code using the CUDA runtime, CUBLAS, etc., so running a kernel once prior to timing (called “warmup” in some SDK examples) is recommended. The actual timing code will then be unaffected by this. On my cards startup usually takes about 0.3 to 0.6 sec.
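The effect of a warm-up run can be sketched without a GPU. In the snippet below, fake_kernel is a hypothetical stand-in whose first call pays a simulated one-time startup cost; the 0.3 s and 0.01 s sleeps are assumptions chosen for illustration, not measurements.

```python
import time

_initialized = False


def fake_kernel():
    """Hypothetical stand-in for a GPU call with a one-time startup cost."""
    global _initialized
    if not _initialized:
        time.sleep(0.3)   # simulated one-time startup
        _initialized = True
    time.sleep(0.01)      # simulated steady-state work


def timed(work):
    """Return the wall time in seconds taken by one call to work()."""
    t0 = time.perf_counter()
    work()
    return time.perf_counter() - t0


cold = timed(fake_kernel)   # first call pays the startup cost (acts as warm-up)
warm = timed(fake_kernel)   # later calls measure only the steady-state work
print("first call : %.3f s" % cold)
print("second call: %.3f s" % warm)
```

Timing only the second call gives the steady-state figure; timing the first call as well shows how much of the total is startup.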

neoideo · August 25, 2009, 1:04pm

Thanks, I'll make that change then :)

laxsu19 · August 25, 2009, 2:17pm

I’m not exactly sure that it’s ‘fair game’ to warm up the GPU before starting the timing. If you are doing this timing to report speedups achieved by the GPU compared to the CPU, then you should “record” everything, at least from the point of divergence, including cudaMemcpy calls and ‘warmup’ kernel runs.

Now, if you just want to know the time, and aren’t comparing it to CPU code, then do whatever you’d like. I just thought I’d throw my two cents in there for fairness.
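To show how much the choice of what to “record” can change a reported speedup, here is a small arithmetic sketch. The function name speedup and all three timings are made up for illustration; they are not measurements from any card.

```python
def speedup(cpu_s, gpu_kernel_s, gpu_overhead_s=0.0):
    """Speedup of the GPU path over the CPU path.

    gpu_overhead_s is everything on the GPU path besides the kernel itself
    (for example transfers and startup); pass 0.0 for a kernel-only figure.
    """
    return cpu_s / (gpu_kernel_s + gpu_overhead_s)


# Made-up numbers for illustration only, not measurements.
cpu_s = 2.0           # CPU implementation
gpu_kernel_s = 0.1    # kernel alone
gpu_overhead_s = 0.4  # transfers plus startup

kernel_only = speedup(cpu_s, gpu_kernel_s)
end_to_end = speedup(cpu_s, gpu_kernel_s, gpu_overhead_s)
print("kernel-only speedup: %.1fx" % kernel_only)
print("end-to-end speedup : %.1fx" % end_to_end)
```

Reporting both figures, and saying which one includes transfers and startup, avoids the fairness problem.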

neoideo · September 2, 2009, 7:02pm

What I'm doing now is using gettimeofday for the CPU functions and CUDA timers for the GPU ones, with the cudaThreadSynchronize call.

I'm getting accurate results.

I'll have to check those warm-up times; at the moment they seem to be very low, which is good.
Thanks

Nemandza · September 8, 2009, 2:26pm

Actually, it would be more accurate to synchronize before starting the timer:

cudaThreadSynchronize()
t0 = time()
# GPU work (kernels, I/O) goes here
cudaThreadSynchronize()
t1 = time()
print("elapsed time", t1 - t0)

Also try running the kernel several times in a loop and averaging the elapsed intervals.
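A minimal sketch of that loop, assuming the caller supplies two callables: launch, which starts the work, and synchronize, which blocks until it has finished. Both names and the function average_elapsed are hypothetical, and the stand-ins at the bottom only exist so the snippet runs without a GPU.

```python
import time


def average_elapsed(launch, synchronize, repeats=10):
    """Average wall time in seconds of launch() over `repeats` runs.

    launch and synchronize are hypothetical callables supplied by the caller:
    launch starts the work, synchronize blocks until it has finished.
    """
    synchronize()     # drain pending work before starting the clock
    t0 = time.perf_counter()
    for _ in range(repeats):
        launch()
    synchronize()     # wait for the last run before stopping the clock
    t1 = time.perf_counter()
    return (t1 - t0) / repeats


# Stand-ins so the sketch runs without a GPU: launch sleeps, synchronize is a no-op.
avg = average_elapsed(lambda: time.sleep(0.01), lambda: None, repeats=5)
print("average per run: %.4f s" % avg)
```

Timing the whole loop and dividing by the number of runs keeps the timer overhead out of each individual measurement.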
