I am a bit new to CUDA and I want to measure the execution time of my CUDA program. Basically, I have to compare the performance of my program for calculating the sum of two 1000 x 1000 matrices, first on the GPU, and then I will use device emulation mode to compare the performance on the CPU. (Do you think device emulation mode can be used as a benchmark?)
So for this I need to know the execution time in each case. How do we find the execution time? What function and library are used, and where exactly do we put the calls?
For a CPU-side timer to work correctly, you will need to place cudaThreadSynchronize() (cudaDeviceSynchronize() in current CUDA releases) before stopping the timer, because kernel launches return before the kernel finishes. Alternatively, use CUDA events, as mentioned.
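A minimal sketch of the event-based approach (the matAdd kernel and the launch configuration are made up for illustration; error checking omitted):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: element-wise sum of two 1000 x 1000 matrices,
// stored as flat arrays of n = 1000 * 1000 floats.
__global__ void matAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1000 * 1000;
    float *dA, *dB, *dC;
    cudaMalloc(&dA, n * sizeof(float));
    cudaMalloc(&dB, n * sizeof(float));
    cudaMalloc(&dC, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);                // enqueue start marker
    matAdd<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);
    cudaEventRecord(stop, 0);                 // enqueue stop marker
    cudaEventSynchronize(stop);               // block until the kernel is done

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   // elapsed milliseconds between events
    printf("kernel time: %f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

Because the events are recorded on the same stream as the kernel, cudaEventSynchronize(stop) replaces the explicit cudaThreadSynchronize() call.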
Also, no, device emulation cannot be used as a benchmark. It’s usually WAY slower than if you’d reimplemented it on the CPU.
Luckily, it doesn't have to be very hard. It's sometimes enough to copy the contents of your kernel into host code, wrap a for loop around it (perhaps two fors, one for blocks, the other for threads within a block) and then put a
#pragma omp parallel for
above the outer for if your compiler supports OpenMP. Remember to enable CPU vector intrinsics (SSE2, for example); the compiler should be smart enough to autovectorize at least parts of your host code. This can pass as a CPU benchmark, although there are cases where it's not that straightforward, for example if the kernel uses shared memory.
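For the matrix-sum case, the host-side port is about as simple as it gets (a sketch; names and sizes are illustrative):

```cuda
// Host-side port of a matrix-sum kernel: the block and thread indices
// simply become loop variables.
#include <cstdio>
#include <vector>

int main()
{
    const int rows = 1000, cols = 1000;
    std::vector<float> a(rows * cols, 1.0f), b(rows * cols, 2.0f), c(rows * cols);

    // Outer loop plays the role of blocks, inner loop of threads within a block.
    #pragma omp parallel for
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            c[i * cols + j] = a[i * cols + j] + b[i * cols + j];

    printf("c[0] = %f\n", c[0]);   // 3.000000
    return 0;
}
```

Without the OpenMP flag the pragma is simply ignored and the loop runs single-threaded, so the same source serves both cases.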
Read about MCUDA (http://www.gigascale.org/pubs/1278.html) to find out how to make efficient CPU code from CUDA kernels. They haven't yet released a compiler that does it for you, but they describe the methods in their paper.
Usually, you cannot use the emulator to compare CPU-only performance against CPU-GPU performance.
If you perform lots of summations of two 1000 x 1000 matrices A and B, and the matrix data are generated on the CPU, then you will spend most of your execution time moving data from host to device and back.
This is a good example showing that the ratio of data-transfer time to computation time is a very important factor, one that can completely negate the benefit of using the GPU.
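A back-of-envelope estimate makes the point (assuming single-precision elements and roughly 5 GB/s of effective PCIe bandwidth, both of which are assumptions, not measurements):

transfer: 3 matrices x 1000 x 1000 x 4 B = 12 MB, and 12 MB / 5 GB/s is about 2.4 ms
compute: 10^6 additions, which the GPU finishes in a small fraction of a millisecond

So for a single matrix sum, the transfers dominate the end-to-end time.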
Try the scalarProduct project from the SDK on the GPU, but move the start timer for the GPU calculation in front of the copy from host to device, and move the stop timer to after the data are copied back from device to host. You will see that the plain CPU execution time beats the combined CPU-GPU time. This is how benchmarking should be done for this example in the SDK, HEHE…
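Using CUDA events instead of the SDK's cutil timers, that timer placement looks like this (a sketch with a made-up matAdd kernel standing in for the SDK computation; error checking omitted):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Made-up kernel standing in for the SDK computation.
__global__ void matAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1000 * 1000;
    const size_t bytes = n * sizeof(float);
    float *hA = new float[n](), *hB = new float[n](), *hC = new float[n]();
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start); cudaEventCreate(&stop);

    cudaEventRecord(start, 0);   // start BEFORE the host-to-device copies
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
    matAdd<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(stop, 0);    // stop AFTER the device-to-host copy
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("end-to-end time (copies + kernel): %f ms\n", ms);

    cudaEventDestroy(start); cudaEventDestroy(stop);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    delete[] hA; delete[] hB; delete[] hC;
    return 0;
}
```

Comparing this end-to-end figure with the kernel-only figure shows how much of the total is pure transfer cost.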