CUDA Programming Timing

I set out to find the prime numbers between 1 and 200000. It took 7.77 seconds in C and 5.04 seconds in CUDA. I used clock() in both the C and the CUDA code to measure execution time. Is that a proper way to measure runtime?

No, it is not. A kernel call is non-blocking. This means that when you launch a kernel, control returns to the host immediately, not after the work on the device is finished, so a host timer such as clock() placed around the launch can stop before the GPU is done. There are a few blocking calls, such as some data transfers with cudaMemcpy. Try this:

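    float gputime;              // elapsed time in milliseconds
    cudaEvent_t start, stop;

    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);  // mark the start on the default stream

    // the gpu work

    cudaEventRecord(stop, 0);   // mark the end on the default stream
    cudaEventSynchronize(stop); // wait until the stop event has completed
    cudaEventElapsedTime(&gputime, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);

    printf(" \n");
    printf("Time = %g \n", gputime/1000.0f);  // convert ms to seconds
    printf(" \n");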

Well, it is if your objective is to time the entire program. If, however, you want to time a single kernel, follow pasoleatis’ advice.