CUDA Programming Timing

My objective was to find the prime numbers between 1 and 200000. It took 7.77 seconds in C and 5.04 seconds with CUDA. I used clock() in both the C and the CUDA code to measure execution time. Is that a proper way to measure runtime?

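No, it is not. A kernel launch is non-blocking: when you launch a kernel, control returns to the host immediately, not after the work on the device has finished. A clock() reading taken right after the launch therefore measures little more than the launch itself, unless something in between blocks until the kernel completes. A few calls do block, such as some data transfers with cudaMemcpy. Try this: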

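float gputime;                                  // elapsed GPU time, in milliseconds
cudaEvent_t start, stop;

cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);                      // mark the start in stream 0, before the GPU work

// the gpu work

cudaEventRecord(stop, 0);                       // mark the end in stream 0, after the GPU work
cudaEventSynchronize(stop);                     // block the host until the stop event has completed

cudaEventElapsedTime(&gputime, start, stop);    // result is in milliseconds

cudaEventDestroy(start);
cudaEventDestroy(stop);

printf("\nTime = %g s\n\n", gputime / 1000.0f); // convert milliseconds to seconds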

This forum has lots of code posted; try the search function as well, it will save you time.

Well, it is, if your objective is to time the entire program. If, however, you want to time a single kernel, follow pasoleatis' advice.