Compare GPU and CPU function time


I have a CPU function and a GPU function, and I want to compare their running times. I use the NVIDIA Visual Profiler, but it only shows the GPU time.
Is there any way to compare the running times of these two functions?

I have been using this, which is an amalgam of code snippets from various places (if anyone knows a better way, or if this is wrong, please share!):

[EDIT1] I think you need to #include <time.h> to use CLOCKS_PER_SEC

[EDIT2] As far as I can tell, clock() reports in ticks, and dividing by CLOCKS_PER_SEC gives seconds. That’s why I multiply by 1000: to get the same units as the CUDA timing, which returns milliseconds

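float elapsed_cpu, elapsed_gpu;

	// begin timing CPU
	clock_t t1, t2;
	t1 = clock();
	... 	// CPU function here
	t2 = clock();
	elapsed_cpu = (float)(t2 - t1) / CLOCKS_PER_SEC * 1000;	// CPU elapsed time in ms

	// begin timing GPU
	cudaEvent_t start, stop;
	cudaEventCreate(&start);
	cudaEventCreate(&stop);
	cudaEventRecord(start, 0);
	... 	// GPU launch here
	cudaEventRecord(stop, 0);
	cudaEventSynchronize(stop);	// wait until the stop event has actually been recorded
	cudaEventElapsedTime(&elapsed_gpu, start, stop);	// GPU elapsed time in ms
	cudaEventDestroy(start);
	cudaEventDestroy(stop);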
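The tick-to-millisecond conversion used for the CPU side can be pulled into a small helper. This is only a sketch: clock_diff_ms is a hypothetical name, not part of any library. It does the arithmetic in double and subtracts the two readings before converting, which avoids losing precision when the raw tick counts are large. Keep in mind that on POSIX systems clock() measures CPU time used by the process rather than wall-clock time, so the comparison with the CUDA event timing is approximate.

```c
#include <time.h>

/* Hypothetical helper: elapsed milliseconds between two clock() readings. */
double clock_diff_ms(clock_t begin, clock_t end)
{
    /* Subtract first, then scale: ticks -> seconds -> milliseconds. */
    return (double)(end - begin) * 1000.0 / (double)CLOCKS_PER_SEC;
}
```

It would be called as clock_diff_ms(t1, t2) in place of the inline expression.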



Thank You!

But when I print the elapsed times, the decimals of elapsed_cpu are always zero. Example:

printf("\nCPU:%f, GPU:%f",elapsed_cpu, elapsed_gpu);

CPU:17.000000, GPU:5.772768

CPU:17.000000, GPU:5.781632

CPU:17.000000, GPU:5.798272

On the CPU, some timing functions only return values like 0 ms, 15 ms, 31 ms, … so there are no decimals.
In Visual Basic the function is:
Public Declare Function timeGetTime Lib "winmm.dll" () As Long
To get 1 ms resolution you also need:
Public Declare Function timeBeginPeriod Lib "winmm.dll" (ByVal uPeriod As Long) As Long
timeBeginPeriod 1

Some functions do better. On my computer these can resolve 1/266001 of a second:
Public Declare Function QueryPerformanceCounter Lib "kernel32" Alias "QueryPerformanceCounter" (lpPerformanceCount As Currency) As Long
Public Declare Function QueryPerformanceFrequency Lib "kernel32" Alias "QueryPerformanceFrequency" (lpFrequency As Currency) As Long

QueryPerformanceFrequency tt1
QueryPerformanceCounter tt2
QueryPerformanceCounter tt3
MsgBox (tt3 - tt2) / tt1 * 1000 & "ms"

On unixy systems gettimeofday() with its microsecond resolution is often the best timer (note that other timers might have worse resolution even if they use microseconds as units). So I regularly have code like the following to time stuff:

#include <stdio.h>
#include <sys/time.h>

	struct timeval start_time, end_time;

	gettimeofday(&start_time, NULL);

	// do some stuff

	gettimeofday(&end_time, NULL);

	// seconds plus the (possibly negative) microsecond difference
	double runtime = (end_time.tv_sec - start_time.tv_sec)
	                 + (end_time.tv_usec - start_time.tv_usec) * 1e-6;

	printf("stuff took %f seconds to execute.\n", runtime);
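The subtraction above can be wrapped in a function so it is easy to reuse and check. This is a minimal sketch, and timeval_diff_s is a hypothetical name rather than a standard call. It relies on tv_usec being a signed type, so the microsecond difference may be negative when the end reading has rolled over into the next second; adding it to the seconds difference still gives the right result.

```c
#include <sys/time.h>

/* Hypothetical helper: elapsed seconds between two gettimeofday() readings. */
double timeval_diff_s(const struct timeval *begin, const struct timeval *end)
{
    /* Whole seconds, plus the signed microsecond difference scaled to seconds. */
    return (double)(end->tv_sec - begin->tv_sec)
         + (double)(end->tv_usec - begin->tv_usec) * 1e-6;
}
```

With the variables from the snippet above it would be called as timeval_diff_s(&start_time, &end_time).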

For CPU code I use the Qt library.

QTime t;
t.start();

// ... CPU code to time ...

elapsed_time_h = t.elapsed();	// milliseconds since t.start()
total_time_h += elapsed_time_h;

It works pretty well. You can also use some other library, like Boost.

It’s a better solution if you are targeting many platforms.

I believe this is because the standard C clock() function has coarse granularity (its resolution is implementation-defined). If you need to time sub-second intervals, you will need to use a custom timer implementation as suggested.
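One way to see how coarse clock() is on a particular machine is to spin until its value changes and look at the size of the jump. The following is a rough sketch under that assumption; clock_step_ms is a hypothetical name, and the result is only a single observed step, so it may differ between runs and systems.

```c
#include <time.h>

/* Hypothetical probe: busy-wait until clock() changes, return the step in ms. */
double clock_step_ms(void)
{
    clock_t t0 = clock();
    clock_t t1;

    /* The loop itself burns CPU time, so clock() eventually advances. */
    do {
        t1 = clock();
    } while (t1 == t0);

    return (double)(t1 - t0) * 1000.0 / (double)CLOCKS_PER_SEC;
}
```

If the returned step is around 15 ms, any function that runs for less than that cannot be timed reliably with a single pair of clock() calls.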

I should have mentioned this limitation of my method, but it was OK for me since I’m timing things on the order of hundreds of seconds at least, so the error is less than 1% for my purposes.



You might find the discussion here useful. It covers Qt and Boost, as suggested above, along with some custom classes.