clock() doesn't work properly

Hi,

I’m trying to measure how long my kernel takes, using the following code:

	// i is the index of the surrounding loop; <ctime> and <cstdio> are included

	//START CLOCK COUNTER
	clock_t timer_start = clock();

	//ENTER KERNEL
	my_kernel <<< dimGrid, dimBlock >>> (args);
	checkCUDAError("kernel");
	//EXIT KERNEL

	//WAIT FOR THE KERNEL TO FINISH, THEN STOP AND PRINT THE TIMER
	cudaThreadSynchronize();
	clock_t timer_end = clock();
	printf("start cycles : %ld\n", (long) timer_start);
	printf("end   cycles : %ld\n", (long) timer_end);
	long double timer_diff = static_cast<long double>(timer_end - timer_start) / CLOCKS_PER_SEC;
	printf("my Kernel, iteration %d: %.12Lf seconds\n", i, timer_diff);

but timer_start and timer_end almost always report the same value, so the last line usually prints 0.000…0

In a few stranger cases there is a difference of exactly 1000 ticks between start and stop, so the last line prints 0.010…0

I’m sure something is wrong, because the clock should advance by at least one tick while the kernel executes.

The usual output I get is this:

start cycles : 120000
end cycles : 120000
Kernel, iteration 0: 0.000000000000 seconds

start cycles : 130000
end cycles : 130000
Kernel, iteration 1: 0.000000000000 seconds

start cycles : 130000
end cycles : 130000
Kernel, iteration 2: 0.000000000000 seconds

start cycles : 130000
end cycles : 130000
Kernel, iteration 3: 0.000000000000 seconds

start cycles : 130000
end cycles : 130000
Kernel, iteration 4: 0.000000000000 seconds

Can anybody help me?

Thanks in advance

I don’t think you can use clock() like that: control is given back to the CPU immediately after the kernel launch (not when the kernel has finished), so the CPU sees essentially no elapsed clock ticks.

I normally use

unsigned int hTimer;
cutilCheckError( cutCreateTimer(&hTimer) );

cutilSafeCall( cudaThreadSynchronize() );
cutilCheckError( cutResetTimer(hTimer) );
cutilCheckError( cutStartTimer(hTimer) );

my_kernel <<< dimGrid, dimBlock >>> (args);

cutilSafeCall( cudaThreadSynchronize() );
cutilCheckError( cutStopTimer(hTimer) );

double time = cutGetTimerValue(hTimer);   // elapsed time in milliseconds

I know control is returned to the CPU right after launching the kernel, but the timer_end = clock() call comes after a cudaThreadSynchronize() call, so that shouldn’t be the problem… should it?

I normally set up the clock variables as integers.

So…

int startClock, stopClock;

startClock = clock();

my_kernel <<< dimGrid, dimBlock >>> (args);   // run the kernel

stopClock = clock();

int executionTime = stopClock - startClock;

Hopefully, this is similar to what you’re trying to do. Be sure to include <time.h>.

The clock() function has very coarse resolution (typically several milliseconds, apparently 10 ms in your case), and cannot accurately measure periods shorter than a few seconds.
This has nothing to do with CUDA. CUDA also has a device-side function named clock() which is cycle-accurate, but you obviously cannot call it from host code.
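
(For reference, the device-side clock() is meant to be called from inside a kernel to count GPU cycles. A minimal sketch, with made-up names for the kernel and the output buffer, might look like this:)

__global__ void timedKernel(clock_t *cycles /*, your kernel arguments */)
{
    clock_t start = clock();        // per-multiprocessor cycle counter

    // ... the device work you want to time ...

    clock_t stop = clock();
    if (threadIdx.x == 0)
        cycles[blockIdx.x] = stop - start;   // cycles spent by this block
}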

Rather, use gettimeofday on Unix, QueryPerformanceCounter on Windows, or the RDTSC x86 instruction, which are all much more accurate. Or, even better, use CUDA events.
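
For example, a minimal gettimeofday-based sketch on a Unix host (reusing the kernel launch from your post) could look like this:

#include <sys/time.h>
#include <cstdio>

struct timeval t0, t1;

cudaThreadSynchronize();                 // make sure earlier GPU work is finished
gettimeofday(&t0, NULL);

my_kernel <<< dimGrid, dimBlock >>> (args);

cudaThreadSynchronize();                 // wait for the kernel to finish
gettimeofday(&t1, NULL);

double elapsed_ms = (t1.tv_sec  - t0.tv_sec)  * 1000.0 +
                    (t1.tv_usec - t0.tv_usec) / 1000.0;
printf("Kernel time: %f ms\n", elapsed_ms);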

Well, I found another way of doing it, also from the host, but it needs a CUDA library, and I don’t want to pull any extra CUDA library into my code.

Do CUDA events need any extra library? How should I use them?

Thanks!

Any .cu file will get linked against some kind of CUDA library – at least to launch the kernels… no?

What is so wrong in linking against cuda.lib or cudart.lib? One of these would implement the event APIs, I guess…

The problem is that my project will be integrated into a huge library of resources belonging to a department, and they want to use only their own libraries plus whatever is strictly necessary (in my case cuda.h really is needed, because otherwise the kernel won’t launch, as you said before, but cudart.lib is not so essential).

Is there any other way? If I do end up having to use one of these libraries, which would be the most accurate method?

Thanks in advance!

I believe cudart is the runtime library, but events are also implemented in the driver API, so I suggest you use events; they’re very accurate.
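
Something along these lines with runtime-API events (the driver API has the equivalent cuEventCreate / cuEventRecord / cuEventElapsedTime calls); the kernel launch is just the one from your first post:

cudaEvent_t start, stop;
float elapsed_ms;

cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);                  // record on the default stream

my_kernel <<< dimGrid, dimBlock >>> (args);

cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);                 // block until the stop event has completed

cudaEventElapsedTime(&elapsed_ms, start, stop);   // result in milliseconds
printf("Kernel time: %f ms\n", elapsed_ms);

cudaEventDestroy(start);
cudaEventDestroy(stop);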

N.

Thanks a lot!

You’re right, events are implemented in both APIs, and there’s no need to include any other library.

As the reference manual says, events have a resolution of around 0.5 microseconds, so this works fine!

Thanks!

-KaiK-

Hi,

Use:

//create and start timer
unsigned int timerGPU = 0;
cudaThreadSynchronize();
cutCreateTimer(&timerGPU);
cutStartTimer(timerGPU);

// ... your kernel execution ...

//stop timer and show result:
cudaThreadSynchronize();
cutStopTimer(timerGPU);
printf("Processing time: %f (ms)\n", cutGetTimerValue(timerGPU));

That works for me.
Sometimes CUDA has to be initialized before this code works, so if you have problems, write a minimal dummy kernel and launch it at the beginning of your program.