clock() doesn't work properly

Hi,

I’m trying to measure how long my kernel takes, using the following code:

	// i is the index of the surrounding loop; <ctime> and <cstdio> are included

	//START CLOCK COUNTER
	clock_t timer_start = clock();

	//ENTER KERNEL
	my_kernel <<< dimGrid, dimBlock >>> (args);
	checkCUDAError("kernel");
	//EXIT KERNEL

	//WAIT FOR THE KERNEL TO FINISH, THEN STOP AND PRINT THE TIMER
	cudaThreadSynchronize();
	clock_t timer_end = clock();
	printf("start cycles : %ld\n", (long) timer_start);
	printf("end   cycles : %ld\n", (long) timer_end);
	long double timer_diff = static_cast<long double>(timer_end - timer_start) / CLOCKS_PER_SEC;
	printf("my Kernel, iteration %d: %.12Lf seconds\n", i, timer_diff);

but timer_start and timer_end almost always report the same value, so the last line usually prints 0.000…0

In a few stranger cases there is a difference of exactly 1000 ticks between start and stop, so the last line prints 0.010…0

I’m sure something is wrong, because the clock should advance by at least one tick while the kernel executes.

The usual output I get is this:

start cycles : 120000
end cycles : 120000
Kernel, iteration 0: 0.000000000000 seconds

start cycles : 130000
end cycles : 130000
Kernel, iteration 1: 0.000000000000 seconds

start cycles : 130000
end cycles : 130000
Kernel, iteration 2: 0.000000000000 seconds

start cycles : 130000
end cycles : 130000
Kernel, iteration 3: 0.000000000000 seconds

start cycles : 130000
end cycles : 130000
Kernel, iteration 4: 0.000000000000 seconds

Can anybody help me?

Thanks in advance

I don’t think you can use clock() like that: control is given back to the CPU immediately after the kernel launch (not when the kernel has finished), so the CPU sees essentially no elapsed clock ticks.

I normally use

unsigned int hTimer;
cutilCheckError( cutCreateTimer(&hTimer) );

cutilSafeCall( cudaThreadSynchronize() );
cutilCheckError( cutResetTimer(hTimer) );
cutilCheckError( cutStartTimer(hTimer) );

my_kernel <<< dimGrid, dimBlock >>> (args);

cutilSafeCall( cudaThreadSynchronize() );
cutilCheckError( cutStopTimer(hTimer) );

double time = cutGetTimerValue(hTimer);   // elapsed time in milliseconds

I know control is returned to the CPU right after launching the kernel, but the timer_end = clock() call comes after a cudaThreadSynchronize() call, so that shouldn’t be the problem… should it?

I normally set up the clock variables as integers.

So…

int startClock, stopClock;

startClock = clock();

my_kernel <<< dimGrid, dimBlock >>> (args);   // run the kernel

stopClock = clock();

int executionTime = stopClock - startClock;

Hopefully, this is similar to what you’re trying to do. Be sure to include <time.h>.

The clock() function has very coarse resolution (typically several milliseconds, apparently 10 ms in your case), and cannot accurately measure periods shorter than a few seconds.
This has nothing to do with CUDA. CUDA also has a device-side function named clock() which is cycle-accurate, but you obviously cannot call it from host code.
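
(For reference, the device-side clock() is meant to be called from inside a kernel to count GPU cycles. A minimal sketch, with made-up names for the kernel and the output buffer, might look like this:)

__global__ void timedKernel(clock_t *cycles /*, your kernel arguments */)
{
    clock_t start = clock();        // per-multiprocessor cycle counter

    // ... the device work you want to time ...

    clock_t stop = clock();
    if (threadIdx.x == 0)
        cycles[blockIdx.x] = stop - start;   // cycles spent by this block
}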

Rather, use gettimeofday on Unix, QueryPerformanceCounter on Windows, or the RDTSC x86 instruction, which are all much more accurate. Or, even better, use CUDA events.
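
For example, a minimal gettimeofday-based sketch on a Unix host (reusing the kernel launch from your post) could look like this:

#include <sys/time.h>
#include <cstdio>

struct timeval t0, t1;

cudaThreadSynchronize();                 // make sure earlier GPU work is finished
gettimeofday(&t0, NULL);

my_kernel <<< dimGrid, dimBlock >>> (args);

cudaThreadSynchronize();                 // wait for the kernel to finish
gettimeofday(&t1, NULL);

double elapsed_ms = (t1.tv_sec  - t0.tv_sec)  * 1000.0 +
                    (t1.tv_usec - t0.tv_usec) / 1000.0;
printf("Kernel time: %f ms\n", elapsed_ms);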

Well, I found another way of doing it, also from the host, but it needs a CUDA library, and I don’t want to pull any extra CUDA library into my code.

Do CUDA events need any extra library? How should I use them?

Thanks!

Any .cu file will get linked against some kind of CUDA library – at least to launch the kernels… no?

What is so wrong in linking against cuda.lib or cudart.lib? One of these would implement the event APIs, I guess…

The problem is that my project will be integrated into a huge library of resources belonging to a department, and they want to use only their own libraries plus whatever is strictly necessary (in my case cuda.h really is needed, because otherwise the kernel won’t launch, as you said before, but cudart.lib is not so essential).

Is there any other way? If I do end up having to use one of these libraries, which would be the most accurate method?

Thanks in advance!

I believe cudart is the runtime library, but events are also implemented in the driver API, so I suggest you use events; they’re very accurate.
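
Something along these lines with runtime-API events (the driver API has the equivalent cuEventCreate / cuEventRecord / cuEventElapsedTime calls); the kernel launch is just the one from your first post:

cudaEvent_t start, stop;
float elapsed_ms;

cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);                  // record on the default stream

my_kernel <<< dimGrid, dimBlock >>> (args);

cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);                 // block until the stop event has completed

cudaEventElapsedTime(&elapsed_ms, start, stop);   // result in milliseconds
printf("Kernel time: %f ms\n", elapsed_ms);

cudaEventDestroy(start);
cudaEventDestroy(stop);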

N.

Thanks a lot!

You’re right, events are implemented in both APIs, and there’s no need to include any other library.

As the reference manual says, events have a resolution of around 0.5 microseconds, so this works fine!

Thanks!

-KaiK-

Hi,

Use:

//create and start timer
unsigned int timerGPU = 0;
cudaThreadSynchronize();
cutCreateTimer(&timerGPU);
cutStartTimer(timerGPU);

// ... your kernel execution ...

//stop timer and show result:
cudaThreadSynchronize();
cutStopTimer(timerGPU);
printf("Processing time: %f (ms)\n", cutGetTimerValue(timerGPU));

That works for me.
Sometimes CUDA has to be initialized before this code works, so if you have problems, write a minimal dummy kernel and launch it at the beginning of your program.