When timing a function, clock() varies each time. clock() (at end) - clock() (at start) = 9565 on one run, a slightly different value on the next.

I want to time a fixed function, but each time I execute the kernel I get a slightly different value. I would think (though I am probably wrong) that the time would be the same on every call. I can understand why a CPU varies, because of caching and the CPU time slices given to other processes. However, I would have guessed that each time a GPU kernel is called the state of the GPU is “reset”, and there is no time slicing with other programs, so the result should always be the same. Does anyone know why this might be?

Note: I noticed that this only happens when memory is read…

My sample code…

[font=“Courier New”]#define NUM_BLOCKS 1
#define NUM_THREADS 1

__global__ static void timedReduction(const float *input, float *output, clock_t *timer)
{
    timer[0] = clock();      // start of timed region
    output[0] = input[0];    // one global read + one global write
    timer[1] = clock();      // end of timed region
}[/font]

Output…
(Trial 1 results) Time = 2600, 2232, 2364, 2456, 2392, 2616
(Trial 2 results) Time = 2548, 2376, 2348, 2384, 2428, 2436
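
For reference, this is roughly how the numbers above are collected on the host. A minimal sketch with no error checking, assuming the timedReduction kernel above is in the same .cu file and that single-element buffers are enough:

[font=“Courier New”]#include <cstdio>

// Minimal host-side sketch: launch the kernel once and read back the two clock values.
int main()
{
    float *dInput, *dOutput;
    clock_t *dTimer;
    clock_t hTimer[2];
    float hInput = 1.0f;

    cudaMalloc((void **)&dInput, sizeof(float));
    cudaMalloc((void **)&dOutput, sizeof(float));
    cudaMalloc((void **)&dTimer, 2 * sizeof(clock_t));
    cudaMemcpy(dInput, &hInput, sizeof(float), cudaMemcpyHostToDevice);

    timedReduction<<<NUM_BLOCKS, NUM_THREADS>>>(dInput, dOutput, dTimer);
    cudaMemcpy(hTimer, dTimer, 2 * sizeof(clock_t), cudaMemcpyDeviceToHost);

    printf("Time = %ld\n", (long)(hTimer[1] - hTimer[0]));

    cudaFree(dInput);
    cudaFree(dOutput);
    cudaFree(dTimer);
    return 0;
}[/font]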

Sample code 2 …

[font=“Courier New”]__global__ static void timedReduction(const float *input, float *output, clock_t *timer)
{
    timer[0] = clock();    // start of timed region
    output[0] = 1;         // global write only, no global read
    timer[1] = clock();    // end of timed region
}[/font]

(Trial 1 results) Time = 178, 178, 178, 178, 178, 178
(Trial 2 results) Time = 178, 178, 178, 178, 178, 178

Because it will depend on what DRAM banks are active, etc.

Hi tmurray, thanks for the reply.

When I run the kernel in a loop…
(Trial 1 results) Time = 2600, 2232, 2364, 2456, 2392, 2616
…isn’t this using the same DRAM banks?

I’m guessing the best solution would be to run the sample code several times and average the results (as would be done on a CPU).
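
Something like this sketch, assuming the device buffers and the NUM_BLOCKS / NUM_THREADS defines from the host code above, with timedReduction being the kernel from my first post (the run count is an arbitrary choice):

[font=“Courier New”]// Rough sketch: launch the kernel several times and average the elapsed clocks.
// dInput, dOutput and dTimer are the device buffers from the host code above.
long long averageCycles(const float *dInput, float *dOutput, clock_t *dTimer, int numRuns)
{
    long long total = 0;
    clock_t hTimer[2];

    for (int run = 0; run < numRuns; ++run)
    {
        timedReduction<<<NUM_BLOCKS, NUM_THREADS>>>(dInput, dOutput, dTimer);
        cudaMemcpy(hTimer, dTimer, 2 * sizeof(clock_t), cudaMemcpyDeviceToHost);
        total += (long long)(hTimer[1] - hTimer[0]);
    }

    return total / numRuns;    // average cycles per launch
}[/font]

That smooths out the run-to-run variation, though it is still an average rather than an exact, repeatable number.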

FYI, for anyone interested in consistent, exact timing: by consistent, exact timing I mean that each time the code runs it will always give you the same timing for a given chunk of code. I have not tested other video cards, but my guess is that each GPU generation (or maybe even each model of card) will have different timings.

From my playing around it seems like exact, consistent timing is possible when…

- The number of threads is 1 (I’m guessing this holds up to 32, i.e. one warp).
- The timer is not running while there are global memory reads.
- Global writes seem to be okay.
- Either all global memory is copied into shared memory or registers first and only then is the timer started, or the timer is paused while reading from global memory (see the sketch after this list).
- I did not test read-only (cached) memory, but I’m guessing the timer cannot be running during the first read.
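
For example, something along these lines (my own rough sketch, not tested on every card): the global read is done before the timer starts, so the timed region only touches shared memory, registers, and a global write.

[font=“Courier New”]__global__ static void timedStaged(const float *input, float *output, clock_t *timer)
{
    __shared__ float staged;

    staged = input[0];            // global read happens before the timed region
    __syncthreads();

    timer[0] = clock();           // timer starts only after the data is staged
    output[0] = staged * 2.0f;    // shared-memory read + global write inside the timed region
    timer[1] = clock();
}[/font]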

Global memory reads seem to take a slightly different number of cycles each time they happen; they appear a little bit random. There is no consistency even when I time the read in a loop inside the kernel, when I time the kernel from call to call, or when I completely restart the host application. Global memory reads are simply not consistent.
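
For what it’s worth, the in-kernel loop I mean is something like this sketch (the iteration count and input array length are arbitrary, and timer needs room for two values per iteration):

[font=“Courier New”]#define ITERATIONS 16

// Rough sketch: time a global read on every loop iteration inside one launch.
// Even within a single launch the per-iteration deltas come out slightly different.
__global__ static void timedReads(const float *input, float *output, clock_t *timer)
{
    float sum = 0.0f;

    for (int i = 0; i < ITERATIONS; ++i)
    {
        timer[2 * i]     = clock();
        sum += input[i];              // global read inside the timed region
        timer[2 * i + 1] = clock();
    }

    output[0] = sum;
}[/font]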

Note: if anyone is playing around with this, make sure that device emulation is not turned on, or else you will be getting CPU timings.