Function execution time


OK, I have built a kind of search algorithm here, and it works fine, BUT:
if I run it 180 times, the execution time is roughly 21 ms per run, but if I run it 200,000 times, the time gets worse and worse, so that I reach 40 ms after 7,000 runs and much, much more by the end. How can this be? Any ideas why the time keeps increasing?

What do you use for timing? Wall clock time? If so, do you call cudaThreadSynchronize() before reading the timer?
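The reason for the question: kernel launches are asynchronous, so a host wall-clock timer read right after a launch mostly measures launch overhead unless the host first waits for the GPU. A minimal sketch of timing one run, assuming a placeholder kernel `dummyKernel` and a helper `timeOneRunMs` (both hypothetical, standing in for the real search kernels), and using `cudaDeviceSynchronize()`, the current name for the since-deprecated `cudaThreadSynchronize()`:

```cuda
#include <cassert>
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real search kernels.
__global__ void dummyKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 0.5f + 1.0f;
}

// Times one "run" (every kernel of one iteration) with a host wall clock.
// Returns elapsed milliseconds, or -1.0 if a CUDA call failed.
double timeOneRunMs(float *d_data, int n) {
    assert(d_data != nullptr && n > 0);
    // Finish any earlier GPU work so it is not charged to this run.
    if (cudaDeviceSynchronize() != cudaSuccess) return -1.0;
    auto start = std::chrono::steady_clock::now();

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    // Launches are asynchronous: these calls return before the kernels finish.
    dummyKernel<<<blocks, threads>>>(d_data, n);
    dummyKernel<<<blocks, threads>>>(d_data, n);
    if (cudaGetLastError() != cudaSuccess) return -1.0;

    // Block the host until both kernels are done; without this the timer
    // would mostly measure launch overhead.
    if (cudaDeviceSynchronize() != cudaSuccess) return -1.0;
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling a helper like this once per iteration and logging the result shows whether the per-run time really grows. cudaEventRecord()/cudaEventElapsedTime() are an alternative that measures on the GPU timeline instead of the host clock.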

Right now I use cutCreateTimer(&timer);

And yes, I do call cudaThreadSynchronize(); and even if I didn't, it wouldn't matter, because I measure the time around all of my kernels (there is more than one kernel involved).

This is odd, then. I've never seen this behavior before. Do you have any active program rendering to the display, even if it is only 2D? In my own project, kernels take essentially the same time on every run (say 1 ms each), occasionally jumping to 1.2 ms or higher because of the mouse cursor or something else being drawn on the screen, but nothing like the factor-of-2 increase you are seeing. If I grab and drag a window around the screen, these times do get higher, though.

I assume you are running the kernel on the same dataset over and over again, so that changing data will not change the run time. Could you run your program with the profiler activated? I'm curious whether these increasing times show up in cuda_profile.log too. That would also tell us more about where the time is going: whether it is actually taking twice as much gputime at the end, or whether the extra time is driver overhead (cputime).
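To compare gputime against cputime across a long log, it can help to total both per method. A host-only sketch, assuming the legacy command-line profiler (enabled by setting the environment variable CUDA_PROFILE=1) writes its default, non-CSV record lines in the form `method=[ name ] gputime=[ x ] cputime=[ y ]`, with times in microseconds; that line format and unit are assumptions, and `MethodTotals`, `summarizeProfile`, `summarizeProfileText`, and `printSummary` are hypothetical names:

```cuda
#include <cassert>
#include <cstdio>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Per-kernel (or per-memcpy) totals accumulated from the profiler log.
struct MethodTotals {
    int calls = 0;
    double gpuUs = 0.0;  // sum of gputime fields (assumed to be microseconds)
    double cpuUs = 0.0;  // sum of cputime fields (assumed to be microseconds)
};

// Returns the trimmed text between "key=[" and the next ']' on a line, or "".
static std::string field(const std::string &line, const std::string &key) {
    const std::string tag = key + "=[";
    const std::size_t pos = line.find(tag);
    if (pos == std::string::npos) return "";
    const std::size_t start = pos + tag.size();
    const std::size_t end = line.find(']', start);
    if (end == std::string::npos) return "";
    const std::string value = line.substr(start, end - start);
    const std::size_t first = value.find_first_not_of(' ');
    if (first == std::string::npos) return "";
    const std::size_t last = value.find_last_not_of(' ');
    return value.substr(first, last - first + 1);
}

// Sums gputime/cputime per method over every record line; '#' lines are headers.
std::map<std::string, MethodTotals> summarizeProfile(std::istream &in) {
    std::map<std::string, MethodTotals> totals;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        const std::string method = field(line, "method");
        const std::string gpu = field(line, "gputime");
        if (method.empty() || gpu.empty()) continue;
        MethodTotals &t = totals[method];
        t.calls += 1;
        t.gpuUs += std::stod(gpu);
        const std::string cpu = field(line, "cputime");
        if (!cpu.empty()) t.cpuUs += std::stod(cpu);
    }
    return totals;
}

// Convenience overload for log text that is already in memory.
std::map<std::string, MethodTotals> summarizeProfileText(const std::string &text) {
    std::istringstream in(text);
    return summarizeProfile(in);
}

void printSummary(const std::map<std::string, MethodTotals> &totals) {
    for (const auto &kv : totals)
        std::printf("%-30s calls=%7d gpu=%12.1f us cpu=%12.1f us\n",
                    kv.first.c_str(), kv.second.calls, kv.second.gpuUs, kv.second.cpuUs);
}
```

For a log on disk, something like `std::ifstream f("cuda_profile.log"); printSummary(summarizeProfile(f));` (with `<fstream>` included) would do. If a method's cputime grows between an early and a late log while its gputime stays flat, the extra time is on the host/driver side rather than on the GPU.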

part1 is the first iteration.

part2 is one from the middle, where the time has roughly doubled.

part2.txt (63.6 KB)
part1.txt (69.5 KB)

The files are in the previous post.


And which part takes twice as long? Almost all the kernel calls in those traces take roughly the same amount of time. The only differences I see are: 1) collect_values takes a little longer in part1.txt, though the time is so small you wouldn't notice it; and 2) part2.txt includes a call to rot3d_GPU that takes a significant amount of time. Is this included in your timing?

I know, that is why I am a little confused. But if I time the whole procedure, or simply print a “-” to stdout after each run, you can see that it gets slower toward the end.
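Rather than eyeballing the printed “-” characters, one option is to record every run's wall-clock time and compare the start of the sequence with the end. A host-only sketch, with hypothetical helpers `timeEachRun` and `slowdownRatio`; the window size used for the comparison is an arbitrary choice:

```cuda
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Runs oneRun() `runs` times and returns the wall-clock time of each run in ms.
// If oneRun() launches kernels it should end with cudaDeviceSynchronize(),
// otherwise the GPU work is not fully included in the measurement.
std::vector<double> timeEachRun(const std::function<void()> &oneRun, std::size_t runs) {
    std::vector<double> ms;
    ms.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        oneRun();
        const auto t1 = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return ms;
}

// Mean of the last `window` runs divided by the mean of the first `window` runs.
// About 1.0 means the run time is flat; 2.0 means late runs take twice as long.
// Returns 0.0 when there is not enough data to compare.
double slowdownRatio(const std::vector<double> &ms, std::size_t window) {
    if (window == 0 || ms.size() < 2 * window) return 0.0;
    const auto w = static_cast<std::ptrdiff_t>(window);
    const double first = std::accumulate(ms.begin(), ms.begin() + w, 0.0) / window;
    const double last = std::accumulate(ms.end() - w, ms.end(), 0.0) / window;
    return first > 0.0 ? last / first : 0.0;
}
```

For example, wrapping one full search (ending with cudaDeviceSynchronize()) in a lambda and passing runs = 200000 gives a per-run series that can be dumped to a file and plotted, which shows exactly where the slowdown starts and whether it grows steadily or in steps.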