Execution time is not proportional to the number of time steps

Hi, all

I am running my program with a GPU implementation, and I noticed something odd: the execution time of my GPU code is not proportional to the number of time steps.

For example, if it takes 1 unit of time for my code to complete 1 time step, it takes 5 units for 5 time steps on the CPU, but almost 9 units for 5 time steps on the GPU.

Since the whole execution time consists of data transfers and computation, I expected the required time to be proportional to the number of time steps, which conflicts with this result.

Please help me figure out what causes this. Thanks a lot.

Maybe you have an error in measuring the time it takes to complete on the GPU. Unless you post some code here, it is difficult to say for sure.

Yeah, like a forgotten cudaThreadSynchronize() at the beginning and at the end of the measurement interval.


My time measurement looks like this:


clock_t start, end;

cudaMemcpy( d, h, size, cudaMemcpyHostToDevice );

start = clock();
for ( time = 0 ; time < maxtime ; time++ )
    kernel<<< grid, block >>>(…);
end = clock();

cudaMemcpy( h, d, size, cudaMemcpyDeviceToHost );

And then I just compute computeTime = end - start to get my computation time, and I didn't call the cudaThreadSynchronize() function.

Does this function call affect my time measurement? Thanks a lot.


These measurements are not correct, because kernel launches are asynchronous: control returns to the host as soon as the kernel is launched, so clock() stops before the GPU has actually finished.

Use this code instead:

float gputime;
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);

// .... stuff to measure execution time goes here ....

cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);                  // wait until the GPU has finished
cudaEventElapsedTime(&gputime, start, stop); // elapsed time in milliseconds

cudaEventDestroy(start);
cudaEventDestroy(stop);

printf(" \n");
printf("Time = %g \n", gputime/1000.0f);     // seconds
printf(" \n");

Hi pasoleatis,

I used your method to measure my computation time, and now the result is proportional to the number of time steps. Thanks a lot for your kind help.