Strange measured times of a CUDA application

Hi all!

Thanks to the CUDA and Thrust community I have finally finished my thesis and I am now going through the application in order to get some results.

I measure the time of the application like this:

    clock_t c0, c1;
    c0 = clock();                      // CPU clock before the run

    cudaEvent_t evstart, evstop;
    cudaEventCreate(&evstart);
    cudaEventCreate(&evstop);
    cudaEventRecord(evstart, 0);       // start marker on stream 0

    // main function call

    c1 = clock();                      // CPU clock after the run
    cudaEventRecord(evstop, 0);        // stop marker on stream 0
    cudaEventSynchronize(evstop);      // wait until the stop event has completed

    float time;
    cudaEventElapsedTime(&time, evstart, evstop);   // elapsed time in milliseconds
    float time_cpu = (float)(c1 - c0) / CLOCKS_PER_SEC;

    printf("\nelapsed time (clock): %f s \n", time_cpu);
    printf("\nelapsed time: %f ms", time);

My application is an implementation of the OS-EM algorithm for medical imaging (digital breast tomosynthesis), so the program contains several “for” loops: some go through all the chunks of data and others are inherent to the algorithm. I measure the time of each iteration of the OS-EM algorithm, because every time an iteration finishes the program outputs a reconstructed image.
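To make the per-iteration measurement concrete, here is a minimal host-only sketch of the loop structure. It is not the actual thesis code: `run_iteration`, `IterationTime` and `time_iterations` are hypothetical names, and `run_iteration` only sleeps so that the sketch runs without a GPU. It records both wall-clock time and `clock()` time for every iteration, so a slow iteration can be pinpointed.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <ctime>
#include <thread>
#include <vector>

// Hypothetical stand-in for one OS-EM iteration. In the real program this is
// where the kernels run; here it only sleeps so the sketch runs without a GPU.
void run_iteration(int /*iteration*/) {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
}

struct IterationTime {
    double wall_s;  // elapsed wall-clock time in seconds
    double cpu_s;   // time reported by clock() in seconds
};

// Time each iteration separately and print one line per iteration.
std::vector<IterationTime> time_iterations(int iterations) {
    assert(iterations > 0);
    std::vector<IterationTime> times;
    for (int i = 0; i < iterations; ++i) {
        const auto w0 = std::chrono::steady_clock::now();
        const std::clock_t c0 = std::clock();

        run_iteration(i);
        // With real GPU work, wait for the device here (for example with
        // cudaDeviceSynchronize()) before reading the clocks.

        const std::clock_t c1 = std::clock();
        const auto w1 = std::chrono::steady_clock::now();

        IterationTime t;
        t.wall_s = std::chrono::duration<double>(w1 - w0).count();
        t.cpu_s = static_cast<double>(c1 - c0) / CLOCKS_PER_SEC;
        times.push_back(t);
        std::printf("iteration %d: wall %.3f s, clock() %.3f s\n",
                    i, t.wall_s, t.cpu_s);
    }
    return times;
}
```

Calling `time_iterations(3)` prints one line per iteration. On POSIX systems `clock()` reports processor time rather than elapsed time, so the two columns can differ when the host thread is waiting; comparing them per iteration may help narrow down where the extra seconds go.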

The time measured for the first 2 or 3 iterations is about 35 seconds (roughly a 6x speedup), but after that each iteration takes around 80 seconds (a speedup of about 3x). The point at which iterations start taking longer is not constant: just now I ran the program from the beginning and the first iteration already took 82 seconds! Normally I have to restart the computer for the application to go back to its normal timings.
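As a sanity check on these figures, here is a small sketch of the speedup arithmetic. The CPU reference time is not stated above; about 210 s is only what 35 s at 6x implies, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// Speedup = CPU reference time / GPU time.
double speedup(double cpu_seconds, double gpu_seconds) {
    assert(gpu_seconds > 0.0);
    return cpu_seconds / gpu_seconds;
}

// CPU reference time implied by a measured GPU time and its quoted speedup.
double implied_cpu_seconds(double gpu_seconds, double quoted_speedup) {
    return gpu_seconds * quoted_speedup;
}
```

With these numbers, `implied_cpu_seconds(35.0, 6.0)` gives 210 s and `speedup(210.0, 80.0)` gives 2.625, which is consistent with the "about 3x" quoted for the slow iterations.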

If anyone has any ideas on the matter I would be grateful. Thanks a lot ;)

Anyone?