On timing and timer

There is a considerable difference in execution time when it is measured with the profiler or a timer call inside the program.

I know that the profiler gives the time in µsec and the timer in msec, but still the difference is too large.

Which one is the good one?

This is a run with a modified simpleCUBLAS with increase N and timers inserted

simpleCUBLAS test running…

Setup Processing time: 346.609009 (ms)

CPU Processing time: 219813.844116 (ms) for 4194304 elements 220160.453125 346.609009

GPU Processing time: 0.093750 (ms) for 4194304 elements 220160.546875 220160.453125

Test PASSED

Results Processing time: 194.140625 (ms) 220354.687500 220160.546875

Press ENTER to exit…

CUDA_PROFILE_LOG_VERSION 1.4

CUDA_DEVICE 0 GeForce 9600 GSO

method,gputime,cputime,occupancy

method=[ memcopy ] gputime=[ 5613.696 ] cputime=[ 9071.000 ]

method=[ memcopy ] gputime=[ 6272.576 ] cputime=[ 8987.000 ]

method=[ memcopy ] gputime=[ 6311.232 ] cputime=[ 9036.000 ]

method=[ _Z7sgemmNNPKfiS0_iPfiiff ] gputime=[ 152526.141 ] cputime=[ 26.000 ] occupancy=[ 0.333 ]

method=[ memcopy ] gputime=[ 9444.704 ] cputime=[ 23137.000 ]

Did you add the timer code yourself? If so, how did you do the timing?

I hope this is the proper way to do the timing.

// stop and destroy timer

	cutilCheckError(cutStopTimer(timer));

timer1 = cutGetTimerValue(timer);

	printf("Setup Processing time: %f (ms) \n",  timer1 );

	cutilCheckError(cutStartTimer(timer));

	/* Performs operation using plain C code */

	simple_sgemm(N, alpha, h_A, h_B, beta, h_C);

	h_C_ref = h_C;

	// stop and destroy timer

	cutilCheckError(cutStopTimer(timer));

timer2 = cutGetTimerValue(timer);

	printf("CPU Processing time: %f (ms) for %d   elements  %f  %f \n", timer2-timer1,n2, timer2,timer1);

	cutilCheckError(cutStartTimer(timer));

	/* Clear last error */

	cublasGetError();

	/* Performs operation using cublas */

	cublasSgemm('n', 'n', N, N, N, alpha, d_A, N, d_B, N, beta, d_C, N);

	status = cublasGetError();

	if (status != CUBLAS_STATUS_SUCCESS) {

		fprintf (stderr, "!!!! kernel execution error.\n");

		return EXIT_FAILURE;

	}

	// stop and destroy timer

	cutilCheckError(cutStopTimer(timer));

timer3 = cutGetTimerValue(timer);

	printf("GPU  Processing time: %f (ms)  for %d elements %f  %f\n", timer3-timer2,n2,timer3,timer2);

	cutilCheckError(cutStartTimer(timer));

No, that doesn’t look right to me.

Have a look at template.cu in the SDK for an example.

There is no timing in the template.

Can you show me what is not right in my code?

But what does this phrase means exactly:

"The ‘gputime’ and ‘cputime’ labels specify the actual chip

execution time and the driver execution time (including gputime),"

Form CUDA_Profiler_2.2.txt

My conclusion so far, is that the profiler time is the proper time.

Kernel launches (and CUBLAS calls) are asynchronous. What you are measuring is only the kernel launch times, not the run times. You should add a cudaThreadSynchronize() call before you stop and read out the timer. This will make the host CPU sit in a spinlock until the kernel is finshing running.

Thank you very much, now it all makes sense

simpleCUBLAS test running…

Setup Processing time: 184.990005 (ms)

CPU Processing time: 19273.943588 (ms) for 1048576 elements 19458.933594 184.990005

GPU Processing time: 19.585938 (ms) for 1048576 elements 19478.519531 19458.933594

Test PASSED

Results Processing time: 9.779297 (ms) 19488.298828 19478.519531

Press ENTER to exit…

CUDA_PROFILE_LOG_VERSION 1.4

CUDA_DEVICE 0 GeForce 9600 GSO

method,gputime,cputime,occupancy

method=[ memcopy ] gputime=[ 1099.488 ] cputime=[ 2493.000 ]

method=[ memcopy ] gputime=[ 1795.968 ] cputime=[ 2459.000 ]

method=[ memcopy ] gputime=[ 1799.072 ] cputime=[ 2435.000 ]

method=[ _Z7sgemmNNPKfiS0_iPfiiff ] gputime=[ 19248.512 ] cputime=[ 25.000 ] occupancy=[ 0.333 ]

method=[ memcopy ] gputime=[ 1874.912 ] cputime=[ 5588.000 ]

~

There is in mine:

unsigned int timer = 0;

	cutilCheckError( cutCreateTimer( &timer));

	cutilCheckError( cutStartTimer( timer));

(do work)

cutilCheckError( cutStopTimer( timer));

	printf( "Processing time: %f (ms)\n", cutGetTimerValue( timer));

	cutilCheckError( cutDeleteTimer( timer));