jam11
July 14, 2009, 11:05pm
1
There is a considerable difference in execution time when it is measured with the profiler or a timer call inside the program.
I know that the profiler gives the time in µsec and the timer in msec, but still the difference is too large.
Which one is the good one?
This is a run with a modified simpleCUBLAS with increase N and timers inserted
…
simpleCUBLAS test running…
Setup Processing time: 346.609009 (ms)
CPU Processing time: 219813.844116 (ms) for 4194304 elements 220160.453125 346.609009
GPU Processing time: 0.093750 (ms) for 4194304 elements 220160.546875 220160.453125
Test PASSED
Results Processing time: 194.140625 (ms) 220354.687500 220160.546875
Press ENTER to exit…
CUDA_PROFILE_LOG_VERSION 1.4
CUDA_DEVICE 0 GeForce 9600 GSO
method,gputime,cputime,occupancy
method=[ memcopy ] gputime=[ 5613.696 ] cputime=[ 9071.000 ]
method=[ memcopy ] gputime=[ 6272.576 ] cputime=[ 8987.000 ]
method=[ memcopy ] gputime=[ 6311.232 ] cputime=[ 9036.000 ]
method=[ _Z7sgemmNNPKfiS0_iPfiiff ] gputime=[ 152526.141 ] cputime=[ 26.000 ] occupancy=[ 0.333 ]
method=[ memcopy ] gputime=[ 9444.704 ] cputime=[ 23137.000 ]
…
Did you add the timer code yourself? If so, how did you do the timing?
jam11
July 15, 2009, 10:34am
3
I hope this is the proper way to do the timing.
// stop and destroy timer
cutilCheckError(cutStopTimer(timer));
timer1 = cutGetTimerValue(timer);
printf("Setup Processing time: %f (ms) \n", timer1 );
cutilCheckError(cutStartTimer(timer));
/* Performs operation using plain C code */
simple_sgemm(N, alpha, h_A, h_B, beta, h_C);
h_C_ref = h_C;
// stop and destroy timer
cutilCheckError(cutStopTimer(timer));
timer2 = cutGetTimerValue(timer);
printf("CPU Processing time: %f (ms) for %d elements %f %f \n", timer2-timer1,n2, timer2,timer1);
cutilCheckError(cutStartTimer(timer));
/* Clear last error */
cublasGetError();
/* Performs operation using cublas */
cublasSgemm('n', 'n', N, N, N, alpha, d_A, N, d_B, N, beta, d_C, N);
status = cublasGetError();
if (status != CUBLAS_STATUS_SUCCESS) {
fprintf (stderr, "!!!! kernel execution error.\n");
return EXIT_FAILURE;
}
// stop and destroy timer
cutilCheckError(cutStopTimer(timer));
timer3 = cutGetTimerValue(timer);
printf("GPU Processing time: %f (ms) for %d elements %f %f\n", timer3-timer2,n2,timer3,timer2);
cutilCheckError(cutStartTimer(timer));
No, that doesn’t look right to me.
Have a look at template.cu in the SDK for an example.
jam11
July 15, 2009, 11:43am
5
There is no timing in the template.
Can you show me what is not right in my code?
But what does this phrase means exactly:
"The ‘gputime’ and ‘cputime’ labels specify the actual chip
execution time and the driver execution time (including gputime),"
Form CUDA_Profiler_2.2.txt
My conclusion so far, is that the profiler time is the proper time.
Kernel launches (and CUBLAS calls) are asynchronous. What you are measuring is only the kernel launch times, not the run times. You should add a cudaThreadSynchronize() call before you stop and read out the timer. This will make the host CPU sit in a spinlock until the kernel is finshing running.
jam11
July 15, 2009, 12:31pm
7
Thank you very much, now it all makes sense
simpleCUBLAS test running…
Setup Processing time: 184.990005 (ms)
CPU Processing time: 19273.943588 (ms) for 1048576 elements 19458.933594 184.990005
GPU Processing time: 19.585938 (ms) for 1048576 elements 19478.519531 19458.933594
Test PASSED
Results Processing time: 9.779297 (ms) 19488.298828 19478.519531
Press ENTER to exit…
CUDA_PROFILE_LOG_VERSION 1.4
CUDA_DEVICE 0 GeForce 9600 GSO
method,gputime,cputime,occupancy
method=[ memcopy ] gputime=[ 1099.488 ] cputime=[ 2493.000 ]
method=[ memcopy ] gputime=[ 1795.968 ] cputime=[ 2459.000 ]
method=[ memcopy ] gputime=[ 1799.072 ] cputime=[ 2435.000 ]
method=[ _Z7sgemmNNPKfiS0_iPfiiff ] gputime=[ 19248.512 ] cputime=[ 25.000 ] occupancy=[ 0.333 ]
method=[ memcopy ] gputime=[ 1874.912 ] cputime=[ 5588.000 ]
~
There is in mine:
unsigned int timer = 0;
cutilCheckError( cutCreateTimer( &timer));
cutilCheckError( cutStartTimer( timer));
(do work)
cutilCheckError( cutStopTimer( timer));
printf( "Processing time: %f (ms)\n", cutGetTimerValue( timer));
cutilCheckError( cutDeleteTimer( timer));