Hi, I profiled GoogLeNet inference timing with a batch size of 1 over about 8k images on my TX1.
I expected consistent timing for each input image because I fixed the GPU frequency at its maximum, 998 MHz.
However, the timing fluctuates by about a factor of two (roughly 24 ms to 51 ms per forward pass).
I believe this is due to inconsistent TX1 GPU utilization: tegrastats shows GR3D fluctuating widely as well (30% to 91%).
When I set the batch size to 16, both the timing and the GPU utilization are consistent.
Could you recommend measurement techniques or nvprof metrics for further analysis? Thank you.
Attached are the code, the corresponding timings, and the tegrastats output for one second.
for (int m = 0; m < total_googlnet_layer; m += batch)
{
    copy_from_cpu_to_gpu();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Timed region: only the forward pass sits between the two events.
    cudaEventRecord(start, 0);
    forward_googlenet_layer_using_cudnn();
    cudaEventRecord(stop, 0);

    // Wait until the stop event has completed.
    while (cudaEventQuery(stop) == cudaErrorNotReady) {}
    gpuErrchk(cudaEventSynchronize(stop));
    cudaDeviceSynchronize();
    cudaStreamSynchronize(0);

    float time0;  // elapsed time between start and stop, in milliseconds
    cudaEventElapsedTime(&time0, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    gpuErrchk(cudaPeekAtLastError());
}
Time      Forward timing
3:29:26 33.61ms
3:29:26 24.28ms
3:29:26 27.96ms
3:29:26 24.42ms
3:29:26 26.63ms
3:29:26 24.33ms
3:29:26 47.63ms
3:29:26 31.28ms
3:29:26 37.65ms
3:29:26 48.37ms
3:29:26 24.34ms
3:29:26 44.53ms
3:29:26 31.80ms
3:29:26 24.26ms
3:29:26 38.80ms
3:29:26 44.69ms
3:29:26 25.70ms
3:29:26 25.43ms
3:29:26 24.25ms
3:29:26 28.40ms
3:29:26 51.45ms
3:29:26 35.59ms
3:29:26 24.20ms
3:29:26 25.15ms
3:29:26 24.35ms
3:29:26 24.30ms
3:29:26 24.97ms
3:29:26 44.61ms
3:29:26 30.35ms
3:29:26 27.33ms
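To put a number on the spread, the samples above can be summarized by their minimum, median and maximum. This is only a minimal C++ sketch: it assumes the 30 values listed above are copied in by hand, and the names `Summary`, `summarize` and `kForwardMs` are hypothetical helpers, not part of my profiling code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: basic spread statistics for timings in ms.
struct Summary {
    double min;
    double median;
    double max;
};

// Forward timings in ms, copied by hand from the listing above.
const std::vector<double> kForwardMs = {
    33.61, 24.28, 27.96, 24.42, 26.63, 24.33, 47.63, 31.28, 37.65, 48.37,
    24.34, 44.53, 31.80, 24.26, 38.80, 44.69, 25.70, 25.43, 24.25, 28.40,
    51.45, 35.59, 24.20, 25.15, 24.35, 24.30, 24.97, 44.61, 30.35, 27.33};

// Sorts a copy of the samples and returns min, median and max.
// Precondition: samples must not be empty.
Summary summarize(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    // For an even count the median is the mean of the two middle values.
    const double median = (n % 2 == 1)
        ? samples[n / 2]
        : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
    return {samples.front(), median, samples.back()};
}
```

Calling `summarize(kForwardMs)` on these samples gives a minimum of 24.20 ms, a median of about 27.65 ms and a maximum of 51.45 ms, so the slowest forward pass takes about 2.1 times as long as the fastest one.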
RAM usage is constant at 1607/3853 MB.
tegrastats output: [time] per-core CPU utilization @ MHz, GPU (GR3D) utilization @ MHz
[03:29:26] cpu [0%,66%,0%,90%]@1734 EMC GR3D 90%@998
[03:29:26] cpu [10%,69%,16%,83%]@1734 EMC GR3D 85%@998
[03:29:26] cpu [9%,66%,0%,90%]@1734 EMC GR3D 91%@998
[03:29:26] cpu [9%,80%,9%,91%]@1734 EMC GR3D 30%@998
[03:29:26] cpu [10%,75%,0%,100%]@1734 EMC GR3D 65%@998
[03:29:26] cpu [0%,80%,0%,90%]@1734 EMC GR3D 48%@998
[03:29:26] cpu [9%,54%,0%,88%]@1734 EMC GR3D 88%@998
[03:29:26] cpu [0%,72%,9%,90%]@1734 EMC GR3D 43%@998
[03:29:26] cpu [0%,58%,0%,75%]@1734 EMC GR3D 89%@998
[03:29:26] cpu [10%,66%,0%,81%]@1734 EMC GR3D 75%@998
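To compare the GR3D values with the timings, the utilization can be pulled out of each tegrastats line. The following is a rough C++ sketch that assumes lines shaped exactly like the ones above (`GR3D <percent>%@<MHz>`); other tegrastats versions may print a different layout, and `parse_gr3d_percent` is a hypothetical helper name.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the GR3D utilization in percent, or -1 if the
// line does not contain a "GR3D <digits>%" field in the assumed format.
int parse_gr3d_percent(const std::string& line) {
    const std::string key = "GR3D ";
    const std::size_t pos = line.find(key);
    if (pos == std::string::npos) {
        return -1;
    }
    std::size_t i = pos + key.size();
    int value = 0;
    bool any_digit = false;
    // Accumulate the decimal digits that follow the key.
    while (i < line.size() &&
           std::isdigit(static_cast<unsigned char>(line[i]))) {
        value = value * 10 + (line[i] - '0');
        any_digit = true;
        ++i;
    }
    // The number must be followed directly by a percent sign.
    if (!any_digit || i >= line.size() || line[i] != '%') {
        return -1;
    }
    return value;
}
```

Applied to the ten lines above, this yields values between 30 and 91, which can then be logged next to the per-iteration forward timings.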