I don’t understand the difference between the --avgRuns and --iterations options of trtexec. I used trtexec on my Jetson TX2 as a benchmark, but the latency results from setting --avgRuns=100 with 1 iteration and setting --iterations=100 with 1 run are quite different: the numbers from --iterations=100 are higher on average than those from --avgRuns=100. From the --help output, the two options look very similar, except that --avgRuns also reports a 99th-percentile time. What’s the purpose of having a separate average count and iteration count?
Also, is there a way to view all the raw latency data, not just the average and the 99th percentile?
You can find the trtexec source code in /usr/src/tensorrt/samples/trtexec/.
Please check the following profiling sample:
for (int j = 0; j < gParams.iterations; j++)
{
    for (int i = 0; i < gParams.avgRuns; i++)
    {
        auto tStart = std::chrono::high_resolution_clock::now();
        // run TRT
        auto tEnd = std::chrono::high_resolution_clock::now();
        totalHost += std::chrono::duration<float, std::milli>(tEnd - tStart).count();
        cudaEventElapsedTime(&ms, start, end);
        times[i] = ms;
        totalGpu += ms;
    }
    totalGpu /= gParams.avgRuns;
    totalHost /= gParams.avgRuns;
    gLogInfo << "Average over " << gParams.avgRuns << " runs is " << totalGpu << " ms (host walltime is " << totalHost
             << " ms, " << static_cast<int>(gParams.pct) << "% percentile time is " << percentile(gParams.pct, times) << ")." << std::endl;
}
--avgRuns indicates how many inference runs are averaged into a single profiling result.
In general, the first launch is expected to take longer, so the results for avgRuns=1 and avgRuns=100 can differ.
--iterations controls how many times this whole profiling test is repeated; each iteration reports its own averaged result.
As the source code is available, you can update the profiling function to fit your requirements directly.
Thanks, knowing the source code helps a lot.