I have read the trtexec --help output, but I would like some clarification about the data that trtexec collects.
In order to work with the trtexec profiling data I used the following option:
--exportTimes= Write the timing results in a json file (default = disabled)
Then I used the related script to extract data.
From trace.json I get an array with the following fields:
startInMs, endInMs, startComputeMs, endComputeMs, startOutMs, endOutMs, inMs, computeMs, outMs, latencyMs, endToEndMs
"inMs": time to transfer the input data to GPU memory.
"computeMs": time taken by the GPU to compute one batch.
"endToEndMs" = "inMs" + "computeMs" + "outMs"
What is the difference between "computeMs" and "latencyMs" at this point?
Throughput would then be equal to batch_size / computeMs.
Latency would be equal to computeMs.
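To make the aggregation concrete, here is a minimal Python sketch of how such a trace could be summarized. The field names come from the list above; the two records and the batch size are made-up values for illustration (with a real run you would load the array via json.load from trace.json instead):

```python
# Stand-in for json.load(open("trace.json")): two made-up records using
# the field names reported by trtexec --exportTimes.
trace = [
    {"computeMs": 2.0, "latencyMs": 2.7, "endToEndMs": 2.9},
    {"computeMs": 2.1, "latencyMs": 2.8, "endToEndMs": 3.0},
]
batch_size = 8  # assumed batch size of the run

mean_compute = sum(r["computeMs"] for r in trace) / len(trace)
mean_latency = sum(r["latencyMs"] for r in trace) / len(trace)

# Throughput in samples/second is batch_size divided by the per-batch time
# (ms converted to s), i.e. batch_size / computeMs, not computeMs / batch_size.
throughput = batch_size / (mean_compute / 1000.0)
```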
I did not specify the --iterations option, and the default is to run "at least" 10 iterations.
I did not specify the --avgRuns option, and by default measurements are averaged over 10 consecutive iterations.
I get 4065 lines of measurements.
To confirm: does each line correspond to an averaged measurement, and did trtexec therefore run 4065 iterations?
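One way to check the relationship between iterations and trace entries would be to count the records in the exported array. A small self-contained sketch (it writes a two-record stand-in for trace.json, since the real file depends on the run; with the real file the length should be 4065):

```python
import json
import os
import tempfile

# Write a two-record stand-in for trace.json; with a real run this file
# comes from --exportTimes.
path = os.path.join(tempfile.mkdtemp(), "trace.json")
with open(path, "w") as f:
    json.dump([{"computeMs": 2.0}, {"computeMs": 2.1}], f)

# Each element of the array is one reported measurement, so the length
# should match the number of measurement lines.
with open(path) as f:
    n_measurements = len(json.load(f))
```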
I also tried to extract a profile.json.
--exportProfile= Write the profile information per layer in a json file (default = disabled)
I work with dynamic shapes, so I specified --minShapes, --optShapes and --maxShapes to build the optimization profile.
trtexec runs without error, but the profile.json file stays empty (I checked the filename and path).
I also want to be sure that the profiling data was obtained with GPU-side CUDA timing and not a CPU wall clock.