TensorRT Latency measurement


In our TensorRT application we have measured the execution time, which covers input transfer, GPU processing, and output transfer. From a few NVIDIA references and blogs, we understood that inference time actually refers to latency.

Could you please guide us on how to measure latency and system throughput for our application?



You can check our GoogleNet sample for reference:

The reportLayerTime() function is called once per layer.
It measures only the inference time of that layer and does not include input data preparation time (e.g., cudaMemcpy…).
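A minimal Python sketch of the per-layer accounting described above. It assumes the TensorRT Python API's `IProfiler`, whose callback `report_layer_time(layer_name, ms)` is the Python counterpart of `reportLayerTime()`. The class name `LayerTimeAccumulator` is hypothetical, and the class is standalone so it runs without TensorRT. In a real application you would, as we understand the API, subclass `tensorrt.IProfiler` (calling its `__init__`) and assign an instance to the execution context's `profiler` attribute. Summing the reported times gives GPU layer time only, not end-to-end latency.

```python
from collections import defaultdict


class LayerTimeAccumulator:
    """Hypothetical per-layer timer mirroring the shape of TensorRT's
    IProfiler callback. In a real application, subclass tensorrt.IProfiler
    instead; this standalone version needs no TensorRT install."""

    def __init__(self):
        self.per_layer_ms = defaultdict(float)  # accumulated ms per layer name
        self.total_ms = 0.0                     # sum over all reported layers

    def report_layer_time(self, layer_name, ms):
        # Invoked once per layer per inference run. The value covers only the
        # layer's execution, not host<->device copies such as cudaMemcpy.
        self.per_layer_ms[layer_name] += ms
        self.total_ms += ms


if __name__ == "__main__":
    prof = LayerTimeAccumulator()
    # Simulated callbacks for one run (illustrative values, not measurements).
    for name, ms in [("conv1", 0.40), ("pool1", 0.10), ("fc", 0.25)]:
        prof.report_layer_time(name, ms)
    for name, ms in prof.per_layer_ms.items():
        print(f"{name:<8} {ms:.3f} ms")
    print(f"GPU inference (layers only): {prof.total_ms:.3f} ms")
```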

To calculate latency, please sum up the execution time of pre-processing, inference, and post-processing.
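The latency sum, and the throughput asked about in the question, can be sketched as below. This is a hypothetical Python harness, not an NVIDIA API: `measure`, `preprocess`, `infer`, and `postprocess` are placeholder names for your own stages. It assumes `infer` covers input transfer, execution, and output transfer, and that it synchronizes the CUDA stream before returning. TensorRT execution is asynchronous, so a timer stopped before synchronization would under-report. Throughput here is items processed per second of wall-clock time.

```python
import time


def measure(preprocess, infer, postprocess, inputs, batch_size=1):
    """Time each request's stages; return per-request latency (ms) and
    overall throughput (items/s).

    preprocess/infer/postprocess are hypothetical callables standing in for
    the application's stages. infer is assumed to wrap the host->device copy,
    execution, the device->host copy, and a stream synchronize.
    """
    latencies_ms = []
    start = time.perf_counter()
    for x in inputs:
        t0 = time.perf_counter()
        y = preprocess(x)
        t1 = time.perf_counter()
        y = infer(y)
        t2 = time.perf_counter()
        postprocess(y)
        t3 = time.perf_counter()
        # Latency of one request = pre-process + inference + post-process.
        latencies_ms.append(((t1 - t0) + (t2 - t1) + (t3 - t2)) * 1e3)
    wall_s = time.perf_counter() - start
    # Throughput = items processed per second of wall-clock time.
    throughput = len(inputs) * batch_size / wall_s
    return latencies_ms, throughput


if __name__ == "__main__":
    def pre(x):
        time.sleep(0.001)  # stand-in for resize/normalize
        return x

    def infer(x):
        time.sleep(0.005)  # stand-in for copy-in + execute + copy-out + sync
        return x

    def post(x):
        time.sleep(0.001)  # stand-in for decoding outputs

    lat, thr = measure(pre, infer, post, list(range(20)))
    print(f"mean latency: {sum(lat) / len(lat):.2f} ms, "
          f"throughput: {thr:.1f} items/s")
```

When requests are processed one at a time, as here, throughput is roughly `batch_size` divided by the mean latency. Overlapping requests (for example with multiple CUDA streams or execution contexts) can raise throughput without lowering per-request latency.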