I am using trtexec from the TensorRT samples to measure the latency of a UFF model, run from a serialized engine, on a Jetson Xavier.
After reading the code (and as expected), I can see that the latency reported by this sample is the time from when an image is fed to the network until the output results are ready.
What I also want at the moment is the setup time for the serialized engine. I know this is a one-time cost, but I want to know how long it takes to load a serialized engine and set everything up before running the first inference.
Any ideas on how I can measure that without reading through the whole code first?
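To make the quantity concrete, here is a minimal sketch of the measurement done outside trtexec, assuming the TensorRT Python bindings are installed. The helper names `timed`, `measure_setup`, and `measure_trt_setup` are made up for illustration and are not part of trtexec or TensorRT. It times three one-time stages separately: reading the engine file, deserializing it, and creating the execution context. Whichever stage makes the first CUDA call probably also absorbs CUDA initialization, so the split between stages is only approximate.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def read_file(path):
    """Read the serialized engine from disk as bytes."""
    with open(path, "rb") as f:
        return f.read()


def measure_setup(engine_path, read_fn, deserialize_fn, make_context_fn):
    """Time each one-time setup stage; returns (objects, seconds per stage).

    The created objects are returned so the caller keeps them alive
    and can reuse them for inference.
    """
    timings = {}
    blob, timings["read_file"] = timed(read_fn, engine_path)
    engine, timings["deserialize"] = timed(deserialize_fn, blob)
    context, timings["create_context"] = timed(make_context_fn, engine)
    timings["total"] = sum(timings.values())
    return {"engine": engine, "context": context}, timings


def measure_trt_setup(engine_path):
    """TensorRT-specific wiring (hypothetical helper, needs `tensorrt`)."""
    import tensorrt as trt  # imported lazily so the helpers above run anywhere

    logger = trt.Logger(trt.Logger.WARNING)
    runtime, runtime_s = timed(trt.Runtime, logger)
    objects, timings = measure_setup(
        engine_path,
        read_file,
        runtime.deserialize_cuda_engine,
        lambda engine: engine.create_execution_context(),
    )
    # Count runtime creation as part of the setup cost as well.
    timings["create_runtime"] = runtime_s
    timings["total"] += runtime_s
    objects["runtime"] = runtime
    objects["logger"] = logger
    return objects, timings
```

Calling `measure_trt_setup("model.engine")` (the path is a placeholder) in a fresh process and printing the returned timings would give a per-stage breakdown. A second run may show a faster `read_file` stage because the file is then likely in the OS page cache.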
TensorRT Version: 7.1
GPU Type: 512-Core Volta GPU with Tensor Cores
Nvidia Driver Version:
CUDA Version: 10.2
CUDNN Version: 8.0
Operating System + Version: Jetpack 4.4
Python Version (if applicable): 3.6
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):