Description: Run-to-run variation with TensorRT
Environment:
NVIDIA Release: 22.07
NVIDIA TensorRT Version: 8.4.1
NVIDIA Driver Version: 515.43.04
CUDA Version: 11.7
NVIDIA GPU: NVIDIA Tesla T4
Docker Image: nvcr.io/nvidia/tensorrt:22.07-py3
To get inference performance data with trtexec, there are two steps involved (example commands below):
- Build a TRT engine from the model
- Get inference performance metrics by loading the TRT engine
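For reference, a minimal sketch of the two steps; the file names `model.onnx` and `model.plan` are placeholders, and the flags shown are standard trtexec options:

```bash
# Step 1: build a TRT engine from an ONNX model and serialize it to disk
trtexec --onnx=model.onnx --saveEngine=model.plan

# Step 2: load the serialized engine and collect inference performance metrics
# (--warmUp is in milliseconds, --duration is in seconds)
trtexec --loadEngine=model.plan --warmUp=500 --duration=10
```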
Once a TRT engine is built from the model, I see less than 1% variation in throughput when I run the inference stage (step 2) multiple times on that same engine.
However, if I perform both step 1 and step 2 multiple times for the same model with the same configuration, I see up to 3% variation in inference throughput. Is this normal?
Also, if I build a TRT engine for the same model multiple times (with the same configuration), should trtexec generate an engine file of the same size each time?
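For context, a minimal way to check the build-to-build question, assuming two engines built back-to-back from the same model with identical settings (the `engine_run1.plan` / `engine_run2.plan` names are placeholders):

```bash
# Build the same model twice with identical settings (hypothetical file names)
trtexec --onnx=model.onnx --saveEngine=engine_run1.plan
trtexec --onnx=model.onnx --saveEngine=engine_run2.plan

# Compare sizes and content; differing hashes would mean the serialized
# engines (e.g. the tactics the builder selected) are not byte-identical
ls -l engine_run1.plan engine_run2.plan
md5sum engine_run1.plan engine_run2.plan
```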