Run-to-run variation with TensorRT

Description: Run-to-run variation with TensorRT

Environment:
NVIDIA Release: 22.07
NVIDIA TensorRT Version: 8.4.1
NVIDIA Driver Version: 515.43.04
CUDA Version: 11.7
NVIDIA GPU: NVIDIA Tesla T4
Docker Image: nvcr.io/nvidia/tensorrt:22.07-py3

To get inference performance data using trtexec, there are two steps involved (example commands below):

  1. Build a TRT engine from the model
  2. Get inference performance metrics by loading the TRT engine
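For example, the two steps look roughly like this with trtexec (a sketch; the model name and the --fp16 flag are placeholders, not my exact configuration):

    # Step 1: build a serialized TRT engine from an ONNX model
    trtexec --onnx=model.onnx --saveEngine=model.plan --fp16

    # Step 2: load the engine and measure inference performance
    trtexec --loadEngine=model.plan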

I see less than 1% variation when I build the TRT engine once and then run the inference stage multiple times.
If I perform steps 1 and 2 multiple times for the same model with the same configuration, I see up to 3% variation in inference throughput. Is this normal?

If I build a TRT engine for the same model multiple times (with the same configuration), should trtexec generate an engine of the same size?

Hi,

The builder times candidate kernels to find the fastest one, and when the timings of two implementations (or two precisions) are close, timing noise can cause the builder to choose differently on different runs. So a 3% variation in runtime is not necessarily unusual, and some variation in engine size is also possible.
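If build-to-build reproducibility matters, one option (a sketch, assuming the --timingCacheFile option available in trtexec for TensorRT 8.x) is to reuse a timing cache, so that later builds replay the earlier tactic timings instead of re-timing kernels:

    # The first run creates timing.cache; subsequent builds that pass the
    # same cache file reuse the recorded timings, making kernel selection
    # (and hence the resulting engine) more stable across builds
    trtexec --onnx=model.onnx --saveEngine=model.plan --timingCacheFile=timing.cache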

Thank you.