When I create a TensorRT engine from an ONNX file and compare the inference outputs of the two formats, I get different results. The difference is significant and cannot be explained by precision or optimizations.
TensorRT Version: 18.104.22.168
GPU Type: NVIDIA RTX A3000
Nvidia Driver Version: 535.54.03
CUDA Version: 12.2
CUDNN Version: 8.9.2
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable): N/A
PyTorch Version (if applicable): 2.0.1+cu118
Baremetal or Container (if container which image + tag): N/A
- model_640x640_FULL_STAT.onnx - ONNX file
- model_640x640_fp16_stat.engine - TensorRT engine file
- comp_onnx_trt.py - Python script that loads both models and compares the results of the two formats.
- zidane.jpg - image used as input to the networks.
1. Build the TensorRT engine from the ONNX file:
trtexec --onnx=model_640x640_FULL_STAT.onnx --saveEngine=model_640x640_fp16_stat.engine --useSpinWait --fp16
(I also tried without --fp16.)
2. Run the attached Python script (comp_onnx_trt.py): it loads the ONNX and TRT models, feeds the same image file (attached) as input to both, compares the results, and prints the maximum difference between them (a minimal sketch of this comparison is shown after these steps).
3. The printed result is: max diff: 6.824637
4. When I compare the ONNX results to PyTorch using the same input, the difference is small (~0.01).
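For reference, here is a minimal sketch of the kind of comparison comp_onnx_trt.py performs. This is not the attached script itself: the preprocessing, the single-input/single-output assumption, and the helper names are illustrative, and the TensorRT part assumes the 8.x binding API with pycuda for device buffers.

```python
# Sketch: compare ONNX Runtime output with a TensorRT engine output.
# Assumes a single-input / single-output static-shape model, TensorRT 8.x
# binding API, and pycuda for device memory. Preprocessing is illustrative.
import cv2
import numpy as np
import onnxruntime as ort
import tensorrt as trt
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda

ONNX_PATH = "model_640x640_FULL_STAT.onnx"
ENGINE_PATH = "model_640x640_fp16_stat.engine"

def preprocess(path, size=(640, 640)):
    """Illustrative preprocessing: resize, scale to [0,1], NCHW, batch dim."""
    img = cv2.resize(cv2.imread(path), size)
    img = img[:, :, ::-1].astype(np.float32) / 255.0  # BGR -> RGB
    return np.ascontiguousarray(img.transpose(2, 0, 1)[None])

def run_onnx(x):
    sess = ort.InferenceSession(ONNX_PATH, providers=["CPUExecutionProvider"])
    name = sess.get_inputs()[0].name
    return sess.run(None, {name: x})[0]

def run_trt(x):
    logger = trt.Logger(trt.Logger.WARNING)
    with open(ENGINE_PATH, "rb") as f, trt.Runtime(logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    bindings, buffers, host_out = [], [], None
    for i in range(engine.num_bindings):
        dtype = trt.nptype(engine.get_binding_dtype(i))
        if engine.binding_is_input(i):
            host_buf = np.ascontiguousarray(x.astype(dtype))
        else:
            host_buf = np.empty(tuple(engine.get_binding_shape(i)), dtype=dtype)
            host_out = host_buf
        dev_buf = cuda.mem_alloc(host_buf.nbytes)
        buffers.append((dev_buf, host_buf, engine.binding_is_input(i)))
        bindings.append(int(dev_buf))
    for dev_buf, host_buf, is_input in buffers:
        if is_input:
            cuda.memcpy_htod(dev_buf, host_buf)
    context.execute_v2(bindings)
    for dev_buf, host_buf, is_input in buffers:
        if not is_input:
            cuda.memcpy_dtoh(host_buf, dev_buf)
    return host_out

if __name__ == "__main__":
    x = preprocess("zidane.jpg")
    onnx_out = run_onnx(x).astype(np.float32)
    trt_out = run_trt(x).astype(np.float32)
    print("max diff:", np.abs(onnx_out - trt_out).max())
```

Running the ONNX side on the CPU execution provider keeps that reference independent of the GPU, so a large gap should point at the engine rather than at onnxruntime.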