ONNX vs TensorRT: different inference results


I have a bigger ONNX model that gives inconsistent inference results between ONNX Runtime and TensorRT.


TensorRT Version: 7.1.3
GPU Type: TX2
CUDA Version: 10.2.89
CUDNN Version:
Operating System + Version: Jetpack 4.4 (L4T 32.4.3)

Relevant Files

reduced.onnx (62.0 KB)

Steps To Reproduce

polygraphy debug reduce bigger.onnx -o reduced.onnx --check polygraphy run polygraphy_debug.onnx --onnxrt --trt --trt-outputs mark all --onnx-outputs mark all --fail-fast

I was able to reduce it down to this ONNX file.

Running this command
polygraphy run reduced.onnx --trt --onnxrt --trt-outputs mark all --onnx-outputs mark all --fail-fast
fails with the message
FAILED | Output: 'model_2/model/block_2_add/add:0' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)

Can you please help resolve the accuracy difference (I think that’s the problem) so that the inference results match between ONNX Runtime and TensorRT?

Please share the ONNX model and the script, if you have not already, so that we can assist you better.
In the meantime, you can try a few things:

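  1. Validate your model with the snippet below:


import onnx

# Replace with the path to your ONNX model
filename = "your_model.onnx"
model = onnx.load(filename)
# Raises an exception if the model is malformed
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.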

If you are still facing issues, please share the trtexec --verbose log for further debugging.

No error on check_model.py

Here is the trtexec log for running reduced.onnx.
There was no error running trtexec on reduced.onnx (nor on bigger.onnx, which I have already run many times; I won’t be able to upload bigger.onnx).

trtexec_verbose.log (99.3 KB)

Is it possible for you guys to reproduce the polygraphy run (exceeding default tolerance level)?


Sorry for the delayed response.
We recommend using the latest TensorRT version, 8.5.

We think the Polygraphy issue is just a tolerance issue. With --atol 1e-4 --rtol 1e-4 the Polygraphy check passes, whereas the default values are --atol 1e-5 --rtol 1e-5.
We think 1e-4 is a fairly reasonable tolerance.
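
To get a feel for what these tolerances mean, here is a minimal NumPy sketch. It is not Polygraphy's own comparison code: it uses NumPy's allclose rule (|candidate - reference| <= atol + rtol * |reference|), which may differ in detail from how Polygraphy combines the two tolerances. The arrays are made-up values, not outputs of reduced.onnx, and the helper names max_errors and within_tolerance are hypothetical.

```python
import numpy as np


def max_errors(reference, candidate):
    """Return (max absolute error, max relative error) of candidate vs reference."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    abs_err = np.abs(candidate - reference)
    # Avoid dividing by zero where the reference value is exactly 0.
    denom = np.maximum(np.abs(reference), np.finfo(np.float64).tiny)
    rel_err = abs_err / denom
    return float(abs_err.max()), float(rel_err.max())


def within_tolerance(reference, candidate, atol, rtol):
    """NumPy-style check: |candidate - reference| <= atol + rtol * |reference|."""
    return bool(np.allclose(candidate, reference, rtol=rtol, atol=atol))


# Made-up stand-ins for one output tensor from each runtime (assumption).
onnxrt_out = np.array([1.0, 2.0, 3.0])
trt_out = onnxrt_out + 5e-5  # a small, uniform deviation

max_abs, max_rel = max_errors(onnxrt_out, trt_out)
print("max abs error:", max_abs)
print("max rel error:", max_rel)
print("passes at 1e-5:", within_tolerance(onnxrt_out, trt_out, atol=1e-5, rtol=1e-5))
print("passes at 1e-4:", within_tolerance(onnxrt_out, trt_out, atol=1e-4, rtol=1e-4))
```

With a deviation of this size, the check fails at 1e-5 and passes at 1e-4, which is the same pattern reported above for the Polygraphy run.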

Are you looking for 100% numerical accuracy in comparison with ONNX-Runtime results?
Or do you have a benchmarking metric for inference accuracy?

Thank you.