I have a bigger ONNX model that gives inconsistent inference results between ONNX Runtime and TensorRT.
TensorRT Version: 7.1.3
GPU Type: TX2
CUDA Version: 10.2.89
CUDNN Version: 18.104.22.168
Operating System + Version: Jetpack 4.4 (L4T 32.4.3)
reduced.onnx (62.0 KB)
Steps To Reproduce
polygraphy debug reduce bigger.onnx -o reduced.onnx --check polygraphy run polygraphy_debug.onnx --onnxrt --trt --trt-outputs mark all --onnx-outputs mark all --fail-fast
I was able to reduce it down to this onnx file.
And running this
polygraphy run reduced.onnx --trt --onnxrt --trt-outputs mark all --onnx-outputs mark all --fail-fast
fails with the message:
FAILED | Output: 'model_2/model/block_2_add/add:0' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
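For reference, the comparison behind that message corresponds to an elementwise check of the form abs(trt - ref) <= atol + rtol * abs(ref). A minimal pure-Python sketch (the function name and values are illustrative, not polygraphy's actual code):

```python
def within_tolerance(trt_out, ref_out, atol=1e-5, rtol=1e-5):
    # Elementwise rel/abs tolerance check, mirroring the kind of
    # comparison polygraphy reports (illustrative, not its exact code).
    return all(abs(t - r) <= atol + rtol * abs(r)
               for t, r in zip(trt_out, ref_out))

# A 5e-5 difference on a value near 1.0 fails the default 1e-5 tolerances:
print(within_tolerance([1.00005], [1.0]))  # False
```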
Can you please help resolve the accuracy difference (I think that’s the problem) so that I get matching inference results between ONNX Runtime and TensorRT?
Could you share the ONNX model and the script, if not shared already, so that we can assist you better?
Meanwhile, you can try a few things:
1) Validate your model with the below snippet:
import onnx
filename = yourONNXmodel
model = onnx.load(filename)
onnx.checker.check_model(model)
2) Try running your model with the trtexec command.
In case you are still facing the issue, please share the trtexec --verbose log for further debugging.
No error on check_model.py
Here is the log for running reduced.onnx.
No error running trtexec on reduced.onnx, nor on the bigger.onnx file; I have run it many times already. (I won’t be able to upload the bigger.onnx file.)
trtexec_verbose.log (99.3 KB)
Is it possible for you guys to reproduce the polygraphy run (exceeding default tolerance level)?
Sorry for the delayed response.
We recommend using the latest TensorRT version, 8.5.
We think the polygraphy issue is just a tolerance issue. With --atol 1e-4 --rtol 1e-4, the polygraphy check passes; the default values are --atol 1e-5 --rtol 1e-5.
1e-4 is a fairly reasonable tolerance.
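For context on why 1e-4 is reasonable: a single float32 operation already carries about 1.2e-7 relative rounding error, and those errors accumulate differently across the differently ordered kernels used by TensorRT and ONNX-Runtime, so fp32 networks commonly drift past 1e-5 after many layers. A pure-Python sketch of the single-op limit (the `f32` helper is illustrative):

```python
import struct

def f32(x):
    # Round a Python float to float32 precision, simulating fp32 math.
    return struct.unpack('f', struct.pack('f', x))[0]

# Find float32 machine epsilon: the smallest step distinguishable from 1.0.
eps = 1.0
while f32(1.0 + eps / 2) != 1.0:
    eps /= 2
print(eps)  # 1.1920928955078125e-07, i.e. 2**-23
```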
Are you looking for 100% numerical accuracy in comparison with ONNX-Runtime results?
Or do you have a benchmarking metric for inference accuracy?