Using
polygraphy debug reduce bigger.onnx -o reduced.onnx --check "polygraphy run polygraphy_debug.onnx --onnxrt --trt --trt-outputs mark all --onnx-outputs mark all --fail-fast"
I was able to reduce it down to this ONNX file.
Running
polygraphy run reduced.onnx --trt --onnxrt --trt-outputs mark all --onnx-outputs mark all --fail-fast
fails with the message:
FAILED | Output: 'model_2/model/block_2_add/add:0' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
Can you please help resolve the accuracy difference (I think that's the problem) so that ONNX Runtime and TensorRT produce matching inference results?
Hi,
Please share the ONNX model and the script, if you haven't already, so that we can assist you better.
In the meantime, you can try a few things:
1) Validate your model with the below snippet:
check_model.py
import onnx

filename = "yourONNXmodel.onnx"  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
2) Try running your model with the trtexec command.
In case you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!
Here is the log from running reduced.onnx.
There were no errors running trtexec on reduced.onnx (nor on the bigger.onnx file; I have run it many times already). I won't be able to upload the bigger.onnx file.
Sorry for the delayed response.
We recommend using the latest TensorRT version, 8.5.
We think the polygraphy issue is just a tolerance issue. With --atol 1e-4 --rtol 1e-4, the polygraphy check passes, while the default values are --atol 1e-5 --rtol 1e-5.
We think 1e-4 is a fairly reasonable tolerance.
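To illustrate why loosening the tolerance makes the check pass, here is a small NumPy sketch using np.isclose, which passes an element when |a - b| <= atol + rtol * |b|. (Polygraphy's elementwise comparison is similar in spirit, though not necessarily identical; the output values below are hypothetical, not taken from your model.)

```python
import numpy as np

# Hypothetical outputs: ONNX-Runtime result as the reference, and a
# TensorRT result that diverges by ~3e-5 per element (typical FP32 noise).
onnxrt_out = np.array([0.123456, 1.234567, 12.345678], dtype=np.float32)
trt_out = onnxrt_out + np.float32(3e-5)

# Default-like tolerances (rel=1e-5, abs=1e-5): small-magnitude elements
# fail, because 3e-5 > 1e-5 + 1e-5 * |reference| for references near 0.1.
strict = bool(np.all(np.isclose(trt_out, onnxrt_out, rtol=1e-5, atol=1e-5)))

# Loosened tolerances (rel=1e-4, abs=1e-4): every element passes.
loose = bool(np.all(np.isclose(trt_out, onnxrt_out, rtol=1e-4, atol=1e-4)))

print(strict, loose)  # -> False True
```

This is why a per-element difference on the order of a few 1e-5, which is unremarkable for FP32 kernels that reorder reductions, trips the default 1e-5 check but comfortably clears 1e-4.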
Are you looking for 100% numerical accuracy in comparison with ONNX-Runtime results?
Or do you have a benchmarking metric for inference accuracy?