I have exported a PyTorch model to ONNX and its outputs match PyTorch's, so the ONNX model seems to be working as expected. However, after generating a TensorRT engine from this ONNX file, the outputs are different.
Environment
TensorRT Version: 7.2.3.4
GPU Type: GTX 1650 - 4GB
Nvidia Driver Version: 465.19.01
CUDA Version: 11.3
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.8.5
PyTorch Version (if applicable): 1.9.0
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:21.05-py3
Loading ONNX file: 'lcc.onnx'
[TensorRT] WARNING: /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:227: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Completed parsing of ONNX file
converting to fp16
Building an Engine...
Completed creating Engine
Elapsed: 40.106 sec
[[2.5950394 2.5950394 2.5950394 2.0784717 2.0784717 0. 0.
0. 0. 0. 0. ]
[2.5950394 2.5950394 2.5950394 2.0784717 2.0784717 0. 0.
0. 0. 0. 0. ]
[2.5950394 2.5950394 2.5950394 2.0784717 2.0784717 0. 0.
0. 0. 0. 0. ]
[2.0784717 2.0784717 2.0784717 2.0784717 2.0784717 0. 0.
0. 0. 0. 0. ]
[2.0784717 2.0784717 2.0784717 2.0784717 2.0784717 0. 0.
0. 0. 0. 0. ]
[0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. ]]
As you can see, the outputs are completely different. One more strange behaviour I noticed: the TensorRT engine gives almost the same output for different input images. Please help with any pointers, or help me debug the issue.
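For reference, this is roughly the kind of elementwise comparison I use to decide whether two outputs "match" (a minimal numpy-only sketch; the tolerance values are my own defaults, not anything prescribed by TensorRT):

```python
import numpy as np

def outputs_match(ref, test, rtol=1e-3, atol=1e-5):
    """Compare a reference output (PyTorch / ONNX Runtime) against a
    TensorRT output elementwise. Tolerances are illustrative defaults."""
    ref = np.asarray(ref, dtype=np.float32)
    test = np.asarray(test, dtype=np.float32)
    if ref.shape != test.shape:
        # A shape mismatch already indicates a broken conversion.
        return False
    return bool(np.allclose(ref, test, rtol=rtol, atol=atol))
```

With the outputs above, this returns False immediately: the values differ by far more than any reasonable floating-point tolerance.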
Hi,
Request you to share the ONNX model and the script if not shared already so that we can assist you better.
Alongside, you can try a few things:
1) Validate your model with the snippet below.
check_model.py
import onnx

filename = "your_model.onnx"  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
2) Try running your model with trtexec command. https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, request you to share the trtexec --verbose log for further debugging.
Thanks!
In our experience, it is not expected to have such a high level of matching between ONNX-Runtime and TensorRT, or between any two implementations of a DL model - whether on CPU, GPU, or a mix. TensorRT provides no way to guarantee bitwise-identical results.
DL networks are typically robust against changes in the order of FP operations.
But please do let me know if that is impacting the accuracy in your case.
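To put a number on how much difference reduced precision alone should account for: fp16 carries about three decimal digits, so round-tripping fp32 values through fp16 (a crude stand-in for what an fp16 engine does internally, not the real engine path) perturbs them by under ~1e-3 relative - orders of magnitude smaller than the mismatch reported above:

```python
import numpy as np

# Round-trip fp32 values through fp16 as a rough proxy for the precision
# loss an fp16 engine introduces (illustrative only).
x = np.linspace(-5.0, 5.0, 1001).astype(np.float32)
x16 = x.astype(np.float16).astype(np.float32)

# fp16 has a 10-bit mantissa, so relative rounding error stays below ~2**-11.
max_rel = float(np.max(np.abs(x - x16) / np.maximum(np.abs(x), 1e-6)))
```

If the observed error is much larger than this, the problem is in the conversion, not the precision.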
Indeed, I have ported multiple DL models to TensorRT, but this is the first time I am encountering this kind of issue.
Yes, this not only impacts the accuracy: for any input image I get a similar output, i.e. the post-processing result is the same.
Thanks
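The "same output for any input" symptom can be confirmed programmatically with a sketch like the following (`infer` is a hypothetical placeholder for whatever function wraps the TensorRT engine; the threshold is an assumption):

```python
import numpy as np

def looks_constant(infer, input_shape, n_trials=4, seed=0):
    """Feed several random inputs to `infer`; if the outputs are (nearly)
    identical, the engine is effectively ignoring its input - the symptom
    described above. `infer` is a placeholder for your engine wrapper."""
    rng = np.random.default_rng(seed)
    outs = [np.asarray(infer(rng.standard_normal(input_shape).astype(np.float32)))
            for _ in range(n_trials)]
    spread = max(float(np.max(np.abs(o - outs[0]))) for o in outs[1:])
    return spread < 1e-6
```

A healthy engine should return False here; one that collapses to a constant output returns True.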
I tried the latest NGC container nvcr.io/nvidia/tensorrt:21.09-py3, which has tensorrt==8.0.3.0.
It gives the same output; moreover, after the first inference I see Segmentation fault (core dumped).
Hi @spolisetty,
I tried running the same ONNX model via TensorRT 8.2.
As I mentioned in the previous replies, there is no error, but the output is not what I expected.
Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float
(the two warnings above repeat 9 times)
It seems like TensorRT is having difficulty converting these methods; as per the docs, we can overcome this by writing a converter to override the unsupported methods.
We have a similar known issue. I believe it's fixed in TensorRT version 8.2 GA Update 1, which was released recently.
We request you to please verify one last time on the above version. If you still face this issue, please let us know; it will be fixed in future releases.
I got a similar issue and tried all of the above. When inferencing through Detectron2 transformers for panoptic segmentation, converted to a TensorRT serialized plan, everything works fine until output generation: it is different from the ONNX output, and changing the input doesn't affect the output, which remains the same.