I converted a conformer encoder model from PyTorch to ONNX and then to TensorRT. However, I found that the resulting TensorRT model produced wrong inference results.
To figure out where the error occurred, I printed out the intermediate results and finally located the failing operation. It is this operator in PyTorch: y = x.eq(0.0), where x has type torch.float32 and y has type torch.bool; y is then used as the mask for torch.masked_fill() to limit the context window of the attention.
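A minimal sketch of the pattern in question (tensor names and values are illustrative, not taken from the actual conformer encoder):

```python
import torch

# Illustrative input; the real model computes x from padding information.
x = torch.tensor([[0.0, 1.0, 0.0, 2.0]], dtype=torch.float32)

# The failing operator: float equality test producing a torch.bool mask.
y = x.eq(0.0)

# The bool mask then limits the attention context window by filling
# masked positions, e.g. with a large negative value before softmax.
scores = torch.randn(1, 4)
masked_scores = scores.masked_fill(y, float("-inf"))
print(masked_scores)
```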
By the way, this error appears in TensorRT 8.0.1.6 but not in TensorRT 8.2.1.8. However, the TensorRT 8.0.1.6 engine is about 35% faster than the TensorRT 8.2.1.8 one.
Environment
TensorRT Version: 8.0.1.6
GPU Type: V100
Nvidia Driver Version: 455.32.00
CUDA Version: 11.1
CUDNN Version: 8.0.4
Operating System + Version: CentOS 7
Python Version (if applicable): 3.6
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.8.1
Baremetal or Container (if container which image + tag):
Relevant Files
I tried to upload the ONNX model, but at 233 MB it is too big to upload.
Steps To Reproduce
Use torch.onnx.export() to convert the model from PyTorch to ONNX, as sketched below.
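A hedged sketch of this step, with a tiny stand-in module (the real conformer encoder is far too large to reproduce here; the input shape, names, and opset are assumptions, not the values used in the original export):

```python
import torch
import torch.nn as nn

# Stand-in module that contains the problematic pattern:
# a float equality test producing a bool mask for masked_fill.
class TinyEncoder(nn.Module):
    def forward(self, x):
        mask = x.eq(0.0)
        return x.masked_fill(mask, -1e4)

model = TinyEncoder().eval()
dummy = torch.randn(1, 80, 100)  # (batch, features, frames) - assumed shape

# Opset 13 and the tensor names here are placeholders.
torch.onnx.export(
    model,
    dummy,
    "encoder.onnx",
    opset_version=13,
    input_names=["speech"],
    output_names=["out"],
    dynamic_axes={"speech": {0: "batch", 2: "frames"}},
)
```

The resulting ONNX file can then be built into a TensorRT engine, for example with `trtexec --onnx=encoder.onnx --saveEngine=encoder.plan`.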
You can refer to the link below for the full list of supported operators; if an operator is not supported, you will need to create a custom plugin for it.
Also, we request that you share your model and script, if not already shared, so that we can help you better.
Meanwhile, for some common errors and queries, please refer to the link below:
Could you please try the latest TensorRT version, 8.4 EA? If you still face this issue, we recommend that you share a repro model and script so we can debug it from our end.
Are you no longer facing this on the latest TRT version? We also recommend that you verify the ONNX Runtime results to make sure they are correct; a sketch of such a check follows.
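One way to do that check, reusing `model` and `encoder.onnx` from the export sketch above (both are placeholders for the real encoder and its exported file):

```python
import numpy as np
import onnxruntime as ort
import torch

# Run the same input through ONNX Runtime and PyTorch and compare.
sess = ort.InferenceSession("encoder.onnx", providers=["CPUExecutionProvider"])
dummy = torch.randn(1, 80, 100)

ort_out = sess.run(None, {"speech": dummy.numpy()})[0]
torch_out = model(dummy).detach().numpy()

# If this passes, the ONNX model is fine and the discrepancy was
# introduced in the ONNX -> TensorRT step.
np.testing.assert_allclose(torch_out, ort_out, rtol=1e-3, atol=1e-5)
```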