The inference result of Conformer Encoder is wrong

Description

I converted the Conformer encoder model from PyTorch to ONNX and then to TensorRT. However, the inference results of the resulting TensorRT model were wrong.

To figure out where the error occurred, I printed the intermediate results and finally located the error. It is an operator like this in PyTorch: y = x.eq(0.0), where x is of type torch.float32 and y is of type torch.bool. y is then used as the mask for torch.masked_fill() to limit the context window of the attention.
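For illustration, the pattern described above can be sketched as follows. This is a minimal sketch, not the actual model code: the tensor shapes, the 0/1 values in x, and the all-zero stand-in attention scores are assumptions made up for the example.

```python
import torch

# Hypothetical float mask: 1.0 = position may be attended to, 0.0 = masked out.
x = torch.tensor([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0]], dtype=torch.float32)

# Stand-in attention scores (all zeros, purely for illustration).
scores = torch.zeros(2, 3)

# y is a torch.bool tensor that is True wherever x equals 0.0.
y = x.eq(0.0)

# Masked positions are filled with -inf so that softmax assigns them zero weight.
masked_scores = scores.masked_fill(y, float("-inf"))
attn = torch.softmax(masked_scores, dim=-1)

print(y)
print(attn)
```

If the comparison in y comes out wrong in the converted engine, the wrong positions are masked and every value after the softmax changes, which would match the wrong results I see.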

By the way, this error appears in TensorRT 8.0.1.6 but not in TensorRT 8.2.1.8. However, the resulting model runs about 35% faster with TensorRT 8.0.1.6 than with TensorRT 8.2.1.8.

Environment

TensorRT Version: 8.0.1.6
GPU Type: V100
Nvidia Driver Version: 455.32.00
CUDA Version: 11.1
CUDNN Version: 8.0.4
Operating System + Version: CentOS 7
Python Version (if applicable): 3.6
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.8.1
Baremetal or Container (if container which image + tag):

Relevant Files

I tried to upload the ONNX model, but it is 233 MB, which is too big to upload.

Steps To Reproduce

  1. Use torch.onnx.export() to convert the model from PyTorch to ONNX.
  2. Use trtexec to convert the model from ONNX to TensorRT.
  3. Run inference with the resulting model in Python 3.6.

Hi,
Can you try running your model with the trtexec command and share the "--verbose" log if the issue persists?
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

You can refer to the link below for the list of all supported operators. If any operator is not supported, you need to create a custom plugin to support that operation.

Also, please share your model and script, if not shared already, so that we can help you better.

Meanwhile, for some common errors and queries, please refer to the links below:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/#error-messaging
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/#faq

Thanks!

trtinfer.init.log (22.8 KB)

The trtexec command:
trtexec --verbose --loadEngine=$trt_model --shapes=$input_shape --loadInputs=$input_spec --exportOutput=trtinfer-$target.json

The results from trtexec are the same as those from TensorRT in Python, and different from the ONNX results.
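A comparison like this can be scripted. The snippet below is a minimal sketch that uses only the Python standard library. It assumes the file written by --exportOutput is a JSON list of objects with "name", "dimensions" and "values" fields. The tensor name "encoder_out", the sample values, and the reference list are hypothetical placeholders, not data from my model.

```python
import json
import math


def load_trtexec_outputs(text):
    """Parse trtexec --exportOutput JSON text into {tensor name: flat list of floats}.

    Assumes a list of objects with "name" and "values" keys.
    """
    return {entry["name"]: [float(v) for v in entry["values"]]
            for entry in json.loads(text)}


def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two equal-length lists."""
    if len(a) != len(b):
        raise ValueError("length mismatch: %d vs %d" % (len(a), len(b)))
    return max((abs(p - q) for p, q in zip(a, b)), default=0.0)


def first_mismatch(a, b, atol=1e-3):
    """Index of the first element that differs by more than atol, or None."""
    for i, (p, q) in enumerate(zip(a, b)):
        if math.isnan(p) != math.isnan(q) or abs(p - q) > atol:
            return i
    return None


# Hypothetical stand-in for the contents of trtinfer-$target.json.
sample = '[{"name": "encoder_out", "dimensions": "1x4", "values": [0.1, 0.2, 0.3, 0.4]}]'
trt = load_trtexec_outputs(sample)["encoder_out"]

# Hypothetical reference values, e.g. flattened output of the ONNX model.
ref = [0.1, 0.2, 0.9, 0.4]

print("max abs diff:", max_abs_diff(trt, ref))
print("first mismatch index:", first_mismatch(trt, ref))
```

The index of the first mismatch helps to see whether the difference starts at a masked position or is spread over the whole output.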

There are no errors when converting the model from PyTorch to ONNX and then to TensorRT, so the ops used in PyTorch are supported by TensorRT, right?

I’d like to send you the ONNX model, but it is about 233 MB, which is too big to upload!

Hi,

Could you please try the latest TensorRT version, 8.4 EA? If you still face this issue, please share a model and script that reproduce it so that we can try it on our end for better debugging.

Are you not facing this issue on the latest TRT version? We also recommend verifying the ONNX Runtime results to make sure they are correct.

Thank you.

OK, I will try TensorRT version 8.4 EA.