I converted a Conformer encoder model from PyTorch to ONNX and then to TensorRT. However, I found that the inference results of the resulting TensorRT engine were wrong.
To figure out where the error occurred, I printed the intermediate results and located the failing operation. In PyTorch it looks like this: y = x.eq(0.0), where x has dtype torch.float32 and y has dtype torch.bool. y is then used as the mask argument of torch.masked_fill() to limit the context window of the attention.
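For reference, here is a minimal sketch of that masking pattern; the tensor names, shapes, and values are illustrative, not taken from my actual model:

```python
import torch

# Illustrative float32 input; in the real model this is an attention-related tensor.
x = torch.tensor([[0.0, 1.0, 0.0, 2.0]], dtype=torch.float32)

# y = x.eq(0.0): float32 input -> bool mask (True where x == 0.0).
y = x.eq(0.0)

# Attention scores with the same shape as the mask (random placeholder values).
scores = torch.randn(1, 4)

# Masked positions are filled with -inf so softmax gives them zero weight,
# which restricts the attention context window.
scores = scores.masked_fill(y, float("-inf"))
attn = torch.softmax(scores, dim=-1)
```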
By the way, this error appears in TensorRT 184.108.40.206 but not in TensorRT 220.127.116.11, and the engine built with TensorRT 18.104.22.168 runs about 35% faster than the one built with TensorRT 22.214.171.124.
TensorRT Version: 126.96.36.199
GPU Type: V100
Nvidia Driver Version: 455.32.00
CUDA Version: 11.1
CUDNN Version: 8.0.4
Operating System + Version: CentOS 7
Python Version (if applicable): 3.6
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.8.1
Baremetal or Container (if container which image + tag):
I tried to upload the ONNX model, but at 233 MB it is too large to upload.
Steps To Reproduce
Use torch.onnx.export() to convert the model from PyTorch to ONNX, roughly as in the sketch below.
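A hedged sketch of the export call; the model object, dummy input shape, opset version, and input/output names here are placeholders, not the exact settings I used:

```python
import torch

# `model` stands in for the Conformer encoder; the (batch, time, feature)
# dummy input shape is illustrative only.
model.eval()
dummy_input = torch.randn(1, 100, 80)

torch.onnx.export(
    model,
    dummy_input,
    "conformer_encoder.onnx",
    opset_version=13,                       # assumed opset, supported by PyTorch 1.8.1
    input_names=["speech"],                 # hypothetical input name
    output_names=["encoder_out"],           # hypothetical output name
    dynamic_axes={"speech": {0: "batch", 1: "time"}},  # variable batch/sequence length
)
```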