ONNX to TensorRT conversion (FP16 or FP32) produces integer outputs collapsed to near-zero denormal values (~2e-45)

Description

I trained a BERT-style transformer with a multi-task objective, exported it to ONNX (successfully), and am using the ONNX parser in TensorRT 8.2.5 (Python API) on an NVIDIA T4 to build an engine. Inference runs and the output shapes ((1x512, …) x 6) are correct, but in 4 of the 6 output tensors (the ones that should be integer-valued) every value comes back as a tiny denormal float around 2e-45, with only small variations in the exact value. This happens with both FP16 and FP32 engines. If I instead run the same ONNX model through the TensorRT execution provider in ONNX Runtime, the outputs are correct.
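For reference, the engine-build path I am describing looks roughly like this. It is a minimal sketch, not my exact script; the model path, input name, workspace size, and profile shapes are placeholders:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# "model.onnx" is a placeholder for the exported file
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.max_workspace_size = 4 << 30        # 4 GB, arbitrary
config.set_flag(trt.BuilderFlag.FP16)      # same behaviour seen with FP32 (flag omitted)

# optimization profile for the dynamic batch dimension (name/shapes are placeholders)
profile = builder.create_optimization_profile()
profile.set_shape("input_ids", (1, 512), (1, 512), (8, 512))
config.add_optimization_profile(profile)

serialized_engine = builder.build_serialized_network(network, config)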

Environment

TensorRT Version: 8.2.5
GPU Type: NVIDIA T4
Nvidia Driver Version: 450.142.00 (latest)
CUDA Version: 10.2
CUDNN Version: 7.6
Operating System + Version:
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.11
Baremetal or Container: AWS g4dn.24xlarge, AWS Deep Learning AMI 2

NOTE: The model is a BERT-style Transformer encoder with one input embedding layer, 12 transformer layers, 12 attention heads, a hidden size of 768, and 3 classification heads (768 x 4, 768 x 100k, and 768 x 45).
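Purely for illustration (this is a sketch of the head layout, not my actual model code; the encoder interface and head names are placeholders), the structure is along these lines:

import torch.nn as nn

class MultiTaskBert(nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder                  # 12 layers, 12 heads, hidden size 768
        self.head_a = nn.Linear(768, 4)         # 768 x 4 classification head
        self.head_b = nn.Linear(768, 100_000)   # 768 x 100k classification head
        self.head_c = nn.Linear(768, 45)        # 768 x 45 classification head

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids, attention_mask)   # (batch, 512, 768)
        return self.head_a(hidden), self.head_b(hidden), self.head_c(hidden)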

Relevant Files

Attached below is a sample of the output; it shows the six output tensors (truncated).

[tensor([[[2.8026e-45, 4.2039e-45, 2.8026e-45, …, 2.8026e-45,
2.8026e-45, 2.8026e-45]],

tensor([[[0.6362, 0.5518, 0.4241, …, 0.4971, 0.3567, 0.6465]],

    ...,

tensor([[[1.2612e-44, 1.2612e-44, 1.4013e-44, …, 2.1019e-44,
1.6816e-44, 2.1019e-44]],

tensor([[[0.0162, 0.0495, 0.0873, …, 0.6802, 0.0134, 0.0517]],

tensor([[[1.9689e-40, 5.4349e-40, 3.5303e-40, …, 2.0048e-40,
1.7215e-40, 2.8329e-40]],

tensor([[[1., 0., 0., …, 0., 0., 0.]],

Hi,
Please share the ONNX model and the script, if not shared already, so that we can assist you better.
In the meantime, you can try a few things:

  1. Validate your model with the snippet below:

check_model.py

import sys
import onnx

filename = sys.argv[1]          # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
print("ONNX model check passed")
  2. Try running your model with the trtexec command.

If you are still facing the issue, please share the trtexec --verbose log for further debugging.
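For example (the model path is a placeholder):

trtexec --onnx=model.onnx --fp16 --verbose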
Thanks!

I have already run the model fine using ONNX Runtime with the TensorRT execution provider. In case you didn't see this in my original post: the model compiles and runs fine. The issue is with the outputs, which are totally wrong when using trtexec or the TensorRT Python API (building and running an engine via the ONNX parser). I've attached a Jupyter notebook showing that I can run it with both ONNX Runtime and TensorRT but the results differ, and I've attached the ONNX file as a Drive link since it's 1.7 GB: m27_no_sentindex_amp_fp16_dim2_dynamic_batch.onnx - Google Drive tensorrt_run.ipynb (2.8 MB)
Again, it RUNS fine; the RESULTS are just off (all of them look like the sample provided in the OP).
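For context, the ONNX Runtime side of the comparison looks roughly like this (a sketch; the input names and random inputs are assumptions for illustration, my notebook feeds real tokenized inputs):

import numpy as np
import onnxruntime as ort

# input names/shapes are assumptions for illustration
feeds = {
    "input_ids": np.random.randint(0, 30000, size=(1, 512), dtype=np.int64),
    "attention_mask": np.ones((1, 512), dtype=np.int64),
}

sess = ort.InferenceSession(
    "m27_no_sentindex_amp_fp16_dim2_dynamic_batch.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"],
)
ort_outputs = sess.run(None, feeds)   # these values look correct

# the same feeds run through the TensorRT engine built with the ONNX parser;
# 4 of the 6 outputs come back as ~2e-45 denormals instead of integer values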

Hi,

We couldn't run the above script successfully. Please share a minimal repro script with us for better debugging.
You can also explore the BERT samples here to make sure your script is correct.

Thank you.