Description
I converted a trained BERT-style transformer (trained with a multi-task objective) to ONNX successfully, then used the ONNX parser in TensorRT 8.2.5 (Python API) on an NVIDIA T4 to build an engine. Inference runs and produces output, but the values are all very small numbers close to 2e-45 (varying slightly in exact value). The output shapes ((1, 512, …) × 6) are correct, but in 4 of the 6 output tensors (the ones whose outputs should be integer-valued) the values come back as these tiny decimals. This happens with both FP16 and FP32 engines. Finally, if I use the TensorRT execution provider in ONNX Runtime instead, I get correct outputs.
Environment
TensorRT Version: 8.2.5
GPU Type: NVIDIA T4
Nvidia Driver Version: 450.142.00 (latest)
CUDA Version: 10.2
CUDNN Version: 7.6
Operating System + Version:
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.11
Baremetal or Container: Baremetal, AWS g4dn.24xlarge, AWS Deep Learning AMI 2
NOTE: The model is a BERT-style Transformer encoder with 1 input embedding layer, 12 transformer layers, 12 attention heads, hidden size 768, and 3 classification heads (768 × 4, 768 × 100k, 768 × 45).
Relevant Files
Attached below is a sample of the output: six tensors (truncated).
[tensor([[[2.8026e-45, 4.2039e-45, 2.8026e-45, …, 2.8026e-45,
2.8026e-45, 2.8026e-45]],
tensor([[[0.6362, 0.5518, 0.4241, …, 0.4971, 0.3567, 0.6465]],
...,
tensor([[[1.2612e-44, 1.2612e-44, 1.4013e-44, …, 2.1019e-44,
1.6816e-44, 2.1019e-44]],
tensor([[[0.0162, 0.0495, 0.0873, …, 0.6802, 0.0134, 0.0517]],
tensor([[[1.9689e-40, 5.4349e-40, 3.5303e-40, …, 2.0048e-40,
1.7215e-40, 2.8329e-40]],
tensor([[[1., 0., 0., …, 0., 0., 0.]],
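One observation on the sample above (my own analysis, not confirmed): the magnitudes in the integer-valued heads look exactly like small int32 values whose raw bits are being read as float32 denormals. Reinterpreting the bits of int32 `2` as float32 gives 2.8026e-45, `3` gives 4.2039e-45, and `9` gives 1.2612e-44, matching the sample. A minimal check using only the standard library:

```python
import struct

def int_bits_as_float(i: int) -> float:
    """Reinterpret the raw bits of an int32 as a float32 (no numeric conversion)."""
    return struct.unpack("<f", struct.pack("<i", i))[0]

def float_bits_as_int(f: float) -> int:
    """Inverse direction: recover the int32 whose bits produce this float32."""
    return struct.unpack("<i", struct.pack("<f", f))[0]

# Small ints come out as the denormals seen in the TensorRT output sample.
for i in (2, 3, 9):
    print(i, "->", int_bits_as_float(i))

# And the observed values map back to plausible integer class IDs.
for f in (2.8026e-45, 4.2039e-45, 1.2612e-44):
    print(f, "->", float_bits_as_int(f))
```

If this is what is happening, the bits may be correct and only the binding dtype wrong; as a sanity check one could view the raw output buffer as int32 (e.g. NumPy's `ndarray.view(np.int32)`) and see whether sensible label indices come out.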