TensorRT encountered issues when converting weights between types and that could affect accuracy

Description

I use TensorRT to run inference on a BEiT ONNX model (opset 17). FP32 is OK on my GPU, but the FP16 result is wrong. When I convert the model to TensorRT through onnx2trt (C++) for FP16, I get these warnings:

[TRT] Warning: TensorRT encountered issues when converting weights between types and that could affect accuracy.
[TRT] Warning: - 73 weights are affected by this issue: Detected subnormal FP16 values.
[TRT] Warning: - 47 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.

Actually, when my own engine runs FP16 BEiT, the accuracy is within a reasonable range.
So I wonder: is the incorrect FP16 result in TensorRT related to these warnings? Or is there an overflow in some optimized kernel, e.g., fused layers?
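
For reference, my understanding of what the two warnings mean (a quick illustration I wrote myself, not TensorRT code; the constants are the standard FP16 limits):

// Quick illustration of the FP16 ranges behind the two warnings (not TensorRT code).
#include <cmath>
#include <cstdio>

int main()
{
    const float fp16MinNormal    = 6.103515625e-05f; // 2^-14, smallest normal FP16 value
    const float fp16MinSubnormal = 5.9604645e-08f;   // 2^-24, smallest subnormal FP16 value

    const float weights[] = {1.0e-3f, 3.0e-5f, 1.0e-9f};  // example FP32 weight magnitudes
    for (float w : weights)
    {
        const float a = std::fabs(w);
        if (a >= fp16MinNormal)
            std::printf("%g: representable as a normal FP16 value\n", w);
        else if (a >= fp16MinSubnormal)
            std::printf("%g: subnormal in FP16 (fewer significant bits)\n", w);
        else
            std::printf("%g: below the FP16 subnormal range, clamped to the minimum subnormal\n", w);
    }
    return 0;
}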

Environment

TensorRT Version: 8.5.1.7
GPU Type: RTX 4070 laptop
Nvidia Driver Version: 536.25
CUDA Version: 11.8
CUDNN Version: 8.9.1
Operating System + Version: Windows 11
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Model: microsoft/beit-base-patch16-224-pt22k-ft22k · Hugging Face
onnx2trt verbose log:

verbose.txt (11.9 MB)

Hi,
The UFF and Caffe parsers have been deprecated from TensorRT 7 onwards, so we request that you try the ONNX parser.
Please check the below link for the same.

Thanks!

Hi @AakankshaS, thanks for your reply.
I do use ONNX-TensorRT; the fetch code in my CMake is like below:

                include(FetchContent)  # for FetchContent_Declare/FetchContent_MakeAvailable

                set(ONNX_TENSORRT_REPO https://github.com/onnx/onnx-tensorrt.git)
                set(ONNX_TENSORRT_TAG release/${TensorRT_VERSION_MAJOR}.${TensorRT_VERSION_MINOR}-GA)
                # fetch onnx-tensorrt
                FetchContent_Declare(
                        onnx2trt
                        GIT_REPOSITORY ${ONNX_TENSORRT_REPO}
                        GIT_TAG ${ONNX_TENSORRT_TAG}
                )
                FetchContent_MakeAvailable(onnx2trt)

So I think it's not about the converter; it's about the type casting inside the TensorRT implementation (FP32 is OK).

Hi,

We recommend that you try the latest TensorRT version, 8.6.1.
If you still face the same issue, please share a repro ONNX model and the complete verbose logs for better debugging.

Thank you.

Hi @spolisetty, thank you for your reply.

Unfortunately, I tried 8.6.1.6 and the result is still wrong when using FP16.

The complete verbose log is here:
TRT860_FP16.txt (11.3 MB)

The onnx model is: microsoft/beit-base-patch16-224-pt22k-ft22k · Hugging Face

You can use the code below to export the ONNX model:

import torch
from transformers import BeitForImageClassification

# load the pretrained BEiT classifier
model = BeitForImageClassification.from_pretrained('microsoft/beit-base-patch16-224-pt22k-ft22k')

dummy_input = torch.ones((1, 3, 640, 480))

# export to ONNX (opset 17)
torch.onnx.export(
    model.eval(),
    dummy_input,
    f="${converted_onnx_model_path}",
    input_names=['pixel_values'],
    output_names=['last_hidden_state'],
    do_constant_folding=False,
    opset_version=17,
)

@spolisetty
Also, is there any API that could help us trade off accuracy and performance? It is very hard to debug when an accuracy error like this appears.

For example, could I do something like steps 3-5 in the outline below? (A rough sketch of what I imagine follows after the outline.)

// 1. ... some init & config setup

// 2. build the model
nvinfer1::IHostMemory *serializedModel = builder->buildSerializedNetwork(*network, *config);

// 3. then I want to get the warnings about the affected FP16 weight layers
//    emitted during buildSerializedNetwork
... some API called

// 4. according to the warnings from step 3, change the affected FP16 weight layers to FP32
... some API called

// 5. finally, rebuild the TRT model
... some API called
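
Concretely, here is a rough sketch of what I imagine, written against the TensorRT 8.x C++ headers. I'm assuming ILayer::setPrecision / ILayer::setOutputType plus BuilderFlag::kOBEY_PRECISION_CONSTRAINTS are the relevant pieces; the warning parsing is hand-waved, and suspectLayers is a hypothetical list extracted from the build warnings:

// Rough sketch only (TensorRT 8.x C++ API): capture builder warnings with a
// custom ILogger, then pin the affected layers to FP32 and rebuild.
// suspectLayers is hypothetical; it would be parsed from the warnings.
#include <NvInfer.h>
#include <iostream>
#include <set>
#include <string>
#include <vector>

// 3. a logger that records the builder's weight-conversion warnings
class CollectingLogger : public nvinfer1::ILogger
{
public:
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            warnings.emplace_back(msg);  // keep warnings/errors for later inspection
        std::cout << msg << std::endl;
    }
    std::vector<std::string> warnings;
};

// 4. pin the suspect layers to FP32 while the rest of the network stays FP16
void pinLayersToFp32(nvinfer1::INetworkDefinition& network,
                     nvinfer1::IBuilderConfig& config,
                     const std::set<std::string>& suspectLayers)
{
    config.setFlag(nvinfer1::BuilderFlag::kFP16);
    // ask the builder to honor the per-layer precision requests below
    config.setFlag(nvinfer1::BuilderFlag::kOBEY_PRECISION_CONSTRAINTS);

    for (int i = 0; i < network.getNbLayers(); ++i)
    {
        nvinfer1::ILayer* layer = network.getLayer(i);
        if (suspectLayers.count(layer->getName()) != 0)
        {
            layer->setPrecision(nvinfer1::DataType::kFLOAT);
            for (int j = 0; j < layer->getNbOutputs(); ++j)
                layer->setOutputType(j, nvinfer1::DataType::kFLOAT);
        }
    }
    // 5. rebuild afterwards: builder->buildSerializedNetwork(network, config);
}

If per-layer control like this already exists, a pointer to the right API would be great.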

Hi,

We could reproduce the accuracy drop with FP16 precision. Please allow us some time to work on this issue.

Thank you.

Hi,

[I]             Relative Difference | Stats: mean=0.048545, std-dev=1.6032, var=2.5703, median=0.0060978, min=3.0518e-07 at (0, 12550), max=175.59 at (0, 14473), avg-magnitude=0.048545
[I]                 ---- Histogram ----
                    Bin Range        |  Num Elems | Visualization
                    (3.05e-07, 17.6) |      21839 | ########################################
                    (17.6    , 35.1) |          0 |
                    (35.1    , 52.7) |          0 |
                    (52.7    , 70.2) |          0 |
                    (70.2    , 87.8) |          0 |
                    (87.8    , 105 ) |          0 |
                    (105     , 123 ) |          0 |
                    (123     , 140 ) |          0 |
                    (140     , 158 ) |          1 |
                    (158     , 176 ) |          1 |
[E]         FAILED | Output: 'last_hidden_state' | Difference exceeds tolerance (rel=0.001, abs=0.001)
[E]     FAILED | Mismatched outputs: ['last_hidden_state']
[E] Accuracy Summary | trt-runner-N0-09/21/23-05:58:57 vs. onnxrt-runner-N0-09/21/23-05:58:57 | Passed: 0/1 iterations | Pass Rate: 0.0%
[E] FAILED | Runtime: 112.622s | Command: /usr/local/bin/polygraphy run model.onnx --trt --onnxrt --verbose --atol 0.001 --rtol 0.001 --fp16

We think this level of difference is normal for FP16 networks. If you face any accuracy issues with FP16 in your real application, could you please provide minimal issue repro steps and scripts for better debugging?
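
If your real application does need tighter accuracy, one option in TensorRT 8.x is to constrain individual layers to FP32 while keeping the rest in FP16. For example, trtexec exposes this via --precisionConstraints=obey together with --layerPrecisions=<layerName>:fp32, and polygraphy's "debug precision" mode can help narrow down which layers are sensitive to FP16.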

Thank you.