I use TensorRT to run inference on a BEiT ONNX model (opset 17). FP32 works correctly on my GPU, but the FP16 result is wrong. When I convert the model through onnx2trt (C++) with FP16 enabled, I get these warnings:
[TRT] Warning: TensorRT encountered issues when converting weights between types and that could affect accuracy.
[TRT] Warning: - 73 weights are affected by this issue: Detected subnormal FP16 values.
[TRT] Warning: - 47 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
Actually, when my own inference engine runs BEiT in FP16, the accuracy stays within a reasonable range.
So I wonder: is the incorrect TensorRT FP16 result related to these warnings, or is there an overflow in some optimized kernel, e.g. fused layers?
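(For reference, the two warnings correspond to the IEEE 754 half-precision limits: magnitudes below 2^-14 ≈ 6.1e-5 are representable only as FP16 subnormals with reduced precision, and magnitudes below 2^-24 ≈ 6.0e-8 are not representable at all, so TensorRT clamps them to the smallest positive subnormal. A minimal standalone sketch of how one might scan FP32 weights for both cases; the weight values are made up:)

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const float kFp16MinSubnormal = 5.9604645e-8f; // 2^-24, smallest positive FP16 subnormal
    const float kFp16MinNormal    = 6.1035156e-5f; // 2^-14, smallest positive FP16 normal

    std::vector<float> weights = {1.0e-3f, 3.0e-5f, 2.0e-8f, -4.0e-9f}; // made-up example values

    int subnormal = 0, belowSubnormal = 0;
    for (float w : weights) {
        float a = std::fabs(w);
        if (a == 0.0f) continue;
        if (a < kFp16MinSubnormal)
            ++belowSubnormal; // would be clamped, as the second warning reports
        else if (a < kFp16MinNormal)
            ++subnormal;      // representable only as an FP16 subnormal, as the first warning reports
    }
    std::printf("subnormal: %d, below subnormal: %d\n", subnormal, belowSubnormal);
    return 0;
}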
Environment
TensorRT Version: 8.5.1.7
GPU Type: RTX 4070 Laptop
Nvidia Driver Version: 536.25
CUDA Version: 11.8
CUDNN Version: 8.9.1
Operating System + Version: Windows 11
We recommend that you try the latest TensorRT version, 8.6.1.
If you still face the same issue, please share a repro ONNX model and complete verbose logs for better debugging.
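For reference, a minimal sketch of a logger that captures everything the builder emits (pass it to createInferBuilder); alternatively, trtexec --verbose prints the same information:

#include <NvInfer.h>
#include <iostream>

// Forward every message, including kVERBOSE, to stdout so the full build log is captured.
class VerboseLogger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        std::cout << static_cast<int>(severity) << ": " << msg << std::endl;
    }
};

// Usage: VerboseLogger logger; auto* builder = nvinfer1::createInferBuilder(logger);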
@spolisetty
Also, is there any related API that would help us trade off accuracy and performance? It is very hard to debug when an accuracy error occurs.
For example, could I do something like steps 3-5 below?
// 1. ... some init & config setup
// 2. build the model
nvinfer1::IHostMemory* serializedModel = builder->buildSerializedNetwork(*network, *config);
// 3. then I want to get the warnings about affected FP16 weight layers emitted by buildSerializedNetwork
... some API call
// 4. based on the warnings from step 3, switch the affected FP16 weight layers to FP32
... some API call
// 5. finally, rebuild the TRT engine
... some API call
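As far as I know, TensorRT does not report the affected weights through a structured API, so step 3 still means identifying suspect layers yourself (e.g. by bisecting). Once you have their names, steps 4 and 5 do map onto existing APIs: ILayer::setPrecision plus the kOBEY_PRECISION_CONSTRAINTS builder flag pin individual layers to FP32 inside an otherwise-FP16 build. A sketch, where the set of layer names is a hypothetical input you supply from step 3:

#include <NvInfer.h>
#include <set>
#include <string>

// Pin the given layers to FP32 inside an otherwise-FP16 build (steps 4-5).
// suspectLayers holds whatever layer names you identified in step 3.
void pinLayersToFp32(nvinfer1::INetworkDefinition& network,
                     nvinfer1::IBuilderConfig& config,
                     const std::set<std::string>& suspectLayers) {
    config.setFlag(nvinfer1::BuilderFlag::kFP16);
    // Without this flag, setPrecision() is only a hint the builder may ignore.
    config.setFlag(nvinfer1::BuilderFlag::kOBEY_PRECISION_CONSTRAINTS);

    for (int i = 0; i < network.getNbLayers(); ++i) {
        nvinfer1::ILayer* layer = network.getLayer(i);
        if (suspectLayers.count(layer->getName()) != 0) {
            layer->setPrecision(nvinfer1::DataType::kFLOAT); // run this layer in FP32
            for (int j = 0; j < layer->getNbOutputs(); ++j)
                layer->setOutputType(j, nvinfer1::DataType::kFLOAT); // keep its outputs in FP32 too
        }
    }
    // Step 5: rebuild with the updated constraints, e.g.
    // nvinfer1::IHostMemory* serialized = builder->buildSerializedNetwork(network, config);
}

Polygraphy's debug precision tool can automate the bisection that finds which layers need this treatment.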
We think this level of difference is normal for FP16 networks. If you face any accuracy issues with FP16 in your real application, could you please provide minimal issue repro steps and scripts for better debugging?