Inference of an FP16 engine in C++ gives NaN output, but inference of the FP32 engine gives correct results

Description

Hi, TRT team. When I run inference with an FP16 TensorRT engine in C++, I get NaN outputs, but with the same C++ code and the FP32 engine swapped in for the FP16 one, the output is correct. Both the FP16 and FP32 models give correct outputs in Python. So, is there anything I need to change when running an FP16 model in C++?

I generate the FP16 engine with trtexec.exe:

trtexec.exe --onnx=best_fp16.onnx --saveEngine=best_fp16.engine --fp16

I have uploaded my FP32 ONNX model (best.onnx), FP32 engine (best.engine), FP16 ONNX model (best_fp16.onnx), and FP16 engine (best_fp16.engine).

Environment

TensorRT Version: 8.2.1.8
GPU Type: RTX3050
Nvidia Driver Version: 528.49
CUDA Version: 11.4
CUDNN Version: 8.2.2.26
Operating System + Version: Windows10
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.11.0
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
best.engine (13.2 MB)
best.onnx (7.9 MB)
best_fp16.engine (7.9 MB)
best_fp16.onnx (4.0 MB)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi,
Could you try running your model with the trtexec command and share the --verbose log if the issue persists?
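For example, rebuilding with verbose logging enabled (based on the command in the original post; redirecting the output to a log file is optional):

trtexec.exe --onnx=best_fp16.onnx --saveEngine=best_fp16.engine --fp16 --verbose > trt.log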

You can refer to the link below for the list of supported operators; if any operator is not supported, you will need to write a custom plugin to support that operation.

Also, please share your model and script if you have not already, so that we can help you better.

Meanwhile, for some common errors and queries, please refer to the link below:

Thanks!

Hi, thanks for your reply.
Here is the verbose log (trt.log); I will upload the script later.
trt.log (7.4 MB)

And here is the script:
main_detect.cpp (6.6 KB)
cuda_utils.h (417 Bytes)
post_process.cpp (4.2 KB)
post_process.h (643 Bytes)
preprocess.cu (5.8 KB)
preprocess.h (522 Bytes)

Hi,

We observed an accuracy difference in FP16 precision when we tried to reproduce the issue.
Please allow us some time to work on this issue.

Thank you.

Hi, spolisetty, AakankshaS
How is it going?

I updated TensorRT to 8.6.1.6, and the problem remains.

I also tested another model, yolov5s: it gives correct results in Python in both FP32 and FP16, but in C++ it returns NaN in both FP32 and FP16. So I'm guessing the problem may be in my preprocessing code?
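One way to test that hypothesis (a debugging sketch only; the device pointer, element count, and expected value range are assumptions, not taken from the attached code) is to copy the preprocessed input buffer back to the host and check it for NaN/Inf before enqueueing the engine:

#include <cuda_runtime.h>
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Returns false if the preprocessed input already contains NaN/Inf,
// i.e. the problem is upstream of TensorRT.
bool inputLooksValid(const float* dInput, size_t numElems)
{
    if (numElems == 0) return false;
    std::vector<float> host(numElems);
    cudaMemcpy(host.data(), dInput, numElems * sizeof(float), cudaMemcpyDeviceToHost);

    float minV = host[0], maxV = host[0];
    for (float v : host) {
        if (std::isnan(v) || std::isinf(v)) {
            std::printf("preprocess produced NaN/Inf\n");
            return false;
        }
        minV = std::min(minV, v);
        maxV = std::max(maxV, v);
    }
    // yolov5-style preprocessing normally normalizes pixels to [0, 1].
    std::printf("input range: [%f, %f]\n", minV, maxV);
    return true;
}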

I encountered this problem too. I have a TF model: with 8.5.1.7 + FP16 all scores are fine, but after updating TRT to 8.6.1 + FP16, all scores are NaN.

Because I need multiple optimization profiles + refit, I had to update to 8.6.

My problem is solved: an FP32 ONNX input has this problem with 8.6.1 + FP16 but is fine with 8.5.1. So I convert the ONNX to FP16 before feeding it to TRT, and it runs correctly.

My ONNX was FP16, and after converting it to an FP16 engine the output was NaN in C++, but the output was correct in Python. I tried TensorRT 8.6.1 and 8.5.1, and things didn't get better. My ONNX came from yolov5/export.py.

The problem has been resolved. Just use the FP32 ONNX model to generate the TensorRT engine, whether building an FP32 engine or an FP16 engine.
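In other words, both engines are built from the same FP32 ONNX and only the --fp16 flag differs (mirroring the command in the original post):

trtexec.exe --onnx=best.onnx --saveEngine=best.engine
trtexec.exe --onnx=best.onnx --saveEngine=best_fp16.engine --fp16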

When using FP16 to represent absolute values of ~1000, the accuracy loss is quite expected.
Our general advice is to either limit the maximum absolute value of the activations or explicitly set the precision of the affected layers to FP32.
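A minimal sketch of the second option with the TensorRT 8.x C++ builder API (the "suspect" layer-name filter is a placeholder; pick the layers that actually overflow, e.g. from the verbose log):

#include <NvInfer.h>
#include <string>

// Build with FP16 enabled, but pin layers that overflow in FP16 back to FP32.
void configureMixedPrecision(nvinfer1::INetworkDefinition* network,
                             nvinfer1::IBuilderConfig* config)
{
    config->setFlag(nvinfer1::BuilderFlag::kFP16);
    // Make the builder honor the per-layer precisions set below.
    config->setFlag(nvinfer1::BuilderFlag::kOBEY_PRECISION_CONSTRAINTS);

    for (int i = 0; i < network->getNbLayers(); ++i) {
        nvinfer1::ILayer* layer = network->getLayer(i);
        if (std::string(layer->getName()).find("suspect") != std::string::npos) {
            layer->setPrecision(nvinfer1::DataType::kFLOAT);      // run this layer in FP32
            layer->setOutputType(0, nvinfer1::DataType::kFLOAT);  // keep its output in FP32
        }
    }
}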

Thank you.