Conv inference problem fp32 to fp16

653576489 · September 11, 2023, 8:07am

Hi, my input data and weights are both in fp16 during convolution inference. Is fp32 used anywhere inside the convolution computation? If so, how is the result converted from fp32 back to fp16?

As shown in the figure, I used the profiling tool and saw that the conv operation calls these kernel functions. Does this mean that an fp16 convolution involves truncation from fp32 to fp16? How is it truncated, and in which compute node does it occur? Is the result accumulated in fp32 and only truncated to fp16 at the end? Looking forward to your answer, thank you!
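To make the question concrete, here is a minimal NumPy sketch of the two accumulation strategies being asked about. It is only an illustration on the CPU: the function names are hypothetical, and it does not show what the TensorRT or cuDNN kernels in the profile actually do. It assumes fp16 inputs and compares rounding every partial sum to fp16 against accumulating in fp32 and rounding once at the end.

```python
import numpy as np

def dot_fp16_accumulate(x, w):
    """Dot product where every partial sum is rounded back to fp16."""
    acc = np.float16(0.0)
    for a, b in zip(x, w):
        acc = np.float16(acc + a * b)  # rounding happens at each step
    return acc

def dot_fp32_accumulate(x, w):
    """Dot product with fp16 inputs, an fp32 accumulator, one final rounding."""
    acc = np.float32(0.0)
    for a, b in zip(x, w):
        acc += np.float32(a) * np.float32(b)
    return np.float16(acc)  # single fp32 -> fp16 conversion at the end

# 4096 products of 1.0 * 1.0: the exact answer is 4096.
x = np.ones(4096, dtype=np.float16)
w = np.ones(4096, dtype=np.float16)

print(dot_fp16_accumulate(x, w))  # 2048.0: fp16 spacing is 2 at 2048, so adding 1 no longer changes the sum
print(dot_fp32_accumulate(x, w))  # 4096.0
```

In this sketch the fp32 to fp16 step is a round-to-nearest conversion (`np.float16(...)`), not a bit truncation, and it happens once per output value rather than once per multiply-add.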

AakankshaS · September 11, 2023, 12:37pm

Hi,
Could you try running your model with the trtexec command and share the "--verbose" log if the issue persists?
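As a rough sketch of what that invocation could look like, the helper below assembles a trtexec command line with the `--onnx`, `--fp16`, and `--verbose` options. The helper name and the `model.onnx` path are placeholders for illustration, not something taken from this thread; the snippet only builds and prints the command, it does not run trtexec.

```python
import shlex

def build_trtexec_command(onnx_path, fp16=True, verbose=True):
    """Assemble a trtexec argument list (hypothetical helper)."""
    cmd = ["trtexec", f"--onnx={onnx_path}"]
    if fp16:
        cmd.append("--fp16")      # allow fp16 kernels when building the engine
    if verbose:
        cmd.append("--verbose")   # produce the detailed log requested above
    return cmd

# "model.onnx" is a placeholder path.
print(shlex.join(build_trtexec_command("model.onnx")))
# trtexec --onnx=model.onnx --fp16 --verbose
```

The printed line can be pasted into a shell, with the output redirected to a file so the log can be attached to the thread.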

You can refer to the link below for the list of supported operators. If an operator is not supported, you need to create a custom plugin to support that operation.

https://github.com/onnx/onnx-tensorrt/blob/main/docs/operators.md

<!--- SPDX-License-Identifier: Apache-2.0 -->

# Supported ONNX Operators

TensorRT 8.6 supports operators up to Opset 17. Latest information of ONNX operators can be found [here](https://github.com/onnx/onnx/blob/master/docs/Operators.md)

TensorRT supports the following ONNX data types: DOUBLE, FLOAT32, FLOAT16, INT8, and BOOL

> Note: There is limited support for INT32, INT64, and DOUBLE types. TensorRT will attempt to cast down INT64 to INT32 and DOUBLE down to FLOAT, clamping values to `+-INT_MAX` or `+-FLT_MAX` if necessary.

See below for the support matrix of ONNX operators in ONNX-TensorRT.

## Operator Support Matrix

| Operator                  | Supported  | Supported Types | Restrictions                                                                                                           |
|---------------------------|------------|-----------------|------------------------------------------------------------------------------------------------------------------------|
| Abs                       | Y          | FP32, FP16, INT32 |
| Acos                      | Y          | FP32, FP16 |
| Acosh                     | Y          | FP32, FP16 |
| Add                       | Y          | FP32, FP16, INT32 |

This file has been truncated.

Also, please share your model and script, if you have not already, so that we can help you better.

Meanwhile, for some common errors and queries, please refer to the link below:

Thanks!
