In INT8 mode, the outputs of ONNX Runtime and TensorRT are inconsistent

Description

I have an INT8 ONNX model that contains Quantize and Dequantize nodes. I deploy it with both ONNX Runtime and TensorRT, but I get a confusing difference between their outputs.
The model produces two outputs during inference; I marked these output nodes as 434 and 448.
If I mark only one of them, for example only 434 as my output node while ignoring 448, there is barely any gap between the ONNX Runtime output and the TensorRT output. The same holds with 448 as the output node and 434 ignored.
However, if I try to get them both, only the values of 434 are correct, while the values of 448 differ completely between the two models.
Even more confusing: if I also mark node 446, the output of the convolution that feeds 448, so that 434/446/448 are all outputs, then all three nodes give correct results.
I don't know why this happens.
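For reference, the way I expose the intermediate tensors as extra graph outputs on the ONNX side is roughly like the sketch below (file names are placeholders, not the actual files in debug.zip, and the float dtype is an assumption):

```python
import onnx

# Sketch: expose intermediate tensors (e.g. "434" and "448") as additional graph
# outputs so that both ONNX Runtime and TensorRT return them for comparison.
# "model.onnx" is a placeholder path, not the file shipped in debug.zip.
model = onnx.load("model.onnx")

for name in ["434", "448"]:
    # dtype assumed float (tensors after DequantizeLinear/Conv); shape left unspecified.
    value_info = onnx.helper.make_tensor_value_info(name, onnx.TensorProto.FLOAT, None)
    model.graph.output.append(value_info)

onnx.save(model, "model_multi_output.onnx")
```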

Environment

TensorRT Version: 8.4.1.5
GPU Type: V100
Nvidia Driver Version: 450.142.00
CUDA Version: 11.0
CUDNN Version:
Operating System + Version:
Python Version (if applicable): 3.8.13
PyTorch Version (if applicable): 1.10.2
onnxruntime-gpu: 1.11.1

Relevant Files

debug.zip (18.0 MB)
This debug.zip contains the scripts, dummy input, and ONNX model needed to reproduce the phenomenon described above.

Steps To Reproduce

python compare_onnx_trt_subgraph.py

The output should look like:

outputs nodes: ['434']
434  l1 loss between onnx and trt:  1.0309417120879516e-05

outputs nodes: ['448']
448  l1 loss between onnx and trt:  9.65373601502506e-06

outputs nodes: ['434', '448']
434  l1 loss between onnx and trt:  1.0309417120879516e-05
448  l1 loss between onnx and trt:  0.14752599596977234

outputs nodes: ['434', '448', '446']
434  l1 loss between onnx and trt:  1.0309417120879516e-05
446  l1 loss between onnx and trt:  1.5866717149037868e-05
448  l1 loss between onnx and trt:  9.65373601502506e-06
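For context, the comparison done by compare_onnx_trt_subgraph.py is essentially the following. The sketch below uses NVIDIA's Polygraphy instead of my raw TensorRT code, so the API calls and file names are assumptions rather than the exact script in debug.zip:

```python
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import CreateConfig, EngineFromNetwork, NetworkFromOnnxPath, TrtRunner
from polygraphy.comparator import Comparator

MODEL = "model_multi_output.onnx"  # placeholder for the multi-output QDQ model

# Build a TensorRT engine with INT8 enabled; the QDQ nodes make precision explicit.
build_engine = EngineFromNetwork(NetworkFromOnnxPath(MODEL), config=CreateConfig(int8=True))

runners = [
    OnnxrtRunner(SessionFromOnnx(MODEL)),
    TrtRunner(build_engine),
]

# Feed the same (synthetic) input to both runners and compare every marked output.
results = Comparator.run(runners)
Comparator.compare_accuracy(results)
```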

Hi, please refer to the links below to perform inference in INT8.

Thanks!

I believe the code I use to convert the ONNX model to TensorRT and to run inference has no bugs. Perhaps there is a bug in TensorRT's operator fusion for Quantize/Dequantize nodes: some optimization may cause the memory used by the 448-related nodes to be accidentally overwritten, so the correct results cannot be obtained. I would greatly appreciate it if you could read the code again and explain why this error occurs.

But if I mark node 446 as an output, the entire right branch may be excluded from that optimization, which would explain why the result is as expected in that case.
By the way, node 446 is the output of the conv in the right branch.
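To check this hypothesis, one thing that could help is dumping TensorRT's per-layer information for each output configuration and comparing which layers around 446/448 get fused. A sketch, assuming the engines were built with detailed profiling verbosity and using placeholder engine file names:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def dump_layer_info(engine_path: str) -> None:
    # Requires the engine to have been built with
    # config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
    with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
        inspector = engine.create_engine_inspector()
        # JSON listing of every layer after fusion; search for 446/448 here.
        print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))

# Placeholder engine names for the two output configurations.
dump_layer_info("outputs_434_448.engine")
dump_layer_info("outputs_434_446_448.engine")
```

If marking 446 as an output really does block a QDQ fusion in the right branch, the two dumps should show different fused layers feeding 448.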