I have an int8 ONNX model containing QuantizeLinear and DequantizeLinear nodes. I deploy it with both onnxruntime and TensorRT, but I get a confusing difference in their outputs.
There are two outputs during model inference; I marked these output nodes as 434 and 448.
If I mark only one of them, for example 434 as my output node and ignore 448, there is barely any gap between the onnxruntime output and the TensorRT output. The same holds with 448 as the output node and 434 ignored.
However, if I request both of them, I only get correct values for 434, while the values of 448 from the two models are totally different.
Even more confusing, if I also mark 448's preceding convolutional output (node 446), so that 434/446/448 are all outputs, I get correct results for all three nodes.
I don’t know why this happened.
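For context, this is roughly how the intermediate tensors are marked as extra graph outputs and how the onnxruntime reference values are collected. It is a minimal sketch: the tensor names 434/448 are from my model, but the file names, input name, and input shape below are placeholders; the actual script is in the attached debug.zip.

```python
import numpy as np
import onnx
import onnxruntime as ort

def mark_outputs(model_path, tensor_names, save_path):
    """Append intermediate tensors as extra graph outputs so they can be fetched."""
    model = onnx.load(model_path)
    for name in tensor_names:
        # Assumed to be float tensors (they sit after DequantizeLinear / Conv);
        # no shape is declared, onnxruntime resolves it at run time.
        model.graph.output.append(
            onnx.helper.make_tensor_value_info(name, onnx.TensorProto.FLOAT, None)
        )
    onnx.save(model, save_path)
    return save_path

# Tensor names are the ones discussed above; file names and input shape are placeholders.
marked = mark_outputs("model_int8.onnx", ["434", "448"], "model_marked.onnx")
sess = ort.InferenceSession(marked, providers=["CPUExecutionProvider"])
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
ort_outs = sess.run(["434", "448"], {sess.get_inputs()[0].name: dummy})
```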
TensorRT Version: 184.108.40.206
GPU Type: V100
Nvidia Driver Version: 450.142.00
CUDA Version: 11.0
Operating System + Version:
Python Version (if applicable): 3.8.13
PyTorch Version (if applicable): 1.10.2
debug.zip (18.0 MB)
This debug.zip contains the scripts, dummy_input, and ONNX model needed to reproduce the phenomenon described above.
The output looks like this:
outputs nodes: ['434']
434 l1 loss between onnx and trt: 1.0309417120879516e-05

outputs nodes: ['448']
448 l1 loss between onnx and trt: 9.65373601502506e-06

outputs nodes: ['434', '448']
434 l1 loss between onnx and trt: 1.0309417120879516e-05
448 l1 loss between onnx and trt: 0.14752599596977234

outputs nodes: ['434', '448', '446']
434 l1 loss between onnx and trt: 1.0309417120879516e-05
446 l1 loss between onnx and trt: 1.5866717149037868e-05
448 l1 loss between onnx and trt: 9.65373601502506e-06
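For reference, the "l1 loss" above is just the mean absolute element-wise difference between the two backends' outputs for the same node, computed along these lines (the .npy file names here are hypothetical; the real comparison is done inside the script in debug.zip):

```python
import numpy as np

def l1_loss(onnx_out, trt_out):
    # mean absolute element-wise difference, matching the numbers printed above
    return np.abs(onnx_out.astype(np.float64) - trt_out.astype(np.float64)).mean()

# hypothetical dump files for node 448, one per backend (not part of the post)
onnx_448 = np.load("onnx_448.npy")
trt_448 = np.load("trt_448.npy")
print("448 l1 loss between onnx and trt:", l1_loss(onnx_448, trt_448))
```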