I have an int8 ONNX model containing QuantizeLinear and DequantizeLinear nodes. I deploy it with both onnxruntime and TensorRT, but I get a confusing difference in their outputs.
There are two outputs during model inference; I marked these output nodes as 434 and 448.
If I mark only one of them, for example 434 as my output node and ignore 448, there is barely any gap between the onnxruntime output and the TensorRT output. The same holds with 448 as the output node and 434 ignored.
However, if I request both of them, I only get correct values for 434, while the values of 448 from the two models are totally different.
Even more confusing, if I also mark 448's preceding convolutional output (node 446), so that 434/446/448 are all outputs, I get correct results for all three nodes.
I don’t know why this happened.
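For context, this is roughly how the intermediate tensors are marked as extra graph outputs and how the onnxruntime reference values are collected. It is a minimal sketch: the tensor names 434/448 are from my model, but the file names, input name, and input shape below are placeholders; the actual script is in the attached debug.zip.

```python
import numpy as np
import onnx
import onnxruntime as ort

def mark_outputs(model_path, tensor_names, save_path):
    """Append intermediate tensors as extra graph outputs so they can be fetched."""
    model = onnx.load(model_path)
    for name in tensor_names:
        # Assumed to be float tensors (they sit after DequantizeLinear / Conv);
        # no shape is declared, onnxruntime resolves it at run time.
        model.graph.output.append(
            onnx.helper.make_tensor_value_info(name, onnx.TensorProto.FLOAT, None)
        )
    onnx.save(model, save_path)
    return save_path

# Tensor names are the ones discussed above; file names and input shape are placeholders.
marked = mark_outputs("model_int8.onnx", ["434", "448"], "model_marked.onnx")
sess = ort.InferenceSession(marked, providers=["CPUExecutionProvider"])
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
ort_outs = sess.run(["434", "448"], {sess.get_inputs()[0].name: dummy})
```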
TensorRT Version: 184.108.40.206
GPU Type: V100
Nvidia Driver Version: 450.142.00
CUDA Version: 11.0
Operating System + Version:
Python Version (if applicable): 3.8.13
PyTorch Version (if applicable): 1.10.2
debug.zip (18.0 MB)
This debug.zip contains the scripts, dummy_input, and ONNX model needed to reproduce the phenomenon described above.
The output looks like this:
outputs nodes: ['434']
434 l1 loss between onnx and trt: 1.0309417120879516e-05

outputs nodes: ['448']
448 l1 loss between onnx and trt: 9.65373601502506e-06

outputs nodes: ['434', '448']
434 l1 loss between onnx and trt: 1.0309417120879516e-05
448 l1 loss between onnx and trt: 0.14752599596977234

outputs nodes: ['434', '448', '446']
434 l1 loss between onnx and trt: 1.0309417120879516e-05
446 l1 loss between onnx and trt: 1.5866717149037868e-05
448 l1 loss between onnx and trt: 9.65373601502506e-06
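For reference, the "l1 loss" above is just the mean absolute element-wise difference between the two backends' outputs for the same node, computed along these lines (the .npy file names here are hypothetical; the real comparison is done inside the script in debug.zip):

```python
import numpy as np

def l1_loss(onnx_out, trt_out):
    # mean absolute element-wise difference, matching the numbers printed above
    return np.abs(onnx_out.astype(np.float64) - trt_out.astype(np.float64)).mean()

# hypothetical dump files for node 448, one per backend (not part of the post)
onnx_448 = np.load("onnx_448.npy")
trt_448 = np.load("trt_448.npy")
print("448 l1 loss between onnx and trt:", l1_loss(onnx_448, trt_448))
```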