Description
When building a TensorRT engine from a QAT ONNX model, Q/DQ nodes are not fused at points where one tensor output feeds the inputs of multiple nodes (details below).
Environment
TensorRT Version: 8.4.1
GPU Type: Jetson Orin
Nvidia Driver Version:
CUDA Version: 11.4
CUDNN Version:
The network I'm testing contains several sections where a single output tensor feeds the inputs of multiple nodes.
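For concreteness, here is a minimal sketch of that pattern (the module name, channel count, and layer choices are hypothetical stand-ins, not my actual network):

```python
import torch
import torch.nn as nn

class BranchBlock(nn.Module):
    """Hypothetical block: one conv's output is consumed by two downstream nodes."""
    def __init__(self, channels):
        super().__init__()
        self.stem = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch_a = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch_b = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        y = self.stem(x)                            # single output tensor...
        return self.branch_a(y) + self.branch_b(y)  # ...feeding two consumer nodes
```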
I quantized the network in PyTorch using pytorch_quantization (https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html) and exported it to ONNX.
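The export follows the flow described in the pytorch_quantization docs; below is a simplified sketch, assuming the hypothetical `BranchBlock` above, the default max calibrator, and a single dummy batch for calibration (file names are placeholders):

```python
import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Swap torch.nn layers for their quantized equivalents before model creation
quant_modules.initialize()

model = BranchBlock(16).eval()
dummy = torch.randn(1, 16, 32, 32)

# One-pass calibration so every TensorQuantizer has an amax before export
for mod in model.modules():
    if isinstance(mod, quant_nn.TensorQuantizer):
        mod.disable_quant()
        mod.enable_calib()
with torch.no_grad():
    model(dummy)
for mod in model.modules():
    if isinstance(mod, quant_nn.TensorQuantizer):
        mod.load_calib_amax()
        mod.enable_quant()
        mod.disable_calib()

# Export the fake-quant ops as ONNX QuantizeLinear/DequantizeLinear pairs
quant_nn.TensorQuantizer.use_fb_fake_quant = True
torch.onnx.export(model, dummy, "branch_block_qdq.onnx", opset_version=13)
```

With `use_fb_fake_quant` set, each TensorQuantizer exports as a Q/DQ pair (opset 13 is needed for per-channel weight quantization).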
When I built a TensorRT engine from the ONNX model, the Q/DQ nodes in all the other sections were fused, but wherever one output becomes the input of multiple nodes, the Q/DQ pair remains in the engine graph and is not fused, as shown in the red box in the attached figure.
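In case it helps reproduce, a minimal sketch of the engine build with the TensorRT 8.4 Python API (file names are placeholders; the INT8 builder flag is required for explicit-quantization Q/DQ networks):

```python
import tensorrt as trt

# Verbose logging shows which Q/DQ nodes get fused during engine build
logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
network = builder.create_network(flags)
parser = trt.OnnxParser(network, logger)

with open("branch_block_qdq.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)  # required for Q/DQ networks
engine_bytes = builder.build_serialized_network(network, config)
with open("branch_block.engine", "wb") as f:
    f.write(engine_bytes)
```

Building with `trtexec --onnx=branch_block_qdq.onnx --int8 --verbose` should show the same fusion behavior in the log.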
Is it expected that TensorRT does not perform Q/DQ fusion when one output becomes the input of multiple nodes? I'm asking because the developer guide doesn't mention this case.
