Description
When building a TensorRT engine from a QAT ONNX model, Q/DQ nodes are not fused at points where one tensor output feeds the inputs of multiple nodes (details below).
Environment
TensorRT Version: 8.4.1
GPU Type: Jetson Orin
Nvidia Driver Version:
CUDA Version: 11.4
CUDNN Version:
The network I'm testing contains several sections where a single output tensor feeds the inputs of multiple nodes.
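For concreteness, here is a minimal sketch of that pattern (the module name, channel count, and layer choices are hypothetical stand-ins, not my actual network):

```python
import torch
import torch.nn as nn

class BranchBlock(nn.Module):
    """Hypothetical block: one conv's output is consumed by two downstream nodes."""
    def __init__(self, channels):
        super().__init__()
        self.stem = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch_a = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch_b = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        y = self.stem(x)                            # single output tensor...
        return self.branch_a(y) + self.branch_b(y)  # ...feeding two consumer nodes
```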
I quantized the network in PyTorch using pytorch_quantization (https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html) and exported it to ONNX.
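The export follows the flow described in the pytorch_quantization docs; below is a simplified sketch, assuming the hypothetical `BranchBlock` above, the default max calibrator, and a single dummy batch for calibration (file names are placeholders):

```python
import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Swap torch.nn layers for their quantized equivalents before model creation
quant_modules.initialize()

model = BranchBlock(16).eval()
dummy = torch.randn(1, 16, 32, 32)

# One-pass calibration so every TensorQuantizer has an amax before export
for mod in model.modules():
    if isinstance(mod, quant_nn.TensorQuantizer):
        mod.disable_quant()
        mod.enable_calib()
with torch.no_grad():
    model(dummy)
for mod in model.modules():
    if isinstance(mod, quant_nn.TensorQuantizer):
        mod.load_calib_amax()
        mod.enable_quant()
        mod.disable_calib()

# Export the fake-quant ops as ONNX QuantizeLinear/DequantizeLinear pairs
quant_nn.TensorQuantizer.use_fb_fake_quant = True
torch.onnx.export(model, dummy, "branch_block_qdq.onnx", opset_version=13)
```

With `use_fb_fake_quant` set, each TensorQuantizer exports as a Q/DQ pair (opset 13 is needed for per-channel weight quantization).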
When I built a TensorRT engine from the ONNX model, the Q/DQ nodes in all the other sections were fused, but wherever one output becomes the input of multiple nodes, the Q/DQ pair remains in the engine graph and is not fused, as shown in the red box in the attached figure.
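In case it helps reproduce, a minimal sketch of the engine build with the TensorRT 8.4 Python API (file names are placeholders; the INT8 builder flag is required for explicit-quantization Q/DQ networks):

```python
import tensorrt as trt

# Verbose logging shows which Q/DQ nodes get fused during engine build
logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
network = builder.create_network(flags)
parser = trt.OnnxParser(network, logger)

with open("branch_block_qdq.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)  # required for Q/DQ networks
engine_bytes = builder.build_serialized_network(network, config)
with open("branch_block.engine", "wb") as f:
    f.write(engine_bytes)
```

Building with `trtexec --onnx=branch_block_qdq.onnx --int8 --verbose` should show the same fusion behavior in the log.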
Is it expected that TensorRT does not perform Q/DQ fusion when one output becomes the input of multiple nodes? I'm asking because the developer guide doesn't mention this case.
