Some questions about TensorRT INT8, PTQ and QAT

lingchao.zhu · December 21, 2020, 9:10am

Description

I’m working on TensorRT INT8 inference.
According to "GTC 2020: Integer Quantization for DNN Inference Acceleration | NVIDIA Developer", PTQ accuracy should be good. But when I try this calibration, the result is much worse. The IInt8EntropyCalibrator result is also much worse.

Some information about my test.

Data set is COCO.
Network type is PoseEstimation

Some questions about PTQ.

Is there a way to set the max or min of IInt8MinMaxCalibrator to a percentile, for example 99.9% or 99.999%?
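To make the question concrete, here is a minimal NumPy sketch of what a percentile-based range means compared with a plain min-max range. It is only an illustration: the function names `percentile_amax` and `fake_quantize_int8` and the synthetic activations are assumptions, not TensorRT APIs, and the resulting threshold would still have to be supplied to TensorRT as a per-tensor dynamic range, which is not shown here.

```python
import numpy as np

def percentile_amax(activations, percentile=99.9):
    """Clipping threshold that covers `percentile` % of |activations| (hypothetical helper)."""
    return float(np.percentile(np.abs(activations), percentile))

def fake_quantize_int8(x, amax):
    """Symmetric INT8 quantize-dequantize over the range [-amax, amax]."""
    scale = amax / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

# Synthetic activations: mostly N(0, 1), plus a few large outliers.
rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, 100_000)
acts[:10] = 50.0

amax_minmax = float(np.abs(acts).max())   # what a min-max style range would use
amax_pct = percentile_amax(acts, 99.9)    # range that ignores the top 0.1 %

print(f"min-max amax: {amax_minmax:.3f}, step {amax_minmax / 127:.5f}")
print(f"99.9% amax:   {amax_pct:.3f}, step {amax_pct / 127:.5f}")
```

A smaller `amax` gives a finer quantization step for the bulk of the values, at the cost of clipping the outliers; which trade-off is better depends on the network.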

Because of the low accuracy of PTQ, I also tested QAT. But there are some problems when I convert the TensorFlow 1.x model to TensorRT following the guide (GitHub - NVIDIA/sampleQAT: Inference of quantization aware trained networks using TensorRT).

Some questions about QAT

I generated two ONNX QAT models:

[image: left model] and [image: right model].

The left one is a part of a large model, and the right one is generated from scratch. There appears to be no difference between them except for the names. The right one can be converted to TensorRT successfully, but the left one returns an error: “[TensorRT] ERROR: /home/jenkins/workspace/TensorRT/helpers/rel-7.1/L1_Nightly_Internal/build/source/rtSafe/safeWeightsPtr.h (102) - Assertion Error in setCount: 0 (count >= 0)”.

Are there more details about these errors and about QAT in TensorRT?
There are some other problems as well, such as a sanity-check error and an immediate segmentation fault.
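One way to check whether the two graphs really differ only in names is to compare their op-type counts and initializer shapes. The sketch below assumes the `onnx` Python package is available; the helper names `graph_signature` and `counter_diff` are hypothetical, and the file paths in the commented usage are the ones from the repro steps below. It may help narrow down a difference, but it is not a diagnosis of the assertion.

```python
from collections import Counter

def graph_signature(model):
    """Summarise an ONNX ModelProto as (op-type counts, initializer-shape counts)."""
    ops = Counter(node.op_type for node in model.graph.node)
    shapes = Counter(tuple(init.dims) for init in model.graph.initializer)
    return ops, shapes

def counter_diff(left, right):
    """Return {key: (left_count, right_count)} for every key whose counts differ."""
    keys = sorted(set(left) | set(right), key=str)
    return {k: (left[k], right[k]) for k in keys if left[k] != right[k]}

# Usage (requires the onnx package and the two post-processed models):
# import onnx
# left = graph_signature(onnx.load("./frozen_qat_yolov3/frozen_qat_bn1_post.onnx"))
# right = graph_signature(onnx.load("./frozen_qat/frozen_qat_bn1_post.onnx"))
# print("op differences:", counter_diff(left[0], right[0]))
# print("initializer shape differences:", counter_diff(left[1], right[1]))
```

An empty result for both comparisons would support the impression that only the names differ; any entry points at a structural difference worth inspecting.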

Environment

TensorRT Version: 7.2.1.6 (Python)
GPU Type: 2070super
Nvidia Driver Version: 455.45.01
CUDA Version: 11.1
CUDNN Version: 8.0.4
Operating System + Version: Ubuntu18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): 1.15.4
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorflow:20.11-tf1-py3

Relevant Files

https://github.com/lc-zhu/SharedFiles.git

Steps To Reproduce

Please include:

Exact steps/commands to build your repro

Create left model

# 1. Create Left model:
python3 yolov3.py
# 2. Convert to ONNX
python3 -m tf2onnx.convert --input ./frozen_qat_yolov3/frozen_qat_bn1.pb  --output ./frozen_qat_yolov3/frozen_qat_bn1.onnx --inputs "input:0" --outputs "darknet53_body/Relu:0" --opset 10 --fold_const
# 3. Post-processing for removing `transpose`
python3 postprocess_onnx.py --input ./frozen_qat_yolov3/frozen_qat_bn1.onnx --output ./frozen_qat_yolov3/frozen_qat_bn1_post.onnx
# 4. Build engine
python3 build_engine.py --onnx ./frozen_qat_yolov3/frozen_qat_bn1_post.onnx --engine ./frozen_qat_yolov3/frozen_qat_bn1.engine -v

Create right model

# 1. Create right model:
python3 model.py
# 2. Convert to ONNX
python3 -m tf2onnx.convert --input ./frozen_qat/frozen_qat_bn1.pb  --output ./frozen_qat/frozen_qat_bn1.onnx --inputs "input:0" --outputs "output:0" --opset 10 --fold_const
# 3. Post-processing for removing `transpose`
python3 postprocess_onnx.py --input ./frozen_qat/frozen_qat_bn1.onnx --output ./frozen_qat/frozen_qat_bn1_post.onnx   
# 4. Build engine
python3 build_engine.py --onnx ./frozen_qat/frozen_qat_bn1_post.onnx --engine ./frozen_qat/frozen_qat_bn1.engine -v

Full traceback of errors encountered

[TensorRT] ERROR: /home/jenkins/workspace/TensorRT/helpers/rel-7.1/L1_Nightly_Internal/build/source/rtSafe/safeWeightsPtr.h (102) - Assertion Error in setCount: 0 (count >= 0)

AakankshaS · December 28, 2020, 6:42am

Hi @lingchao.zhu,
Please allow me some time to check on this

Thanks!

lingchao.zhu · January 19, 2021, 1:50am

Hi @AakankshaS,

Is there any progress?

AakankshaS · August 27, 2021, 11:11am

Hi @lingchao.zhu ,
Are you still facing the issue?
If the issue persists, could you please share the files with us?
Thanks!

lingchao.zhu · September 1, 2021, 1:23am

Hi,

We now use TensorRT 8.0, and it works well.

yeqiting · December 27, 2021, 4:45pm

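Hi, could you give some guidance on this quantization-aware training? Specifically, in TensorRT 8, do we still need to fuse the BN layers ourselves beforehand?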
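For readers unfamiliar with the term, "BN fusion" means folding a batch-normalization layer into the weights and bias of the preceding convolution. The sketch below shows only that arithmetic in NumPy. It assumes an (out_ch, in_ch, kh, kw) weight layout and a hypothetical helper name `fold_batchnorm`; it does not say whether TensorRT 8 requires this step to be done by hand.

```python
import numpy as np

def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BN parameters into the preceding conv's weight and bias.

    weight: (out_ch, in_ch, kh, kw) (assumed layout); all other arrays: (out_ch,).
    """
    scale = gamma / np.sqrt(var + eps)
    folded_weight = weight * scale[:, None, None, None]
    folded_bias = beta + (bias - mean) * scale
    return folded_weight, folded_bias

# Check on a 1x1 convolution applied to a single pixel (a plain matrix product).
rng = np.random.default_rng(0)
weight = rng.normal(size=(4, 3, 1, 1))
bias, gamma, beta, mean = (rng.normal(size=4) for _ in range(4))
var = rng.uniform(0.5, 2.0, size=4)
x = rng.normal(size=3)

# Reference: convolution followed by batch normalization.
y_conv = weight[:, :, 0, 0] @ x + bias
y_ref = gamma * (y_conv - mean) / np.sqrt(var + 1e-5) + beta

# Folded: a single convolution with adjusted parameters.
fw, fb = fold_batchnorm(weight, bias, gamma, beta, mean, var)
y_folded = fw[:, :, 0, 0] @ x + fb

max_abs_diff = float(np.max(np.abs(y_ref - y_folded)))
print("max abs difference:", max_abs_diff)
```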
