Some questions about TensorRT INT8, PTQ and QAT

Description

I’m working on TensorRT INT8 inference.
According to "GTC 2020: Integer Quantization for DNN Inference Acceleration | NVIDIA Developer", PTQ accuracy should be good. But when I try this calibration, the results are much worse. IInt8EntropyCalibrator also gives poor results.

Some information about my test.

  • Data set is COCO.
  • Network type is PoseEstimation

Some questions about PTQ.

Is there a way to make IInt8MinMaxCalibrator use a percentile, such as 99.9% or 99.999%, instead of the absolute max or min?
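To make the question concrete, here is a minimal sketch of what I mean by percentile calibration, written with NumPy only. It is not TensorRT code: `percentile_dynamic_range` and `int8_scale` are hypothetical helper names, and the activations are synthetic. It only shows how a percentile threshold differs from the plain max when a tensor contains an outlier.

```python
import numpy as np

def percentile_dynamic_range(activations, percentile=99.99):
    """Symmetric clipping threshold that covers `percentile` % of |activations|.

    Hypothetical helper for illustration; not part of the TensorRT API.
    """
    magnitudes = np.abs(np.asarray(activations, dtype=np.float32)).ravel()
    return float(np.percentile(magnitudes, percentile))

def int8_scale(threshold):
    """Scale that maps the range [-threshold, threshold] onto [-127, 127]."""
    return threshold / 127.0

# Synthetic activations with a single large outlier (assumed data, not from my model).
rng = np.random.default_rng(0)
acts = rng.normal(size=100_000).astype(np.float32)
acts[0] = 50.0

# Max calibration is dominated by the outlier; the percentile threshold is not.
max_threshold = float(np.abs(acts).max())
p_threshold = percentile_dynamic_range(acts, 99.99)
print("max threshold:", max_threshold, "scale:", int8_scale(max_threshold))
print("99.99% threshold:", p_threshold, "scale:", int8_scale(p_threshold))
```

If the calibrator itself cannot do this, a threshold computed this way could presumably be applied per tensor as an explicit dynamic range instead, but I have not verified that this reproduces the GTC results.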

Because of the low accuracy of PTQ, I also tested QAT. But there are some problems when I convert the TensorFlow 1.x model to TensorRT following the guide (GitHub - NVIDIA/sampleQAT: Inference of quantization aware trained networks using TensorRT).

Some questions about QAT

I generated two ONNX QAT models: [image: left model] and [image: right model].

The left one is part of a larger model, and the right one is generated from scratch. They look identical except for the names. The right one converts to TensorRT successfully, but the left one returns an error: “[TensorRT] ERROR: /home/jenkins/workspace/TensorRT/helpers/rel-7.1/L1_Nightly_Internal/build/source/rtSafe/safeWeightsPtr.h (102) - Assertion Error in setCount: 0 (count >= 0)”.

Are there more details available about these errors and about QAT in TensorRT?
There are also other problems, such as a sanity check error and an immediate segmentation fault.

Environment

TensorRT Version: 7.2.1.6 (Python)
GPU Type: 2070super
Nvidia Driver Version: 455.45.01
CUDA Version: 11.1
CUDNN Version: 8.0.4
Operating System + Version: Ubuntu18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): 1.15.4
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorflow:20.11-tf1-py3

Relevant Files

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  1. Create left model
# 1. Create Left model:
python3 yolov3.py
# 2. Convert to ONNX
python3 -m tf2onnx.convert --input ./frozen_qat_yolov3/frozen_qat_bn1.pb  --output ./frozen_qat_yolov3/frozen_qat_bn1.onnx --inputs "input:0" --outputs "darknet53_body/Relu:0" --opset 10 --fold_const
# 3. Post-processing for removing `transpose`
python3 postprocess_onnx.py --input ./frozen_qat_yolov3/frozen_qat_bn1.onnx --output ./frozen_qat_yolov3/frozen_qat_bn1_post.onnx
# 4. Build engine
python3 build_engine.py --onnx ./frozen_qat_yolov3/frozen_qat_bn1_post.onnx --engine ./frozen_qat_yolov3/frozen_qat_bn1.engine -v
  2. Create right model
# 1. Create right model:
python3 model.py
# 2. Convert to ONNX
python3 -m tf2onnx.convert --input ./frozen_qat/frozen_qat_bn1.pb  --output ./frozen_qat/frozen_qat_bn1.onnx --inputs "input:0" --outputs "output:0" --opset 10 --fold_const
# 3. Post-processing for removing `transpose`
python3 postprocess_onnx.py --input ./frozen_qat/frozen_qat_bn1.onnx --output ./frozen_qat/frozen_qat_bn1_post.onnx
# 4. Build engine
python3 build_engine.py --onnx ./frozen_qat/frozen_qat_bn1_post.onnx --engine ./frozen_qat/frozen_qat_bn1.engine -v
  • Full traceback of errors encountered

[TensorRT] ERROR: /home/jenkins/workspace/TensorRT/helpers/rel-7.1/L1_Nightly_Internal/build/source/rtSafe/safeWeightsPtr.h (102) - Assertion Error in setCount: 0 (count >= 0)

Hi @lingchao.zhu,
Please allow me some time to check on this.

Thanks!

Hi @AakankshaS,

Is there any progress?