Description
I'm working on TensorRT INT8 inference. According to "GTC 2020: Integer Quantization for DNN Inference Acceleration | NVIDIA Developer", PTQ accuracy should be good, but when I try this calibration the results are much worse than expected. The result with IInt8EntropyCalibrator is also poor.
Some information about my test:
- Dataset: COCO
- Network type: pose estimation
Some questions about PTQ:
Is there a way to clip the max/min at a percentile (e.g. 99.9% or 99.999%) instead of the absolute extremes when using IInt8MinMaxCalibrator?
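To make the question concrete, here is a minimal sketch in plain Python of what I mean by percentile clipping (the helper names are my own, not TensorRT API): collect the observed activation magnitudes during calibration, take e.g. the 99.9th percentile as the clipped max, and derive the symmetric INT8 scale from it.

```python
# Illustration only (my own helper names, not TensorRT API):
# clip the calibration range at a percentile of the observed
# activation magnitudes instead of the absolute min/max.

def percentile_max(samples, percentile):
    """Magnitude at the given percentile (0-100) of |samples|."""
    mags = sorted(abs(s) for s in samples)
    # Index of the chosen percentile, clamped into range.
    idx = min(len(mags) - 1, int(len(mags) * percentile / 100.0))
    return mags[idx]

def int8_symmetric_scale(samples, percentile=99.9):
    """Symmetric INT8 scale derived from the clipped max: max / 127."""
    return percentile_max(samples, percentile) / 127.0
```

If I understand correctly, a clipped range `(-m, m)` computed this way could be applied per tensor with `tensor.dynamic_range = (-m, m)` in the TensorRT Python API, bypassing the built-in calibrators, but I would like to confirm whether there is a supported way to do this inside IInt8MinMaxCalibrator itself.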
Because of the low accuracy of PTQ, I also tested QAT. But I ran into problems when converting the TensorFlow 1.x model to TensorRT following the guide (GitHub - NVIDIA/sampleQAT: Inference of quantization aware trained networks using TensorRT).
Some questions about QAT:
I generated two ONNX QAT models. The left one is part of a larger model; the right one was generated from scratch. They look identical except for the node names. The right model converts to TensorRT successfully, but the left one fails with the error: "[TensorRT] ERROR: /home/jenkins/workspace/TensorRT/helpers/rel-7.1/L1_Nightly_Internal/build/source/rtSafe/safeWeightsPtr.h (102) - Assertion Error in setCount: 0 (count >= 0)".
Are there more details available about these errors and about QAT support in TensorRT?
There are also some other problems, such as a check sanity error and a direct segmentation fault.
Environment
TensorRT Version: 7.2.1.6 (Python)
GPU Type: RTX 2070 Super
Nvidia Driver Version: 455.45.01
CUDA Version: 11.1
CUDNN Version: 8.0.4
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): 1.15.4
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorflow:20.11-tf1-py3
Relevant Files
https://github.com/lc-zhu/SharedFiles.git
Steps To Reproduce
- Exact steps/commands to build your repro
- Create left model
# 1. Create left model:
python3 yolov3.py
# 2. Convert to ONNX
python3 -m tf2onnx.convert --input ./frozen_qat_yolov3/frozen_qat_bn1.pb --output ./frozen_qat_yolov3/frozen_qat_bn1.onnx --inputs "input:0" --outputs "darknet53_body/Relu:0" --opset 10 --fold_const
# 3. Post-processing for removing `transpose`
python3 postprocess_onnx.py --input ./frozen_qat_yolov3/frozen_qat_bn1.onnx --output ./frozen_qat_yolov3/frozen_qat_bn1_post.onnx
# 4. Build engine
python3 build_engine.py --onnx ./frozen_qat_yolov3/frozen_qat_bn1_post.onnx --engine ./frozen_qat_yolov3/frozen_qat_bn1.engine -v
- Create right model
# 1. Create right model:
python3 model.py
# 2. Convert to ONNX
python3 -m tf2onnx.convert --input ./frozen_qat/frozen_qat_bn1.pb --output ./frozen_qat/frozen_qat_bn1.onnx --inputs "input:0" --outputs "output:0" --opset 10 --fold_const
# 3. Post-processing for removing `transpose`
python3 postprocess_onnx.py --input ./frozen_qat/frozen_qat_bn1.onnx --output ./frozen_qat/frozen_qat_bn1_post.onnx
# 4. Build engine
python3 build_engine.py --onnx ./frozen_qat/frozen_qat_bn1_post.onnx --engine ./frozen_qat/frozen_qat_bn1.engine -v
- Full traceback of errors encountered
[TensorRT] ERROR: /home/jenkins/workspace/TensorRT/helpers/rel-7.1/L1_Nightly_Internal/build/source/rtSafe/safeWeightsPtr.h (102) - Assertion Error in setCount: 0 (count >= 0)