Description
I tried to quantize DETR to INT8 with TensorRT 8.6.1. When I evaluate the INT8 engine on the COCO val2017 dataset, the mAP is nearly zero:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.005
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.014
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.002
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.003
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.010
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.021
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.039
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.046
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.011
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.104
My FP32 ONNX model, however, produces results matching the official ones. I have tried several ways to improve the quantization, including keeping some nodes in FP32 during quantization (sketched below) and upgrading TensorRT to 10.5.0, but the INT8 results did not change.
Could the low INT8 mAP be because transformer models are difficult to quantize? Even so, the result should not be nearly zero. The FP32 and INT8 models share the same preprocessing and postprocessing code, so I don't think the problem is in preprocessing or postprocessing.
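To clarify what I mean by keeping nodes in FP32, here is a minimal sketch against the TensorRT 8.6 Python API. The layer-type filter is only illustrative (Softmax, plus Normalization layers on TRT >= 8.6); the layers that are actually quantization-sensitive in DETR would have to be identified in the real graph.

import tensorrt as trt

def force_sensitive_layers_fp32(network: trt.INetworkDefinition,
                                config: trt.IBuilderConfig) -> None:
    """Keep quantization-sensitive layers in FP32 while the rest runs INT8."""
    # Make TensorRT honor the per-layer precision settings below.
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        # Illustrative filter: a LayerNorm exported with an older opset is
        # decomposed into elementwise/reduce ops and would have to be
        # matched by layer name instead.
        if layer.type in (trt.LayerType.SOFTMAX, trt.LayerType.NORMALIZATION):
            layer.precision = trt.float32
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.float32)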
The quantization log is as follows:
found all 999 images to calib.
[10/12/2024-16:01:04] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Building an engine. This would take a while...
(Use "--verbose" or "-v" to enable verbose logging.)
./tensorrt_ptq.py:122: DeprecationWarning: Use set_memory_pool_limit instead.
config.max_workspace_size = 2 << 30
trt.DataType.INT8
Int8 calibration is enabled.
./tensorrt_ptq.py:156: DeprecationWarning: Use build_serialized_network instead.
engine = builder.build_engine(network, config)
[10/12/2024-16:02:45] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 361) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[10/12/2024-16:02:45] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 381) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[10/12/2024-16:02:45] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 385) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[10/12/2024-16:02:45] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 1748) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[10/12/2024-16:06:08] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[10/12/2024-16:06:08] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[10/12/2024-16:06:08] [TRT] [W] Check verbose logs for the list of affected weights.
[10/12/2024-16:06:08] [TRT] [W] - 228 weights are affected by this issue: Detected subnormal FP16 values.
[10/12/2024-16:06:08] [TRT] [W] - 58 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
Serialized the TensorRT engine to file: /home/heyuanhong/quantization/detr_test/detr/onnx/new_detr_sim.onnx.int8
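As a side note, the two DeprecationWarnings in the log come from the old builder API in tensorrt_ptq.py; the non-deprecated equivalents should look roughly like this (sketch; builder, network, and config are assumed to be created as usual):

import tensorrt as trt

def build_and_save_engine(builder: trt.Builder,
                          network: trt.INetworkDefinition,
                          config: trt.IBuilderConfig,
                          engine_path: str) -> None:
    # Replaces the deprecated `config.max_workspace_size = 2 << 30`
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2 << 30)
    # Replaces the deprecated `builder.build_engine(network, config)`
    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("Engine build failed; check the verbose log.")
    with open(engine_path, "wb") as f:
        f.write(serialized)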
Environment
TensorRT Version: 8.6.1.post1
GPU Type: GeForce RTX 4060 Ti 16GB
Nvidia Driver Version: 550.54
CUDA Version: 12.4
CUDNN Version: cu12
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8.19
TensorFlow Version (if applicable): /
PyTorch Version (if applicable): 2.0.0
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:24.03-py3
Relevant Files
Here is my ONNX model: https://drive.google.com/file/d/1citGq4HegghVSniAC6nMpZtMFKrUScaC/view?usp=sharing
Here is my code: https://drive.google.com/file/d/18GgPUsIC795TB7x7jVACUQMX8jxuvlMH/view?usp=drive_link
Here is the val dataset: https://drive.google.com/file/d/1isfnBwRhhMfui7mUK1_3FD39SIOOisSe/view?usp=drive_link
Here is the calibration dataset: https://drive.google.com/file/d/1AsXX5kdLBhj2NYoJO7WQaZGZ61oDbJnq/view?usp=drive_link
Here are the val dataset annotations: https://drive.google.com/file/d/153T25c8VD64TFTdZIAPPM4PZKch6H4HC/view?usp=drive_link
Steps To Reproduce
- PTQ command (calibrator pattern sketched below)
python3 ./tensorrt_ptq.py \
--input_model_path /home/quantization/detr_test/detr/onnx/new_detr_sim.onnx \
--dtype int8 \
--batch_size 16 \
--calibrate_dataset /home/quantization/dataset/dataset/quantization_calib_dataset/coco_calib/images/ \
--img_size 800
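For context, the INT8 calibration in tensorrt_ptq.py feeds the 999 calibration images through an IInt8EntropyCalibrator2-style calibrator, roughly like this simplified sketch (not the exact script; preprocess_image is a hypothetical placeholder, and its output must match the eval-time preprocessing exactly, otherwise the INT8 scales are computed on the wrong input distribution):

import os
import numpy as np
import pycuda.autoinit  # noqa: F401
import pycuda.driver as cuda
import tensorrt as trt

class DETRCalibrator(trt.IInt8EntropyCalibrator2):
    """Simplified PTQ calibrator sketch (assumes NCHW float32 input)."""

    def __init__(self, image_dir, batch_size, img_size, cache_file="calib.cache"):
        super().__init__()
        self.batch_size = batch_size
        self.img_size = img_size
        self.cache_file = cache_file
        self.files = sorted(os.path.join(image_dir, f) for f in os.listdir(image_dir))
        self.index = 0
        # Device buffer for one batch of 3 x img_size x img_size float32 images
        self.device_input = cuda.mem_alloc(batch_size * 3 * img_size * img_size * 4)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        if self.index + self.batch_size > len(self.files):
            return None  # signals the end of calibration
        # preprocess_image is a placeholder; it MUST match eval preprocessing
        batch = np.stack([preprocess_image(f, self.img_size)
                          for f in self.files[self.index:self.index + self.batch_size]])
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch, dtype=np.float32))
        self.index += self.batch_size
        return [int(self.device_input)]

    def read_calibration_cache(self):
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)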
- Eval command
# eval INT8 engine
python tensorrt_eval.py \
--is_trt \
--input_model_path /home/heyuanhong/quantization/detr_test/detr/onnx/new_detr_sim.onnx.int8 \
--precision int8 \
--eval_dataset /home/heyuanhong/quantization/dataset/dataset/val_dataset/val2017/ \
--annotations /home/heyuanhong/quantization/dataset/annotations/instances_val2017.json
# eval ONNX FP32 model
python tensorrt_eval.py \
--input_model_path /home/quantization/detr_test/detr/onnx/new_detr_sim.onnx \
--precision fp32 \
--eval_dataset /home/quantization/dataset/dataset/val_dataset/val2017/ \
--annotations /home/quantization/dataset/annotations/instances_val2017.json
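For anyone reproducing this: before the full COCO eval, a raw-output comparison between the FP32 ONNX model and the INT8 engine on a single image should isolate whether the engine itself is broken. A rough sketch is below; it assumes a static batch-1 engine, that ONNX Runtime outputs and TRT output bindings come in the same order, and it uses the 8.x binding API (deprecated but still available in 8.6). preprocess_image is again a placeholder.

import numpy as np
import onnxruntime as ort
import pycuda.autoinit  # noqa: F401
import pycuda.driver as cuda
import tensorrt as trt

x = preprocess_image("path/to/val2017_image.jpg", 800)[None]  # placeholder helper, NCHW float32

# FP32 reference via ONNX Runtime
sess = ort.InferenceSession("new_detr_sim.onnx", providers=["CPUExecutionProvider"])
ref_outputs = sess.run(None, {sess.get_inputs()[0].name: x})

# INT8 TensorRT engine
logger = trt.Logger(trt.Logger.WARNING)
with open("new_detr_sim.onnx.int8", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

bindings, outputs = [], []
for i in range(engine.num_bindings):
    shape = tuple(engine.get_binding_shape(i))  # assumes static shapes (no -1 dims)
    buf = cuda.mem_alloc(int(np.prod(shape)) * 4)
    bindings.append(int(buf))
    if engine.binding_is_input(i):
        cuda.memcpy_htod(buf, np.ascontiguousarray(x, dtype=np.float32))
    else:
        outputs.append((buf, np.empty(shape, dtype=np.float32)))

context.execute_v2(bindings)
for (buf, host), ref in zip(outputs, ref_outputs):
    cuda.memcpy_dtoh(host, buf)
    print("max abs diff vs FP32:", np.abs(host - ref).max())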