mAP nearly zero after TensorRT INT8 quantization of DETR model

Description

I tried to quantize a DETR model to INT8 with TensorRT 8.6.1. When I evaluate the INT8 engine on the COCO val2017 dataset, the mAP is nearly zero, as follows:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.005
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.014
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.002
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.003
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.010
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.021
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.039
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.046
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.011
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.104

But my FP32 ONNX model's result matches the official result. I have tried several ways to improve the quantization, including keeping some nodes in FP32 during quantization and upgrading TensorRT to 10.5.0, but the INT8 result did not change.

Is the lower mAP of the INT8 model due to transformer models being difficult to quantize? Even so, the result should not be nearly zero. The FP32 model and the INT8 model share the same preprocessing and postprocessing code, so I don't think the problem is in preprocessing or postprocessing.
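For reference, the FP32 fallback mentioned above was set roughly like this before building the engine (a minimal sketch, not my exact script; the "LayerNorm" name filter is an assumption, since the exact node names depend on the exported graph):

import tensorrt as trt

def force_fp32_fallback(network, config):
    """Pin numerically sensitive layers to FP32 while the rest of the
    network is quantized to INT8."""
    config.set_flag(trt.BuilderFlag.INT8)
    # Without this flag TensorRT may silently ignore per-layer precisions.
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        # Softmax and LayerNorm are the usual INT8 trouble spots in
        # transformers; the name check below is illustrative only.
        if layer.type == trt.LayerType.SOFTMAX or "LayerNorm" in layer.name:
            layer.precision = trt.float32
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.float32)

As noted above, this did not change the INT8 result for me.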

The quantization log is as follows:

found all 999 images to calib.
[10/12/2024-16:01:04] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Building an engine.  This would take a while...
(Use "--verbose" or "-v" to enable verbose logging.)
./tensorrt_ptq.py:122: DeprecationWarning: Use set_memory_pool_limit instead.
  config.max_workspace_size = 2 << 30
trt.DataType.INT8
Int8 calibration is enabled.
./tensorrt_ptq.py:156: DeprecationWarning: Use build_serialized_network instead.
  engine = builder.build_engine(network, config)
[10/12/2024-16:02:45] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 361) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[10/12/2024-16:02:45] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 381) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[10/12/2024-16:02:45] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 385) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[10/12/2024-16:02:45] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 1748) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[10/12/2024-16:06:08] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[10/12/2024-16:06:08] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[10/12/2024-16:06:08] [TRT] [W] Check verbose logs for the list of affected weights.
[10/12/2024-16:06:08] [TRT] [W] - 228 weights are affected by this issue: Detected subnormal FP16 values.
[10/12/2024-16:06:08] [TRT] [W] - 58 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
Serialized the TensorRT engine to file: /home/heyuanhong/quantization/detr_test/detr/onnx/new_detr_sim.onnx.int8
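(Side note: the two DeprecationWarnings above only concern API naming and should not affect accuracy. A sketch of the non-deprecated equivalents, assuming the same builder/network/config objects as in tensorrt_ptq.py:)

import tensorrt as trt

def build_and_save(builder, network, config, engine_path, workspace=2 << 30):
    """Build with the non-deprecated TensorRT 8.x API and serialize to disk."""
    # Replaces: config.max_workspace_size = 2 << 30
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, workspace)
    # Replaces: engine = builder.build_engine(network, config)
    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("Engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized)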

Environment

TensorRT Version: 8.6.1.post1
GPU Type: NVIDIA GeForce RTX 4060 Ti 16GB
Nvidia Driver Version: 550.54
CUDA Version: 12.4
CUDNN Version: cu12
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): Python 3.8.19
TensorFlow Version (if applicable): /
PyTorch Version (if applicable): 2.0.0
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:24.03-py3

Relevant Files

Here is my ONNX model: https://drive.google.com/file/d/1citGq4HegghVSniAC6nMpZtMFKrUScaC/view?usp=sharing
Here is my code: https://drive.google.com/file/d/18GgPUsIC795TB7x7jVACUQMX8jxuvlMH/view?usp=drive_link
Here is the val dataset: https://drive.google.com/file/d/1isfnBwRhhMfui7mUK1_3FD39SIOOisSe/view?usp=drive_link
Here is the calibration dataset: https://drive.google.com/file/d/1AsXX5kdLBhj2NYoJO7WQaZGZ61oDbJnq/view?usp=drive_link
Here are the val dataset annotations: https://drive.google.com/file/d/153T25c8VD64TFTdZIAPPM4PZKch6H4HC/view?usp=drive_link

Steps To Reproduce

  1. PTQ command (a sketch of the calibrator pattern it uses follows the command)
python3 ./tensorrt_ptq.py \
    --input_model_path /home/quantization/detr_test/detr/onnx/new_detr_sim.onnx \
    --dtype int8 \
    --batch_size 16 \
    --calibrate_dataset /home/quantization/dataset/dataset/quantization_calib_dataset/coco_calib/images/ \
    --img_size 800
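For context, tensorrt_ptq.py builds its calibrator on top of trt.IInt8EntropyCalibrator2. Below is a minimal sketch of that pattern; the file listing, ImageNet normalization, and cache-file name are illustrative assumptions, not the exact script:

import os
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401  (initializes a CUDA context)
import tensorrt as trt
from PIL import Image

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed calibration batches to the TensorRT builder."""

    def __init__(self, image_dir, batch_size, img_size, cache_file="calib.cache"):
        super().__init__()
        self.batch_size = batch_size
        self.img_size = img_size
        self.cache_file = cache_file
        self.files = sorted(os.path.join(image_dir, f) for f in os.listdir(image_dir))
        self.index = 0
        # One device buffer holding a full float32 NCHW batch.
        self.device_input = cuda.mem_alloc(batch_size * 3 * img_size * img_size * 4)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        if self.index + self.batch_size > len(self.files):
            return None  # no more batches: calibration ends
        batch = np.stack([self._preprocess(p) for p in
                          self.files[self.index:self.index + self.batch_size]])
        self.index += self.batch_size
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def _preprocess(self, path):
        # Must match the FP32 eval preprocessing exactly; the ImageNet
        # normalization shown here is an assumption.
        img = Image.open(path).convert("RGB").resize((self.img_size, self.img_size))
        x = np.asarray(img, dtype=np.float32) / 255.0
        mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
        std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
        return ((x - mean) / std).transpose(2, 0, 1)

    def read_calibration_cache(self):
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

The key point is that the preprocessing here must match the FP32 evaluation preprocessing exactly; otherwise the calibration ranges are computed on a different input distribution.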
  2. Eval command
# eval int8 engine
python tensorrt_eval.py \
    --is_trt \
    --input_model_path /home/heyuanhong/quantization/detr_test/detr/onnx/new_detr_sim.onnx.int8 \
    --precision int8 \
    --eval_dataset /home/heyuanhong/quantization/dataset/dataset/val_dataset/val2017/ \
    --annotations /home/heyuanhong/quantization/dataset/annotations/instances_val2017.json
# eval onnx fp32 model
python tensorrt_eval.py \
    --input_model_path /home/quantization/detr_test/detr/onnx/new_detr_sim.onnx \
    --precision fp32 \
    --eval_dataset /home/quantization/dataset/dataset/val_dataset/val2017/ \
    --annotations /home/quantization/dataset/annotations/instances_val2017.json
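For completeness, the INT8 engine is loaded for evaluation through the standard TensorRT runtime path; a minimal sketch of what tensorrt_eval.py is assumed to do (names illustrative):

import tensorrt as trt

def load_engine(engine_path):
    """Deserialize a serialized TensorRT engine (TRT 8.x runtime API)."""
    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open(engine_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    # Inference then runs through an execution context, e.g.
    # context = engine.create_execution_context()
    # context.execute_v2(bindings)
    return engine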