TensorRT 8.6: accuracy of custom RT-DETR model drops significantly at fp16

Description

The accuracy of the converted TensorRT engine is significantly lower (by roughly 50%) than that of the ONNX model when the engine is built with the --fp16 flag, while the fp32 build matches the ONNX model.

I tried to use polygraphy run to identify the problematic layers, but it failed with the error “Could not find any implementation for node {ForeignNode[/model/decoder/Where_output_0…/model/decoder/decoder/dec_bbox_head.0/layers.2/Add]}.” The command I used is sketched below.
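Roughly, this was the command (paths are from my setup; I marked every tensor as an output so the ONNX Runtime and TensorRT fp16 results could be compared layer by layer):

polygraphy run rtdetr_r18vd_custom.onnx --onnxrt --trt --fp16 \
  --onnx-outputs mark all --trt-outputs mark all \
  --input-shapes images:[1,3,512,512] \
  --trt-min-shapes images:[1,3,512,512] --trt-opt-shapes images:[1,3,512,512] --trt-max-shapes images:[1,3,512,512] \
  --atol 1e-2 --rtol 1e-2

(I understand that marking every tensor as an output can change how TensorRT fuses the graph, so the ForeignNode build error may be a side effect of the comparison itself rather than of the normal engine build, but I am not sure.)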

Could you please help me resolve this fp16 issue? Thanks!

Environment

TensorRT Version: 8.6.1.6
GPU Type: RTX 3050
Nvidia Driver Version: 550.90.07
CUDA Version: 12.4
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

./trtexec --onnx=rtdetr_r18vd_custom.onnx --fp16 --minShapes=images:1x3x512x512 --optShapes=images:5x3x512x512 --maxShapes=images:16x3x512x512 --saveEngine=rtdetr_r18vd_custom_fp16.engine --verbose
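For comparison, the fp32 engine, which matches the ONNX accuracy, was built with the same command minus --fp16:

./trtexec --onnx=rtdetr_r18vd_custom.onnx --minShapes=images:1x3x512x512 --optShapes=images:5x3x512x512 --maxShapes=images:16x3x512x512 --saveEngine=rtdetr_r18vd_custom_fp32.engine --verbose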

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered
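As a possible workaround, I am also considering pinning the layers named in the error back to fp32 with trtexec's per-layer precision constraints, along these lines (I am not sure whether the ONNX node name below matches the layer name TensorRT uses internally, so this is just a sketch):

./trtexec --onnx=rtdetr_r18vd_custom.onnx --fp16 --precisionConstraints=obey --layerPrecisions=/model/decoder/decoder/dec_bbox_head.0/layers.2/Add:fp32 --minShapes=images:1x3x512x512 --optShapes=images:5x3x512x512 --maxShapes=images:16x3x512x512 --saveEngine=rtdetr_r18vd_custom_fp16_mixed.engine

Would this be the recommended way to keep the decoder head in fp32, or is there a better approach?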