Description
The accuracy of the converted TensorRT engine was significantly lower (roughly 50% worse) than the ONNX model when built with the --fp16 flag, while the FP32 build was fine.
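For reference, the accuracy gap shows up in a Polygraphy comparison against ONNX Runtime. This is roughly the command I used (the tolerances and the data-loader shape are values I picked, not anything prescribed):

polygraphy run rtdetr_r18vd_custom.onnx --trt --fp16 --onnxrt --atol 1e-2 --rtol 1e-2 --input-shapes images:[1,3,512,512] --trt-min-shapes images:[1,3,512,512] --trt-opt-shapes images:[5,3,512,512] --trt-max-shapes images:[16,3,512,512]

The --atol/--rtol values decide what counts as a mismatch, so they may need adjusting for this model.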
I tried to use polygraphy run to identify the problematic layers, but hit the error "Could not find any implementation for node {ForeignNode[/model/decoder/Where_output_0…/model/decoder/decoder/dec_bbox_head.0/layers.2/Add]}".
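For completeness, the per-layer debugging I was attempting follows Polygraphy's precision-bisection workflow, roughly as below (file names are my own, and I've omitted the dynamic-shape flags for brevity); the build error above is raised while this tool rebuilds the engine with different layers constrained to FP32:

# Save golden outputs from ONNX Runtime first:
polygraphy run rtdetr_r18vd_custom.onnx --onnxrt --save-outputs golden.json

# Then bisect which layers must stay in FP32 for the FP16 engine to match:
polygraphy debug precision rtdetr_r18vd_custom.onnx --fp16 --check polygraphy run polygraphy_debug.engine --trt --load-outputs golden.json --atol 1e-2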
Please kindly help me with this FP16 issue. Thanks!
Environment
TensorRT Version: 8.6.1.6
GPU Type: RTX 3050
Nvidia Driver Version: 550.90.07
CUDA Version: 12.4
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Steps To Reproduce
./trtexec --onnx=rtdetr_r18vd_custom.onnx --fp16 --minShapes=images:1x3x512x512 --optShapes=images:5x3x512x512 --maxShapes=images:16x3x512x512 --saveEngine=rtdetr_r18vd_custom_fp16.engine --verbose
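If someone can point me at which layers overflow in FP16, my understanding of the trtexec help is that individual layers can be pinned back to FP32 with precision constraints; this is a sketch of what I would try (the layer name below is a placeholder taken from the error message, not a layer I have confirmed):

./trtexec --onnx=rtdetr_r18vd_custom.onnx --fp16 --precisionConstraints=obey --layerPrecisions=/model/decoder/Where:fp32 --minShapes=images:1x3x512x512 --optShapes=images:5x3x512x512 --maxShapes=images:16x3x512x512 --saveEngine=rtdetr_r18vd_custom_fp16_mixed.engine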
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered