Error during conversion of ONNX panoptic model to TensorRT fp16

Description

Hi, I’m currently using a Mask2Former model for panoptic segmentation, exported to ONNX.
I successfully converted it to a .engine file with trtexec using this command:

trtexec --onnx=end2end.onnx --saveEngine=/deployed_models/panoptic/end2end.engine --minShapes=input:1x3x320x512 --maxShapes=input:1x3x1344x1344 --optShapes=input:1x3x800x1344 --dynamicPlugins=/mmdeploy/build/lib/libmmdeploy.so

However, when I try to build the engine with FP16 precision, it crashes. The command is the same but with --fp16 added (see Steps To Reproduce).

Can you tell me what the problem might be?

Environment

TensorRT Version: 8.6.1.6

GPU Type: NVIDIA RTX 3090 24GB

Nvidia Driver Version: 525.60.11

CUDA Version: 11.7

CUDNN Version: 8.6.0

Operating System + Version: Kubuntu 18.04

Python Version (if applicable): 3.8

PyTorch Version (if applicable): 2.0.1

Baremetal or Container (if container which image + tag):

Miniconda: 3

ONNX: 1.15

ONNXRuntime-GPU: 1.12.0

Steps To Reproduce

trtexec --onnx=end2end.onnx --saveEngine=/deployed_models/panoptic/end2end.engine --minShapes=input:1x3x320x512 --maxShapes=input:1x3x1344x1344 --optShapes=input:1x3x800x1344 --dynamicPlugins=/mmdeploy/build/lib/libmmdeploy.so --fp16

ERROR

[05/29/2024-15:32:18] [I] [TRT] Searching for plugin: grid_sampler, plugin_version: 1, plugin_namespace:
[05/29/2024-15:32:18] [I] [TRT] Successfully created plugin: grid_sampler
[05/29/2024-15:32:18] [I] Finished parsing network model. Parse time: 1.32664
[05/29/2024-15:32:19] [I] [TRT] Graph optimization time: 0.540352 seconds.
[05/29/2024-15:32:19] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2835, GPU 1113 (MiB)
[05/29/2024-15:32:19] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 2835, GPU 1123 (MiB)
[05/29/2024-15:32:19] [W] [TRT] TensorRT was linked against cuDNN 8.9.0 but loaded cuDNN 8.6.0
[05/29/2024-15:32:19] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
mha_fusion.cpp:355: DCHECK(always(sym_eql(b1, calculateBatchesSym(dims_bmm1_output)))) failed.
[05/29/2024-15:34:29] [E] Error[10]: Could not find any implementation for node {ForeignNode[(Unnamed Layer 10965) [Constant] + (Unnamed Layer 10966) [Shuffle]…/backbone/Reshape_3 + /backbone/Transpose_3]}.
[05/29/2024-15:34:29] [E] Error[10]: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[(Unnamed Layer 10965) [Constant] + (Unnamed Layer 10966) [Shuffle]…/backbone/Reshape_3 + /backbone/Transpose_3]}.)
[05/29/2024-15:34:29] [E] Engine could not be created from network
[05/29/2024-15:34:29] [E] Building engine failed
[05/29/2024-15:34:29] [E] Failed to create engine from model or file.
[05/29/2024-15:34:29] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8601] # …
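In case it helps isolate the failure: since the error points at specific backbone layers (`/backbone/Reshape_3` and `/backbone/Transpose_3`), one workaround worth trying is to keep those layers in FP32 while the rest of the network builds in FP16, using trtexec's precision-constraint flags (available since TensorRT 8.4). This is an untested sketch; the exact layer names to pin should be confirmed from a `--verbose` build log.

```shell
# Sketch of an FP16 build that forces the layers named in the error back to FP32.
# Layer names below are taken from the error log; verify them with --verbose first.
trtexec --onnx=end2end.onnx \
        --saveEngine=/deployed_models/panoptic/end2end.engine \
        --minShapes=input:1x3x320x512 \
        --optShapes=input:1x3x800x1344 \
        --maxShapes=input:1x3x1344x1344 \
        --dynamicPlugins=/mmdeploy/build/lib/libmmdeploy.so \
        --fp16 \
        --precisionConstraints=obey \
        --layerPrecisions="/backbone/Reshape_3:fp32,/backbone/Transpose_3:fp32"
```

If the build then succeeds, that would confirm an FP16 kernel-selection issue in the fused Reshape/Transpose region rather than a problem with the model itself.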

Hi @lidiafanta,
Could you please share your model with us?

Hi, here is the link to the model that I tried to convert to a TensorRT engine.

This model is a panoptic segmentation model called Mask2Former. I took it from the OpenMMLab GitHub project, but since their forum is no longer active and the error seems to be related to TensorRT, I would kindly ask you to take a look at the possible problem.

Thank you so much again

Any update on this?