Description
Hi, I’m currently using a Mask2Former model for panoptic segmentation, exported to ONNX.
I successfully converted it to a TensorRT .engine file with trtexec using this command:
trtexec --onnx=end2end.onnx --saveEngine=/deployed_models/panoptic/end2end.engine --minShapes=input:1x3x320x512 --maxShapes=input:1x3x1344x1344 --optShapes=input:1x3x800x1344 --dynamicPlugins=/mmdeploy/build/lib/libmmdeploy.so
However, when I try to build the engine in FP16, the build fails. The command is the same but with --fp16 added (see Steps To Reproduce).
Can you tell me what the problem might be?
Environment
TensorRT Version: 8.6.1.6
GPU Type: NVIDIA RTX 3090 24GB
Nvidia Driver Version: 525.60.11
CUDA Version: 11.7
CUDNN Version: 8.6.0
Operating System + Version: Kubuntu 18.04
Python Version (if applicable): 3.8
PyTorch Version (if applicable): 2.0.1
Baremetal or Container (if container which image + tag):
Miniconda: 3
ONNX: 1.15
ONNXRuntime-GPU: 1.12.0
Steps To Reproduce
trtexec --onnx=end2end.onnx --saveEngine=/deployed_models/panoptic/end2end.engine --minShapes=input:1x3x320x512 --maxShapes=input:1x3x1344x1344 --optShapes=input:1x3x800x1344 --dynamicPlugins=/mmdeploy/build/lib/libmmdeploy.so --fp16
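Since the working and failing builds differ only in the trailing --fp16 flag, it may help to generate both commands from one set of arguments so nothing else drifts between runs. The following is a minimal sketch, not part of the original setup: the helper name `build_trtexec_cmd` and the `SHAPES` dictionary are hypothetical, and only the trtexec flags already shown above are used.

```python
import shlex


def build_trtexec_cmd(onnx_path, engine_path, shapes, plugin_lib, fp16=False):
    """Assemble a trtexec argument list (hypothetical helper).

    `shapes` maps "min", "opt" and "max" to a shape spec such as
    "input:1x3x320x512". Only flags that appear in this report are emitted.
    """
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        f"--minShapes={shapes['min']}",
        f"--maxShapes={shapes['max']}",
        f"--optShapes={shapes['opt']}",
        f"--dynamicPlugins={plugin_lib}",
    ]
    if fp16:
        # The only difference between the working and the failing build.
        cmd.append("--fp16")
    return cmd


SHAPES = {
    "min": "input:1x3x320x512",
    "opt": "input:1x3x800x1344",
    "max": "input:1x3x1344x1344",
}
COMMON = dict(
    onnx_path="end2end.onnx",
    engine_path="/deployed_models/panoptic/end2end.engine",
    shapes=SHAPES,
    plugin_lib="/mmdeploy/build/lib/libmmdeploy.so",
)

fp32_cmd = build_trtexec_cmd(**COMMON)
fp16_cmd = build_trtexec_cmd(**COMMON, fp16=True)

# Print both commands so they can be pasted into a shell or diffed.
print(shlex.join(fp32_cmd))
print(shlex.join(fp16_cmd))
```

The argument lists can also be passed directly to `subprocess.run(fp16_cmd)` on a machine where trtexec is on the PATH.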
ERROR
…
[05/29/2024-15:32:18] [I] [TRT] Searching for plugin: grid_sampler, plugin_version: 1, plugin_namespace:
[05/29/2024-15:32:18] [I] [TRT] Successfully created plugin: grid_sampler
[05/29/2024-15:32:18] [I] Finished parsing network model. Parse time: 1.32664
[05/29/2024-15:32:19] [I] [TRT] Graph optimization time: 0.540352 seconds.
[05/29/2024-15:32:19] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2835, GPU 1113 (MiB)
[05/29/2024-15:32:19] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 2835, GPU 1123 (MiB)
[05/29/2024-15:32:19] [W] [TRT] TensorRT was linked against cuDNN 8.9.0 but loaded cuDNN 8.6.0
[05/29/2024-15:32:19] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
mha_fusion.cpp:355: DCHECK(always(sym_eql(b1, calculateBatchesSym(dims_bmm1_output)))) failed.
[05/29/2024-15:34:29] [E] Error[10]: Could not find any implementation for node {ForeignNode[(Unnamed Layer 10965) [Constant] + (Unnamed Layer 10966) [Shuffle]…/backbone/Reshape_3 + /backbone/Transpose_3]}.
[05/29/2024-15:34:29] [E] Error[10]: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[(Unnamed Layer 10965) [Constant] + (Unnamed Layer 10966) [Shuffle]…/backbone/Reshape_3 + /backbone/Transpose_3]}.)
[05/29/2024-15:34:29] [E] Engine could not be created from network
[05/29/2024-15:34:29] [E] Building engine failed
[05/29/2024-15:34:29] [E] Failed to create engine from model or file.
[05/29/2024-15:34:29] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8601] # …
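For anyone comparing the FP32 and FP16 build logs, a small triage script can pull out the error and warning lines, the unstamped DCHECK line, and the cuDNN version mismatch reported above. This is only a sketch based on the line layout visible in this log; the function name `triage_trtexec_log` is hypothetical, and whether the cuDNN mismatch is related to the failure is not established here.

```python
import re

# Layout seen in the log above: "[timestamp] [I|W|E] message"
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<sev>[IWE])\] (?P<msg>.*)$")
CUDNN_RE = re.compile(r"linked against cuDNN (\S+) but loaded cuDNN (\S+)")


def triage_trtexec_log(log_text):
    """Group log lines by severity and record a cuDNN version mismatch, if any."""
    report = {"errors": [], "warnings": [], "unstamped": [], "cudnn_mismatch": None}
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            # Lines without a timestamp, e.g. the mha_fusion.cpp DCHECK message.
            report["unstamped"].append(line)
            continue
        if match["sev"] == "E":
            report["errors"].append(match["msg"])
        elif match["sev"] == "W":
            report["warnings"].append(match["msg"])
            versions = CUDNN_RE.search(match["msg"])
            if versions:
                # (version TensorRT was linked against, version actually loaded)
                report["cudnn_mismatch"] = versions.groups()
    return report


# Sample lines copied from the log in this report.
SAMPLE = """\
[05/29/2024-15:32:19] [W] [TRT] TensorRT was linked against cuDNN 8.9.0 but loaded cuDNN 8.6.0
[05/29/2024-15:32:19] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
mha_fusion.cpp:355: DCHECK(always(sym_eql(b1, calculateBatchesSym(dims_bmm1_output)))) failed.
[05/29/2024-15:34:29] [E] Engine could not be created from network
[05/29/2024-15:34:29] [E] Building engine failed
"""

report = triage_trtexec_log(SAMPLE)
for key, value in report.items():
    print(key, value)
```

Running the same function over the log of the successful FP32 build would show whether the warning and the DCHECK line appear there too.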