Trtexec: ConvTranspose costs too much time during inference

I have a simple network that uses ConvTranspose to upsample a feature map.

I use trtexec to convert the ONNX model to a TensorRT engine file, passing --verbose --dumpProfile to get more information from the model optimizer.
However, the profiling results look abnormal: the ConvTranspose layer costs too much time, almost 50% of the total inference time.
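Roughly, the invocation looked like this (a sketch; --verbose and --dumpProfile are the flags I mentioned above, while the --saveEngine and --export* arguments are reconstructed to match the attached file names):

```
# Build the engine from the ONNX model and profile per-layer times.
# (Sketch of the invocation; the --saveEngine/--export* paths are
# assumed to match the attached files.)
trtexec --onnx=horizon_second.onnx \
        --saveEngine=horizon_second.engine \
        --verbose \
        --dumpProfile \
        --exportTimes=horizon_times.json \
        --exportProfile=horizon_profile.json
```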

Below are the ONNX model and the profiling logs.

horizon_second.onnx (18.3 MB)
horizon_times.json (31.3 KB)
horizon_profile.json (4.4 KB)

Environment

TensorRT Version: 8.0.3
GPU Type: A6000
Nvidia Driver Version: 470.141.03
CUDA Version: 11.4
CUDNN Version: 8.2.4
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.10
Baremetal or Container (if container which image + tag):

Hi,

We recommend that you try the latest TensorRT version, 8.4 GA Update 1. If you still face this issue, please share the verbose logs and the command with us so we can try to reproduce it on our end for better debugging.
https://developer.nvidia.com/nvidia-tensorrt-8x-download
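For example, something along these lines (a sketch; substitute your actual model path) captures both the verbose build log and the per-layer profile in one run:

```
# Example invocation to capture the verbose log and per-layer profile
# (a sketch; replace horizon_second.onnx with your model path).
trtexec --onnx=horizon_second.onnx \
        --verbose \
        --dumpProfile \
        --exportProfile=horizon_profile.json \
        > trtexec_verbose.log 2>&1
```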

Thank you.