TensorRT runs slower than PyTorch on Xavier AGX with DeepLabV3 when the input is large

Description

TensorRT runs 3x slower than PyTorch on DeepLabV3-ResNet50 with a large input (~4x3x500x500). With a smaller input (4x3x50x50) it runs about twice as fast as PyTorch.

Environment

TensorRT Version: 8.0.1-1+cuda10.2 arm
GPU Type: Xavier AGX (maxn mode)
Nvidia Driver Version: Unknown
CUDA Version: 10.2
CUDNN Version: 8.2.1
Operating System + Version: Jetpack 4.6 rev1
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): Not installed
PyTorch Version (if applicable): 1.8.0
Baremetal or Container (if container which image + tag):

Relevant Files

Steps To Reproduce

  • Install torch2trt
  • Run python3 train_torch.py
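For reference, a minimal timing sketch (since train_torch.py is not attached, the model and input names in the usage comment are illustrative, not the script's actual code). The key caveats for a fair comparison are warm-up iterations and, on GPU, synchronizing before reading the clock:

```python
import time

def benchmark(fn, *args, warmup=10, iters=100):
    """Return mean latency in milliseconds over `iters` timed calls.

    For GPU models, torch.cuda.synchronize() (or an equivalent) must run
    before each timestamp; otherwise only the asynchronous kernel launch
    is timed and the numbers are meaningless.
    """
    for _ in range(warmup):        # warm-up: caches, autotuning, lazy init
        fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1000.0

# Illustrative usage (model, model_trt, and x are placeholders):
#   t_torch = benchmark(lambda: model(x))
#   t_trt   = benchmark(lambda: model_trt(x))
#   print(f"torch {t_torch:.2f} ms | tensorrt {t_trt:.2f} ms")
```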

Hi,

Could you share the performance you observed with us?

Since you have converted the model from PyTorch into TensorRT with torch2trt, would you mind benchmarking the result with trtexec as well?

1. Serialize the TensorRT engine
2. Test with trtexec:

$ /usr/src/tensorrt/bin/trtexec --loadEngine=[trt engine]
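Concretely, the two steps might look roughly like this (file names are placeholders; `model_trt` is the object returned by torch2trt, whose underlying engine can be serialized to a plan file):

```shell
# Step 1: serialize the torch2trt engine to a plan file
# (run in Python, after the torch2trt(...) conversion):
#   with open("deeplabv3.trt", "wb") as f:
#       f.write(model_trt.engine.serialize())

# Step 2: benchmark the serialized engine with trtexec
$ /usr/src/tensorrt/bin/trtexec --loadEngine=deeplabv3.trt

# Alternatively, build an engine directly from an ONNX export and try FP16
# (precision is chosen at build time, not when loading an engine):
$ /usr/src/tensorrt/bin/trtexec --onnx=deeplabv3.onnx --fp16 --saveEngine=deeplabv3_fp16.trt
```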

Thanks.