Description
TensorRT runs 3x slower than PyTorch on deeplabv3-resnet50 with large inputs (~4x3x500x500). With smaller inputs (4x3x50x50) it runs twice as fast as PyTorch.
Environment
TensorRT Version: 8.0.1-1+cuda10.2 arm
GPU Type: Xavier AGX (maxn mode)
Nvidia Driver Version: Unknown
CUDA Version: 10.2
CUDNN Version: 8.2.1
Operating System + Version: Jetpack 4.6 rev1
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): Not installed
PyTorch Version (if applicable): 1.8.0
Baremetal or Container (if container which image + tag):
Relevant Files
Steps To Reproduce
Please include:
- Install torch2trt
- Run python3 train_torch.py
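To make the timing comparison easier to reproduce, here is a minimal sketch of how each model could be timed. The contents of train_torch.py are not shown here, so the helper name `benchmark` and its parameters (`fn`, `sync`, `warmup`, `iters`) are hypothetical and not taken from that script. The point is only that both models get warm-up runs and that the GPU is synchronized before the clock is read, since CUDA calls are asynchronous.

```python
import time
from statistics import mean


def benchmark(fn, sync=None, warmup=5, iters=20):
    """Return the mean wall-clock seconds per call of fn().

    fn     -- zero-argument callable that runs one inference (hypothetical)
    sync   -- optional callable that blocks until the device is idle
    warmup -- untimed calls, so one-time setup costs are excluded
    iters  -- timed calls that are averaged
    """
    sync = sync or (lambda: None)

    # Warm-up: the first calls often include lazy initialization and autotuning.
    for _ in range(warmup):
        fn()
    sync()

    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()  # wait for queued GPU work before stopping the clock
        timings.append(time.perf_counter() - start)
    return mean(timings)
```

For the comparison in this issue, `fn` would presumably be something like `lambda: model(x)` for the PyTorch model and the same for the torch2trt-converted module, with `sync=torch.cuda.synchronize` and `x` a CUDA tensor of the shape being tested. If the numbers above were measured without synchronization or warm-up, the 3x gap may partly reflect measurement effects.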