Inference time is higher for the TensorRT engine than for the PyTorch model (RetinaFace)

Description

I converted the RetinaFace model (in PyTorch) from here to a TensorRT engine by following the steps from the tensorrtx repo. The inference time in PyTorch is 11 ms, but the inference time for the TensorRT engine (FP16 weights) is 16 ms.

I don't understand why the inference time increases after converting to an FP16 TensorRT engine.
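
Before comparing the two numbers, it may be worth ruling out the measurement itself: GPU work is launched asynchronously, so timings taken without warm-up runs or without synchronizing the device can be misleading. Whether that is what is happening here is only an assumption. Below is a minimal sketch of a timing helper that uses only the Python standard library; `benchmark`, `fn`, and `sync` are names made up for this example and are not part of PyTorch, TensorRT, or tensorrtx.

```python
import statistics
import time


def benchmark(fn, sync=None, warmup=10, iters=100):
    """Time fn() and return (mean_ms, median_ms).

    fn   -- zero-argument callable that runs one inference (hypothetical).
    sync -- optional zero-argument callable that blocks until the device
            has finished all queued work.
    """
    def _sync():
        if sync is not None:
            sync()

    # Warm-up runs are not timed: the first calls often include one-off
    # costs such as context creation or kernel selection.
    for _ in range(warmup):
        fn()
    _sync()

    timings_ms = []
    for _ in range(iters):
        _sync()                      # make sure nothing is still in flight
        start = time.perf_counter()
        fn()
        _sync()                      # wait for this call to actually finish
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return statistics.mean(timings_ms), statistics.median(timings_ms)


# Example wiring (assumptions, shown as comments so the block runs anywhere):
#   PyTorch:  benchmark(lambda: model(x), sync=torch.cuda.synchronize)
#   TensorRT: benchmark(run_engine, sync=stream_sync)
# where run_engine and stream_sync are placeholders for your own wrappers.
```

Using the same helper for both backends, and timing the same scope in each (for example, including or excluding host-to-device copies in both cases), makes the 11 ms and 16 ms figures easier to compare.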

Environment

TensorRT Version:
GPU Type:
Nvidia Driver Version: 460
CUDA Version: 11.1
CUDNN Version: 8.0.4
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Hi @shashikant.ghangare ,

I have implemented the approach in two ways here: GitHub - bharat3012/Retinaface_Arcface_TRT

A. 00_Retinaface_Arcface_TRT_Triton.ipynb
> Same RetinaFace model used, but a different ArcFace model
B. 01_Insightface_TRT_Triton.ipynb
> MXNet models used.

Do let me know if you find this useful.