I converted the RetinaFace model (in PyTorch) from here to a TensorRT engine by following the steps from the tensorrtx repo. The inference time in PyTorch is 11 ms, but the inference time for the TensorRT engine (FP16 weights) is 16 ms.
I am not able to understand why the inference time increases after converting to an FP16 TensorRT engine.
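For context, the two numbers are only comparable if both backends are timed the same way, with warm-up runs and a GPU synchronization before the clock is stopped. The sketch below shows one way such a measurement could be set up. It is an illustration under assumptions, not the script used for the timings above: `benchmark`, `infer`, and `sync` are hypothetical names, `infer` stands for whatever runs a single forward pass, and `sync` stands for whatever blocks until the GPU has finished.

```python
import statistics
import time


def benchmark(infer, sync=None, warmup=10, iters=100):
    """Return the median latency of one `infer()` call in milliseconds.

    `infer` and `sync` are hypothetical callables supplied by the caller:
    `infer` runs a single forward pass, `sync` waits for the device to finish
    (for PyTorch this could be `torch.cuda.synchronize`).
    """

    def run_once():
        infer()
        if sync is not None:
            # GPU work is launched asynchronously, so wait for it to finish
            # before reading the clock.
            sync()

    # Warm-up runs are executed but not timed.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)

    # The median is less sensitive to occasional slow runs than the mean.
    return statistics.median(samples)
```

Calling the same helper once with the PyTorch forward pass and once with the TensorRT execution call, using the same input size and batch size, would put both measurements on an equal footing.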
Nvidia Driver Version: 460
CUDA Version: 11.1
CUDNN Version: 8.0.4
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
Baremetal or Container (if container which image + tag): Baremetal