Inference time is more for TensorRT engine than Pytorch model for Retinaface

shashikant.ghangare · June 4, 2021, 9:43am

Description

I converted the Retinaface model (in PyTorch) from here to a TensorRT engine by following the steps from the tensorrtx repo. Inference takes 11 ms in PyTorch, but 16 ms with the TensorRT engine (FP16 weights).

I don't understand why the inference time increases after converting to an FP16 TensorRT engine.

Environment

TensorRT Version:
GPU Type:
Nvidia Driver Version: 460
CUDA Version: 11.1
CUDNN Version: 8.0.4
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

bgiddwani · June 4, 2021, 10:30am

Hi @shashikant.ghangare ,

I have implemented the approach in two ways here: https://github.com/bharat3012/Retinaface_Arcface_TRT

A. 00_Retinaface_Arcface_TRT_Triton.ipynb
> Same Retinaface used, but a different Arcface.
B. 01_Insightface_TRT_Triton.ipynb
> MXNet models used.

Do let me know if you find this useful.
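
One thing worth ruling out for the 11 ms vs 16 ms numbers in the original post is the measurement itself: GPU work is launched asynchronously, and the first few runs usually include one-time setup cost. The sketch below is a minimal, framework-agnostic timing harness in Python. It is not the method used in the post or in tensorrtx. The names `benchmark`, `run_once`, and `sync` are hypothetical; `sync` is assumed to be whatever call blocks until the GPU has finished (for PyTorch, `torch.cuda.synchronize`), and `run_once` is assumed to run one inference on a fixed input.

```python
import statistics
import time


def benchmark(run_once, warmup=10, iters=100, sync=None):
    """Return (mean_ms, median_ms) for run_once, excluding warm-up runs.

    run_once: hypothetical zero-argument callable that runs one inference.
    sync:     optional zero-argument callable that waits for pending GPU
              work (assumption: e.g. torch.cuda.synchronize for PyTorch).
    """
    sync = sync or (lambda: None)

    # Warm-up runs are executed but not timed.
    for _ in range(warmup):
        run_once()
    sync()

    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        sync()  # make sure the work has finished before stopping the clock
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return statistics.mean(samples_ms), statistics.median(samples_ms)


if __name__ == "__main__":
    # Stand-in CPU workload; replace with the real inference call.
    mean_ms, median_ms = benchmark(lambda: sum(range(10_000)), warmup=5, iters=50)
    print(f"mean {mean_ms:.3f} ms, median {median_ms:.3f} ms")
```

Timing both the PyTorch model and the TensorRT engine with the same harness, the same input size, and the same batch size makes the two numbers comparable. It also helps to check whether each measurement includes host-to-device copies and post-processing, since counting them for only one side would skew the result.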
