Occasional large latency spikes in TensorRT 8.2

Description

Dear friends,

I am using TensorRT to run face detection and feature extraction models for a 1:1 face comparison service.

I ran a stress test with one million picture pairs to check the stability of the service.

The test takes more than an hour, and I found a problem: most of the time, inference completes in less than 100 ms. Occasionally, however, a single 1:1 face comparison takes more than 5 seconds. This happens around 100 times per million comparisons.

Could you please help? Is this normal behavior for TensorRT? What could cause such a large delay?

Thanks a lot.

Environment

TensorRT Version: 8.2
GPU Type: V100
CUDA Version: CUDA 11

Hi, please refer to the links below to perform inference in INT8.

Thanks!

Dear friend, I checked your documents. They report the average performance of TensorRT, and the results are good.

During my test, the average TensorRT inference time is as good as in your documents.

But my question is about the sudden large delays. They happen randomly, and I don't know how to eliminate them.
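
To make the difference between average and worst-case latency concrete, here is a minimal sketch of how the per-call timings could be recorded and summarized. The names `measure_latencies`, `summarize`, and `infer` are hypothetical and not part of TensorRT; `infer` stands in for whatever function runs one 1:1 comparison, and it is assumed to block until the GPU result is ready (otherwise the timing would only cover the launch). The 5000 ms spike threshold is taken from the delays described above.

```python
import statistics
import time


def measure_latencies(infer, inputs):
    """Time each call to `infer` and return the latencies in milliseconds.

    `infer` is a hypothetical callable that runs one comparison and
    returns only after the result is available.
    """
    latencies_ms = []
    for item in inputs:
        start = time.perf_counter()
        infer(item)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms, spike_threshold_ms=5000.0):
    """Report mean, nearest-rank percentiles, max, and the slow calls."""
    ordered = sorted(latencies_ms)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank percentile with integer arithmetic: ceil(p * n / 100).
        rank = max(1, (p * n + 99) // 100)
        return ordered[rank - 1]

    # Keep the index of each spike so it can be matched to the input pair.
    spikes = [(i, t) for i, t in enumerate(latencies_ms) if t >= spike_threshold_ms]
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": percentile(50),
        "p99": percentile(99),
        "max": ordered[-1],
        "spikes": spikes,
    }
```

With about 100 slow calls in a million, the mean and even the 99th percentile stay near the normal latency, so only the maximum and the list of spikes reveal the problem. Logging the timestamp of each spike alongside its index may also help correlate the delays with other activity on the server.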

Hi,

We recommend trying the latest TensorRT version, 8.4. If you still face this issue, please share a minimal ONNX model and scripts that reproduce it so we can try it on our end for better debugging.
https://developer.nvidia.com/nvidia-tensorrt-8x-download

Thank you.

Dear friends,

I rebooted my GPU server, and the issue is gone now. Thanks a lot!