Dear friend,
I am using TensorRT to run face detection and feature-extraction models in a 1:1 face comparison service.
I ran a stress test with one million image pairs to check the stability of the 1:1 face service.
The test takes more than an hour, and I found a problem: most of the time, model inference completes in under 100 ms.
But occasionally, the TensorRT 1:1 face comparison takes more than 5 seconds. This happens about 100 times per million pairs.
Could you kindly help? Is this normal performance for TensorRT? What could cause such a large delay?
Thanks a lot.
TensorRT Version: 8.2
GPU Type: V100
CUDA Version: CUDA 11
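For context, my stress test collects per-pair latency roughly like the sketch below. The `run_inference` function is only a placeholder for the real TensorRT compare call, not the actual service code; the point is how the tail (p99.9 and max) is separated from the average.

```python
import time

def run_inference(pair_id):
    # Placeholder for the real TensorRT 1:1 compare call.
    # In the actual test this runs detection + feature extraction + compare.
    return sum(range(100))

latencies_ms = []
for i in range(10_000):  # the real test used ~1 million pairs
    start = time.perf_counter()
    run_inference(i)
    latencies_ms.append((time.perf_counter() - start) * 1000.0)

latencies_ms.sort()
p50 = latencies_ms[len(latencies_ms) // 2]
p999 = latencies_ms[int(len(latencies_ms) * 0.999)]
print(f"p50={p50:.3f} ms  p99.9={p999:.3f} ms  max={latencies_ms[-1]:.3f} ms")
```

With per-pair timestamps logged like this, the average stays under 100 ms while the max shows the rare multi-second spikes, which is how I noticed the roughly 100-in-a-million outliers.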
Hi, please refer to the links below to perform inference in INT8.
Dear friend, I checked your documents. They report the average performance of TensorRT, and the results are good.
In my test, the average time cost of TensorRT inference matches your documentation.
But my question is about the sudden large delays. They happen randomly, and I don't know how to fix them.
We recommend trying the latest TensorRT version, 8.4. If you still face this issue, please share a minimal ONNX model and scripts that reproduce it, so we can debug it on our end.
Dear friends,
I rebooted my GPU server, and the issue is resolved now. Thanks a lot!