Description
Dear friends,
I am using TensorRT to run face detection and extraction models for a 1:1 face comparison service.
I ran a stress test with a million picture pairs to check the stability of the 1:1 face service.
The test takes more than an hour, and I found a problem: most of the time, model inference completes in less than 100 ms.
But occasionally TensorRT takes more than 5 seconds to complete a 1:1 face comparison. This happens around 100 times in a million.
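For anyone who wants to reproduce the measurement, here is a minimal sketch of how the slow calls can be caught during such a test. It is not my exact service code: `run_pair` is a hypothetical placeholder for whatever function runs one 1:1 comparison, and it is assumed to block until the result is back on the host (if the inference is asynchronous, the stream has to be synchronized before the timer stops). The threshold is also just an example value.

```python
import time
from typing import Any, Callable, Iterable, List, Tuple


def find_slow_calls(
    run_pair: Callable[[Any], Any],   # hypothetical: runs one 1:1 comparison, blocking
    pairs: Iterable[Any],             # the picture pairs used in the stress test
    threshold_s: float = 1.0,         # example cutoff for a "slow" call, in seconds
) -> List[Tuple[int, float]]:
    """Return (index, latency in seconds) for every call slower than threshold_s."""
    slow = []
    for index, pair in enumerate(pairs):
        start = time.perf_counter()
        run_pair(pair)                # assumed to return only when the result is ready
        elapsed = time.perf_counter() - start
        if elapsed > threshold_s:
            slow.append((index, elapsed))
    return slow
```

Recording the index together with the latency makes it possible to check whether the slow calls are tied to specific inputs or spread randomly over the run.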
Please kindly help: is this normal performance for TensorRT? What causes such a large delay?
Thanks a lot.
Environment
TensorRT Version: 8.2
GPU Type: V100
CUDA Version: CUDA 11