Is the per-call inference time affected by the frequency of calls?

Description

When I call TensorRT inference at 10 fps, the time per call is 30 ms.
But when I call it at 50 fps, the time per call drops to 13 ms, which is a big reduction.
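
A minimal sketch of this kind of measurement, just to make the setup concrete (the `infer()` stub and the pacing loop are hypothetical stand-ins, not the actual script):

```python
import time

def infer(frame):
    # Placeholder for the actual TensorRT execute call.
    pass

def average_latency_ms(fps, n_calls=200):
    period = 1.0 / fps
    total = 0.0
    for _ in range(n_calls):
        start = time.perf_counter()
        infer(None)
        elapsed = time.perf_counter() - start
        total += elapsed
        # Sleep out the rest of the frame period so calls arrive at `fps`.
        if period > elapsed:
            time.sleep(period - elapsed)
    return total / n_calls * 1000.0

print("10 fps:", average_latency_ms(10), "ms per call")
print("50 fps:", average_latency_ms(50), "ms per call")
```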

How is the per-call inference time affected by the frequency of calls?

Environment

TensorRT Version: TensorRT-7.1.3.4
GPU Type: Tesla T4
Nvidia Driver Version: 418.87.00
CUDA Version: 10.2
CUDNN Version:
Operating System + Version: Ubuntu18.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.4
Baremetal or Container (if container which image + tag):

Hi @zehua.12,
What batch size did you use during model generation? Is there an optimization profile set for your model?
Could you please share the model and script files so we can help better?
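
For reference, an explicit optimization profile is normally added when building the engine, roughly like this (a minimal sketch; the file name, the input name "input", and the shapes below are placeholders, not taken from your model):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30  # 1 GiB

# Only needed for dynamic input shapes; "input" and the shapes are placeholders.
profile = builder.create_optimization_profile()
profile.set_shape("input", (1, 3, 224, 224), (1, 3, 224, 224), (8, 3, 224, 224))
config.add_optimization_profile(profile)

engine = builder.build_engine(network, config)
```

With a fully static input shape (batch size fixed at 1 in the ONNX), TensorRT does not require an explicit profile.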

Thanks

Sorry, I can’t share the private model and script.

The TensorRT engine was generated from an ONNX model whose input batch size is 1, using the default optimization profile. I always run inference on images one by one, not in batches, so I guess the optimization profile has nothing to do with this issue.
Maybe the GPU cache is causing the issue?
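
For reference, single-image inference and per-call timing with a TensorRT 7 engine typically look roughly like the sketch below (not the real script; the engine path, binding order, and tensor shapes are placeholders, and PyTorch tensors are used only as convenient CUDA buffers since PyTorch 1.4 is already in the environment):

```python
import time
import torch
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

# "model.engine" and the shapes below are placeholders for the real
# serialized engine and its actual input/output bindings.
with open("model.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

inp = torch.empty((1, 3, 224, 224), dtype=torch.float32, device="cuda")
out = torch.empty((1, 1000), dtype=torch.float32, device="cuda")

def infer_one(frame):
    """Run one image through the engine and return the latency in ms."""
    inp.copy_(frame)
    torch.cuda.synchronize()   # start timing from an idle GPU
    start = time.perf_counter()
    # Assumes binding 0 is the input and binding 1 is the output.
    context.execute_v2([inp.data_ptr(), out.data_ptr()])
    torch.cuda.synchronize()   # make sure the kernels have finished
    return (time.perf_counter() - start) * 1000.0
```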