TensorRT inference time fluctuates when testing a big model

Description

I use TensorRT to measure the latency of several models and found that the results for big models (more parameters) fluctuate back and forth, while the small models' latency is stable. Is this a normal phenomenon, or am I just using TensorRT in the wrong way?

Also, a model's latency is stable as long as it stays below 5 ms (ignoring the time for copying data to GPU memory).

Environment

TensorRT Version: 6.0.1.5
GPU Type: NVIDIA T4
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
CUDNN Version: 7.6.5
Operating System + Version: 7.6.5
Python Version (if applicable): 3.6.4
Persistence-M: ON
Volatile Uncorr. ECC: OFF

Code

import time
import pycuda.driver as cuda

# copy input data to GPU memory synchronously (excluded from the timing);
# h_input / d_input are placeholders for the host and device buffers
cuda.memcpy_htod(d_input, h_input)
s_time = time.time()

# inference with the synchronous execute() API;
# `context` is the TensorRT execution context, `bindings` the device buffer pointers
context.execute(batch_size=1, bindings=bindings)
infer_time = time.time() - s_time
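A single timed run is easily distorted by warm-up effects (CUDA context initialization, clock ramp-up, caching), so the fluctuation can look larger than it is. Below is a minimal sketch of a more robust measurement, assuming the same placeholder `context` and `bindings` as above and a hypothetical helper name measure_latency: it repeats the call after a warm-up phase and reports summary statistics.

import time

# Hypothetical helper: average many synchronous executions after warming up.
# `context` and `bindings` are placeholders for your execution context and buffers.
def measure_latency(context, bindings, warmup=50, runs=200):
    for _ in range(warmup):
        context.execute(batch_size=1, bindings=bindings)   # warm-up, not timed
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        context.execute(batch_size=1, bindings=bindings)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    timings_ms.sort()
    return {
        "mean_ms": sum(timings_ms) / len(timings_ms),
        "median_ms": timings_ms[len(timings_ms) // 2],
        "max_ms": timings_ms[-1],
    }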

Hi,
Could you please share the model, script, profiler, and performance output (if not shared already) so that we can help you better?
Alternatively, you can try running your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
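For example, a run along these lines (model.onnx is a placeholder for your model; --warmUp, --iterations, and --avgRuns are standard trtexec options) prints averaged latency and throughput on its own, without any custom timing code:

trtexec --onnx=model.onnx --warmUp=500 --iterations=100 --avgRuns=10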

While measuring the model performance, make sure you consider the latency and throughput of the network inference, excluding the data pre- and post-processing overhead.
Please refer to the link below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-722/best-practices/index.html#measure-performance
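If you want to time only the GPU work and exclude host-side overhead, CUDA events are one option. Here is a minimal PyCUDA sketch, again assuming the placeholder `context` and `bindings` already exist (the event and stream calls used here are standard PyCUDA APIs):

import pycuda.driver as cuda

# Time the enqueued inference with CUDA events so the result reflects GPU
# execution only. `context` and `bindings` are placeholders as before.
stream = cuda.Stream()
start, end = cuda.Event(), cuda.Event()

start.record(stream)
context.execute_async(batch_size=1, bindings=bindings, stream_handle=stream.handle)
end.record(stream)
end.synchronize()                    # block until the GPU has finished

gpu_time_ms = start.time_till(end)   # elapsed time between the events, in ms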

Thanks!

Hi @928024300,

Fluctuation in latency is expected; it depends on how much GPU memory and other resources are available at the time. Please make sure the same amount of GPU memory is available every time you run the test.
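A quick way to check that before each run (the query fields shown are standard nvidia-smi options):

nvidia-smi --query-gpu=memory.total,memory.used,memory.free,utilization.gpu --format=csv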

Also, it looks like you're using a very old version of TensorRT. We recommend trying the latest TensorRT version. If you still face the issue, please share a reproducible ONNX model and scripts.

Thank you.