ONNX runtime prediction using GPU and with different intervals


I’m facing a problem using ONNX Runtime for prediction on a GPU (CUDAExecutionProvider) when there are pauses between predictions. I’m running inference on a GeForce RTX 2080 GPU. When I run predictions back to back (i.e., continuously in the for loop), the average prediction time is around 4 ms. But if I insert an interval of 0.1 seconds between predictions (time.sleep(0.1)), the average prediction latency increases to around 16 ms. The same happens with other intervals, although the latency gets smaller as the interval gets smaller.
Has anyone faced a similar problem before, or can anyone kindly offer some suggestions?
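
To make the comparison described above easier to reproduce, here is a minimal timing sketch. It is not part of the original script: `measure_latency` and `summarize` are hypothetical helper names, and the `dummy` workload is only a stand-in that would be replaced by something like `lambda: sess.run(None, feed)`. It excludes a few warm-up calls and uses `time.perf_counter`, so the back-to-back case and the sleep-between-calls case are measured the same way.

```python
import statistics
import time


def measure_latency(run_once, n_iters=50, interval_s=0.0, warmup=5):
    """Return per-call latencies in milliseconds for run_once()."""
    # Warm-up calls are excluded so one-time setup cost does not skew the average.
    for _ in range(warmup):
        run_once()
    latencies_ms = []
    for _ in range(n_iters):
        start = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        if interval_s > 0:
            time.sleep(interval_s)  # idle gap between predictions
    return latencies_ms


def summarize(latencies_ms):
    """Mean, median, and max of a list of latencies (ms)."""
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }


if __name__ == "__main__":
    # Placeholder workload (assumption); replace with: lambda: sess.run(None, feed)
    def dummy():
        sum(range(10_000))

    for interval in (0.0, 0.01, 0.1):
        stats = summarize(measure_latency(dummy, n_iters=10, interval_s=interval))
        print(f"interval={interval}s -> {stats}")
```

Reporting the median and the maximum alongside the mean helps show whether every call gets slower after an idle gap or only a few outliers do.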


TensorRT Version:
GPU Type: Geforce RTX 2080 Ti
Nvidia Driver Version:
CUDA Version: 11.5
CUDNN Version: 8.2
Operating System + Version: Win 10
Python Version (if applicable): 3.8
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
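
Several fields above are blank. As a rough aid, the sketch below gathers the details that can be read from Python. `collect_environment` is a hypothetical helper name; it assumes onnxruntime may or may not be installed and reports `None` when it is not. The driver, CUDA, and cuDNN versions are not queried here and still have to be filled in by hand.

```python
import platform
import sys


def collect_environment():
    """Gather version details for a bug report; missing packages are reported as None."""
    info = {
        "python": sys.version.split()[0],
        "os": f"{platform.system()} {platform.release()}",
        "onnxruntime": None,
        "onnxruntime_device": None,
        "onnxruntime_providers": None,
    }
    try:
        import onnxruntime

        info["onnxruntime"] = onnxruntime.__version__
        info["onnxruntime_device"] = onnxruntime.get_device()
        info["onnxruntime_providers"] = onnxruntime.get_available_providers()
    except ImportError:
        pass  # onnxruntime is not installed in this interpreter
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```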

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input
import keras2onnx
import onnxruntime
import datetime
import time

# Load and preprocess the input image.
img_path = './image/defective_sample_0001.png'  # make sure the image is in img_path
img_size = 384
img = image.load_img(img_path, target_size=(img_size, img_size))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

# Options for the CUDA execution provider.
providers = [
    ('CUDAExecutionProvider', {
        'device_id': 0,
        'arena_extend_strategy': 'kNextPowerOfTwo',
        'gpu_mem_limit': 2 * 1024 * 1024 * 1024,
        'cudnn_conv_algo_search': 'EXHAUSTIVE',
        'do_copy_in_default_stream': True,
    }),
]

# NOTE: the post does not show how `onnx_model` is created; it must be defined before this point.
temp_model_file = 'model.onnx'
keras2onnx.save_model(onnx_model, temp_model_file)
sess = onnxruntime.InferenceSession(temp_model_file, providers=providers)

# This second session, built from the serialized model, replaces the one above.
# `providers` is passed here as well so that the CUDA settings still apply.
content = onnx_model.SerializeToString()
sess = onnxruntime.InferenceSession(content, providers=providers)

# Map each model input name to the corresponding input array.
x = x if isinstance(x, list) else [x]
feed = dict([(input.name, x[n]) for n, input in enumerate(sess.get_inputs())])

for i in range(50):
    start_time = datetime.datetime.now()
    pred_onnx = sess.run(None, feed)
    end_time = datetime.datetime.now()
    time_diff = (end_time - start_time)
    execution_time = time_diff.total_seconds() * 1000  # milliseconds
    print("execution time: ", execution_time)
    time.sleep(0.1)  # this is the place to insert intervals among predictions

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered


This issue does not appear to be related to TensorRT. We recommend posting it on the relevant platform to get better help.

If it is related to TensorRT, please provide us with more details.

Thank you.

Please share the ONNX model and the script, if you have not already, so that we can assist you better.
In the meantime, you can try a few things:

  1. Validate your model with the snippet below:


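import sys
import onnx
filename = "model.onnx"  # replace with the path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an exception if the model is invalid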
  2. Try running your model with the trtexec command.
If you still face the issue, please share the trtexec --verbose log for further debugging.
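
As a rough illustration of the second step, the sketch below builds a trtexec command line with the `--onnx` and `--verbose` flags and, if trtexec is on the PATH, saves its output to a log file that can be attached to the thread. The helper names and the log file name `trtexec_verbose.log` are assumptions made for this example, not anything prescribed by TensorRT.

```python
import shutil
import subprocess


def build_trtexec_command(onnx_path, verbose=True):
    """Return the trtexec argument list for an ONNX model."""
    cmd = ["trtexec", f"--onnx={onnx_path}"]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_trtexec(onnx_path, log_path="trtexec_verbose.log"):
    """Run trtexec and write its output to log_path.

    Returns the process exit code, or None if trtexec is not on the PATH.
    """
    if shutil.which("trtexec") is None:
        return None
    result = subprocess.run(
        build_trtexec_command(onnx_path),
        capture_output=True,
        text=True,
    )
    with open(log_path, "w", encoding="utf-8") as log_file:
        log_file.write(result.stdout)
        log_file.write(result.stderr)
    return result.returncode


if __name__ == "__main__":
    print(" ".join(build_trtexec_command("model.onnx")))
```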

Hi, thanks so much for the response.
I have attached my code and ONNX file:
model.onnx (10.3 MB)
test_onnx.py (3.3 KB)
We have also tried TensorRT on Win10, and we see the same issue.


We recommend posting your concern on the microsoft/onnxruntime GitHub issue tracker to get better help.

Thank you.