Description
I'm facing a problem when running inference with ONNX Runtime on the GPU (CUDAExecutionProvider) at different intervals. I'm doing the inference on a GeForce RTX 2080 Ti. When I run predictions back to back (continuously in a for loop), the average prediction time is around 4 ms. But if I insert an interval of 0.1 seconds between predictions (time.sleep(0.1)), the average prediction latency increases to around 16 ms. The same holds for other intervals: the smaller the interval, the lower the prediction latency.
Has anyone run into a similar problem before, or could you kindly offer some suggestions?
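One thing I have not verified yet, and would welcome thoughts on, is whether the GPU drops its clocks during the idle interval. Below is a minimal sketch (pynvml is an assumption on my side, not part of the repro further down) to log the SM clock right after a sleep of the same length; if the clock reported after a sleep is much lower than during back-to-back runs, the extra latency could be the GPU ramping its clocks back up:

import time
import pynvml  # pip install nvidia-ml-py3 (assumed available)

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # same device_id as in the provider config below

for _ in range(10):
    time.sleep(0.1)  # the same idle interval used between predictions
    sm_clock = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)  # current SM clock in MHz
    print("SM clock after sleep (MHz):", sm_clock)

pynvml.nvmlShutdown()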
Environment
TensorRT Version:
GPU Type: GeForce RTX 2080 Ti
Nvidia Driver Version:
CUDA Version: 11.5
CUDNN Version: 8.2
Operating System + Version: Windows 10
Python Version (if applicable): 3.8
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Steps To Reproduce
import datetime
import time

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input
import keras2onnx
import onnxruntime

img_path = './image/defective_sample_0001.png'  # make sure the image is in img_path
img_size = 384
img = image.load_img(img_path, target_size=(img_size, img_size))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

providers = [
    ('CUDAExecutionProvider', {
        'device_id': 0,
        'arena_extend_strategy': 'kNextPowerOfTwo',
        'gpu_mem_limit': 2 * 1024 * 1024 * 1024,
        'cudnn_conv_algo_search': 'EXHAUSTIVE',
        'do_copy_in_default_stream': True,
    }),
    'CPUExecutionProvider',
]

# Convert the Keras model to ONNX. The original snippet used onnx_model without
# defining it; a ResNet50 is assumed here since ResNet50 preprocessing is used above.
model = tf.keras.applications.resnet50.ResNet50(weights='imagenet')
onnx_model = keras2onnx.convert_keras(model, model.name)

temp_model_file = 'model.onnx'
keras2onnx.save_model(onnx_model, temp_model_file)
sess = onnxruntime.InferenceSession(temp_model_file, providers=providers)
# Note: do not recreate the session from onnx_model.SerializeToString() afterwards,
# as that would replace the GPU session with one using the default providers.

x = x if isinstance(x, list) else [x]
feed = {inp.name: x[n] for n, inp in enumerate(sess.get_inputs())}

for i in range(50):
    start_time = datetime.datetime.now()
    pred_onnx = sess.run(None, feed)
    end_time = datetime.datetime.now()
    time_diff = end_time - start_time
    execution_time = time_diff.total_seconds() * 1000
    print("execution time: ", execution_time)
    time.sleep(0.1)  # this is the place to insert intervals between predictions
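As a side note on methodology, here is a sketch of a slightly more robust version of the timing loop (my variant, not part of the original repro): a few warm-up runs to exclude one-time cuDNN EXHAUSTIVE algorithm search and memory-arena growth, time.perf_counter() for timing, and a configurable interval. It reuses sess, feed, and time from the code above:

interval = 0.1  # seconds of idle time between predictions; set to 0 for back-to-back runs

for _ in range(5):  # warm-up: excludes one-time algorithm search and allocation costs
    sess.run(None, feed)

latencies = []
for i in range(50):
    start = time.perf_counter()
    sess.run(None, feed)
    latencies.append((time.perf_counter() - start) * 1000)  # milliseconds
    if interval > 0:
        time.sleep(interval)

print("mean latency (ms):", sum(latencies) / len(latencies))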
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered