Hi,
Inference with my YoloV4 model is slow (~1 s).
I'm testing inference with a random numpy array and with images, for example:
warmups = 100

# test on numpy array
for i in range(warmups):
    crop = np.random.randint(256, size=(320, 320, 3))
    crop = crop / 255.
    crop = np.asarray([crop]).astype(np.float32)
    batch_data = tf.constant(crop)
    inference_start_time = time.time()
    rez = model_signature(batch_data)
    inference_end_time = time.time()
    print('Yolo inference ' + str(i) + 'th time: ' + str(inference_end_time - inference_start_time))

# test on images
for path in image_paths:
    img_ori = cv2.imread(image_dir + os.path.sep + path)
    num_frames_processed += 1
    image_data = cv2.resize(img_ori, (320, 320))
    image_data = image_data / 255.
    images_data = np.asarray([image_data]).astype(np.float32)
    batch_data = tf.constant(images_data)
    yolo_inference_start_time = time.time()
    model_signature(batch_data)
    yolo_inference_end_time = time.time()
    print('Yolo inference time: ', yolo_inference_end_time - yolo_inference_start_time)
When I test my model on a random numpy array, inference speed is ~8 ms:
Yolo inference 0th time: 331.1708495616913
Yolo inference 1th time: 0.009318113327026367
Yolo inference 2th time: 0.00907754898071289
Yolo inference 3th time: 0.008883953094482422
Yolo inference 4th time: 0.008752107620239258
Yolo inference 5th time: 0.008726835250854492
Yolo inference 6th time: 0.00879526138305664
Yolo inference 7th time: 0.008597135543823242
Yolo inference 8th time: 0.00872182846069336
Yolo inference 9th time: 0.008661270141601562
But when I load images and run inference on them, each call takes ~1.6 s:
Yolo inference time: 1.6193459033966064
Yolo inference time: 1.6982169151306152
Yolo inference time: 1.6127874851226807
Yolo inference time: 1.6061813831329346
Yolo inference time: 1.6975934505462646
Yolo inference time: 1.5988457202911377
Yolo inference time: 1.5998070240020752
Yolo inference time: 1.6140410900115967
Yolo inference time: 1.6149139404296875
Yolo inference time: 1.6045515537261963
What could be the reason for this behavior?
Hi,
Could you check the size and dtype of crop and image_data first?
Also, does TensorFlow run your inference on the GPU or the CPU?
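For reference, a minimal sketch of such a check, reusing the variable names from your snippet above (illustrative only):

import tensorflow as tf

# Inspect the arrays the loops actually feed to the model.
print(crop.shape, crop.dtype)              # after np.asarray([...]).astype: (1, 320, 320, 3) float32
print(image_data.shape, image_data.dtype)  # note: / 255. promotes this to float64

# An empty list here means TensorFlow is running inference on the CPU.
print(tf.config.list_physical_devices('GPU'))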
Thanks.
Hello,
The size of crop is 1228960 and its dtype is float32.
The size of images_data is 1228960 and its dtype is float32.
(When I first read an image, its size is 6220944 and its dtype is float64.)
I haven't called set_memory_growth, but watching the output of jtop I can see the GPU is being utilized.
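For reference, a minimal sketch of enabling memory growth (it has to run before TensorFlow initializes the GPU):

import tensorflow as tf

# Allocate GPU memory on demand instead of grabbing it all up front;
# this must be called before the first GPU operation.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_memory_growth(gpus[0], True)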
Hello,
to add to the conversation, as an experiment I added a simple for loop inside the existing image-reading loop:
for i in range(10):
    yolo_inference_start_time = time.time()
    model_signature(batch_data)
    yolo_inference_end_time = time.time()
    print('Yolo inference ' + str(i) + 'th time: ' + str(yolo_inference_end_time - yolo_inference_start_time))
This is the output:
Yolo inference 0th time: 1.657867670059204
Yolo inference 1th time: 0.009186506271362305
Yolo inference 2th time: 0.008753538131713867
Yolo inference 3th time: 0.008697271347045898
Yolo inference 4th time: 0.008661746978759766
Yolo inference 5th time: 0.008554935455322266
Yolo inference 6th time: 0.008575677871704102
Yolo inference 7th time: 0.008648395538330078
Yolo inference 8th time: 0.008580207824707031
Yolo inference 9th time: 0.008511066436767578
Yolo inference 0th time: 1.6211886405944824
Yolo inference 1th time: 0.00919961929321289
Yolo inference 2th time: 0.008706808090209961
Yolo inference 3th time: 0.008604764938354492
Yolo inference 4th time: 0.00857996940612793
Yolo inference 5th time: 0.008562088012695312
Yolo inference 6th time: 0.008592367172241211
Yolo inference 7th time: 0.008548498153686523
Yolo inference 8th time: 0.008559465408325195
Yolo inference 9th time: 0.008565664291381836
Yolo inference 0th time: 1.9278879165649414
Yolo inference 1th time: 0.009716272354125977
Yolo inference 2th time: 0.008942127227783203
It seems the first inference on a new image is slow, but repeated inference on the same image is fast.
Also, since I will have to work with an RTSP stream at some point, I tried reading frames one by one and running inference on them:
while cap.isOpened():
    ret, frame = cap.read()
    image_data = cv2.resize(frame, (320, 320))
    image_data = image_data / 255.
    images_data = np.asarray([image_data]).astype(np.float32)
    batch_data = tf.constant(images_data)
    yolo_inference_start_time = time.time()
    model_signature(batch_data)
    yolo_inference_end_time = time.time()
    print('Yolo inference time: ', yolo_inference_end_time - yolo_inference_start_time)
    total_times.append(yolo_inference_end_time - yolo_inference_start_time)
Here, the inference times vary from ~2 s down to ~8 ms:
Yolo inference time: 7.396626949310303
Yolo inference time: 1.9400830268859863
Yolo inference time: 0.009636878967285156
Yolo inference time: 0.008914470672607422
Yolo inference time: 0.008938074111938477
Yolo inference time: 0.009112834930419922
Yolo inference time: 0.008854150772094727
Yolo inference time: 1.7614493370056152
Yolo inference time: 0.00947427749633789
Yolo inference time: 1.6145720481872559
Yolo inference time: 0.009255170822143555
Yolo inference time: 0.008889436721801758
Yolo inference time: 0.008978843688964844
Yolo inference time: 0.009033918380737305
Yolo inference time: 0.008960723876953125
Yolo inference time: 1.6211168766021729
Yolo inference time: 1.7184834480285645
Yolo inference time: 0.009260892868041992
Yolo inference time: 0.009219646453857422
Yolo inference time: 0.009096860885620117
Yolo inference time: 0.009061098098754883
Yolo inference time: 0.008889198303222656
Yolo inference time: 0.008873224258422852
Yolo inference time: 0.008953332901000977
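One thing I am not sure about: eager GPU execution can be asynchronous, so timing only the call itself might mis-attribute where the time goes. A sketch that forces the outputs to materialize before stopping the timer (this is an assumption on my part, not a confirmed fix):

yolo_inference_start_time = time.time()
rez = model_signature(batch_data)
# .numpy() blocks until the GPU has actually produced the outputs,
# so the measured interval covers the full inference.
_ = [v.numpy() for v in rez.values()]
yolo_inference_end_time = time.time()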
Hi,
Would you mind profiling the image-based inference with Nsight Systems?
It looks like there is some underlying preprocessing when TensorFlow meets the image data for the first time.
Or is it possible to use the same data buffer for OpenCV to read the image?
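A typical invocation might look like this (the flag set is only a sketch, adjust it for your nsys version; your_script.py is a placeholder):

nsys profile --trace=cuda,cudnn,osrt --output=yolo_report python3 your_script.py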
Thanks.
nikola2 · September 4, 2023, 10:39am · #7
Hi,
I generated a report for my script, but I'm having a hard time identifying the problem. Could you please check it out or give me some guidelines?
The report is at this link.
"Or is it possible to use the same data buffer for OpenCV to read the image?"

I'm not sure I understood your proposal; can you explain it in more detail, or share an example?
Hi,
For example, you could preallocate a buffer and read the image data into that same buffer every time.
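A minimal sketch of that idea, reusing the names from your earlier snippet (the dst argument of cv2.resize and the out argument of np.divide write into preallocated arrays):

import numpy as np

# Allocate the buffers once, outside the loop.
resized = np.empty((320, 320, 3), dtype=np.uint8)
images_data = np.empty((1, 320, 320, 3), dtype=np.float32)

for path in image_paths:
    img_ori = cv2.imread(image_dir + os.path.sep + path)
    cv2.resize(img_ori, (320, 320), dst=resized)   # resize into the same buffer
    np.divide(resized, 255., out=images_data[0])   # normalize in place
    model_signature(tf.constant(images_data))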
Thanks.
nikola2 · September 5, 2023, 10:02am · #9
Hi again,
so I tried creating a buffer and then assigning the preprocessed images to the same object, like this:
buff = np.random.randint(256, size=(320, 320, 3)).astype(np.float32)
buff = tf.constant([buff])

for path in image_paths:
    img_ori = cv2.imread(image_dir + os.path.sep + path)
    image_data = cv2.resize(img_ori, (320, 320))
    image_data = image_data / 255.
    images_data = np.asarray([image_data]).astype(np.float32)
    buff = tf.constant(images_data)
    model_signature(buff)
But there’s no change in the inference speed.
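Looking at it again, tf.constant(images_data) inside the loop creates a brand-new tensor (and a new host-to-device copy) on every iteration, so the snippet above never actually reuses the buffer. A sketch of real reuse with a tf.Variable; whether the serving signature accepts a Variable directly is an assumption, so this is untested:

# One device allocation, updated in place on each frame.
buff = tf.Variable(tf.zeros((1, 320, 320, 3), dtype=tf.float32))

for path in image_paths:
    img_ori = cv2.imread(image_dir + os.path.sep + path)
    image_data = cv2.resize(img_ori, (320, 320)) / 255.
    buff.assign(image_data[np.newaxis].astype(np.float32))  # reuse device memory
    model_signature(buff)  # if this rejects a Variable, wrap with tf.convert_to_tensor(buff)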
nikola2 · September 5, 2023, 12:54pm · #10
Hi,
I tried setting perf_event_paranoid to collect CPU samples, but I haven't noticed anything out of the ordinary. I've also uploaded the new report; could you be so kind as to check if I'm missing something?
newest_nsight_report.nsys-rep (74.3 MB)
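For reference, the change is typically applied like this (illustrative; the exact value Nsight Systems requires can vary by version):

sudo sh -c 'echo 2 > /proc/sys/kernel/perf_event_paranoid'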
Also, I have these warnings:

Analysis [323575] 00:14.920: CUDA profiling might have not been started correctly.
Analysis [323575] 00:14.920: No CUDA events collected. Does the process use CUDA?

and

Analysis [323463] 00:14.920: No cuDNN events collected. Does the process use cuDNN?
Analysis [323486] 00:14.920: cuDNN profiling might have not been started correctly.
Is this expected behavior for this version?
Thanks.
nikola2 · September 8, 2023, 8:15am · #11
Hi,
hope you are doing well.
What further steps can I take to resolve this urgent issue?
And did you manage to take a look at my Nsight report?
Thanks
Hi,
Sorry for the late update.
Do you have reproducible source code you can share with us?
We need to check it further before making a suggestion.
Thanks.
nikola2 · September 13, 2023, 9:07am · #13
Hi,
here's the source code you can try. I've included a couple of Yolo models at this link.
import os
import time
from datetime import datetime

import cv2
import numpy as np
import tensorflow as tf

YOLO_TRT_MODEL_DIR = "weights/tftrt_saved_model_2"
# "weights/tftrt_saved_model_fp16"
# "weights/yolov4-trt-int8-320"
# "weights/yolov4_trt_convert"

WARMUPS = 50
input_size = (320, 320, 3)

physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

model = tf.saved_model.load(YOLO_TRT_MODEL_DIR)
model_signature = model.signatures['serving_default']

# warm up on a random array
crop = np.random.randint(256, size=input_size)
crop = crop / 255.
crop = np.asarray([crop]).astype(np.float32)
batch_data = tf.constant(crop)
for i in range(WARMUPS):
    inference_start_time = datetime.today()
    rez = model_signature(batch_data)
    inference_end_time = datetime.today()
    print('Yolo inference ' + str(i) + 'th time: ' + str(inference_end_time - inference_start_time))

# run on images from disk
image_dir = "./images/"
image_paths = sorted(os.listdir(image_dir))
for_start_time = time.time()
num_frames_processed = 0
for path in image_paths:
    img_ori = cv2.imread(image_dir + os.path.sep + path)
    image_data = cv2.resize(img_ori, (320, 320))
    image_data = image_data / 255.
    images_data = np.asarray([image_data]).astype(np.float32)
    batch_data = tf.constant(images_data)
    num_frames_processed += 1
    yolo_inference_start_time = time.time()
    model_signature(batch_data)
    yolo_inference_end_time = time.time()
    print('Yolo inference time: ', yolo_inference_end_time - yolo_inference_start_time)
for_end_time = time.time()

elapsed = for_end_time - for_start_time
print('Total inference time: ', elapsed)
fps = num_frames_processed / elapsed
print("FPS: {} , Elapsed Time: {} , Frames Processed: {}".format(fps, elapsed, num_frames_processed))
Hi,
Thanks for sharing the source.
We are checking this issue in depth.
We will get back to you later.
Hi,
Thanks for your patience.
We tried to reproduce this issue on Orin with JetPack 5.1.2 and TensorFlow 2.12.0+nv23.06.
But the inference looks stable. Did we miss anything?
Yolo inference 0th time: 0:05:39.952503
Yolo inference 1th time: 0:00:00.021688
Yolo inference 2th time: 0:00:00.021798
Yolo inference 3th time: 0:00:00.020233
Yolo inference 4th time: 0:00:00.019178
Yolo inference 5th time: 0:00:00.028624
Yolo inference 6th time: 0:00:00.020294
Yolo inference 7th time: 0:00:00.019433
Yolo inference 8th time: 0:00:00.019503
Yolo inference 9th time: 0:00:00.019213
Yolo inference 10th time: 0:00:00.019322
Yolo inference 11th time: 0:00:00.018929
Yolo inference 12th time: 0:00:00.019141
Yolo inference 13th time: 0:00:00.019016
Yolo inference 14th time: 0:00:00.018798
Yolo inference 15th time: 0:00:00.017910
Yolo inference 16th time: 0:00:00.017716
Yolo inference 17th time: 0:00:00.017683
Yolo inference 18th time: 0:00:00.017766
Yolo inference 19th time: 0:00:00.017273
Yolo inference 20th time: 0:00:00.017226
Yolo inference 21th time: 0:00:00.017296
Yolo inference 22th time: 0:00:00.017268
Yolo inference 23th time: 0:00:00.017042
Yolo inference 24th time: 0:00:00.017454
...
Thanks.
nikola2 · October 16, 2023, 12:26pm · #16
Hi, thank you for testing my code. However, I resolved the problem by switching to DeepStream and TensorRT; inference time is now stable.
system · Closed · November 6, 2023, 7:55am · #18
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.