Hi,
Inference with my YoloV4 model is slow (~1 s).
I'm testing inference with a random numpy array and with images, for example:
warmups = 100

# test on numpy array
for i in range(warmups):
    crop = np.random.randint(256, size=(320, 320, 3))
    crop = crop / 255.
    crop = np.asarray([crop]).astype(np.float32)
    batch_data = tf.constant(crop)
    inference_start_time = time.time()
    rez = model_signature(batch_data)
    inference_end_time = time.time()
    print('Yolo inference ' + str(i) + 'th time: ' + str(inference_end_time - inference_start_time))

# test on images
for path in image_paths:
    img_ori = cv2.imread(image_dir + os.path.sep + path)
    num_frames_processed += 1
    image_data = cv2.resize(img_ori, (320, 320))
    image_data = image_data / 255.
    images_data = np.asarray([image_data]).astype(np.float32)
    batch_data = tf.constant(images_data)
    yolo_inference_start_time = time.time()
    model_signature(batch_data)
    yolo_inference_end_time = time.time()
    print('Yolo inference time: ', yolo_inference_end_time - yolo_inference_start_time)
When I test my model on a random numpy array, inference speed is ~8 ms:
Yolo inference 0th time: 331.1708495616913
Yolo inference 1th time: 0.009318113327026367
Yolo inference 2th time: 0.00907754898071289
Yolo inference 3th time: 0.008883953094482422
Yolo inference 4th time: 0.008752107620239258
Yolo inference 5th time: 0.008726835250854492
Yolo inference 6th time: 0.00879526138305664
Yolo inference 7th time: 0.008597135543823242
Yolo inference 8th time: 0.00872182846069336
Yolo inference 9th time: 0.008661270141601562
But when I load images and run inference on them, each call takes ~1.6 s:
Yolo inference time: 1.6193459033966064
Yolo inference time: 1.6982169151306152
Yolo inference time: 1.6127874851226807
Yolo inference time: 1.6061813831329346
Yolo inference time: 1.6975934505462646
Yolo inference time: 1.5988457202911377
Yolo inference time: 1.5998070240020752
Yolo inference time: 1.6140410900115967
Yolo inference time: 1.6149139404296875
Yolo inference time: 1.6045515537261963
What could be the reason for this behavior?
Hi,
Could you check the size and dtype of crop and image_data first?
Also, does TensorFlow run your inference on the GPU or the CPU?
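For reference, a minimal sketch of such a check, reusing the variable names from your snippet above (illustrative only):

import tensorflow as tf

# Inspect the arrays the loops actually feed to the model.
print(crop.shape, crop.dtype)              # after np.asarray([...]).astype: (1, 320, 320, 3) float32
print(image_data.shape, image_data.dtype)  # note: / 255. promotes this to float64

# An empty list here means TensorFlow is running inference on the CPU.
print(tf.config.list_physical_devices('GPU'))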
Thanks.
Hello,
The size of crop is 1228960 and its dtype is float32.
The size of images_data is 1228960 and its dtype is float32.
(When I first read an image, its size is 6220944 and its dtype is float64.)
I haven't called set_memory_growth, but watching the output of jtop I can see the GPU is being utilized.
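For reference, a minimal sketch of enabling memory growth (it has to run before TensorFlow initializes the GPU):

import tensorflow as tf

# Allocate GPU memory on demand instead of grabbing it all up front;
# this must be called before the first GPU operation.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_memory_growth(gpus[0], True)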
Hello,
to add to the conversation, as an experiment I added a simple for loop inside the existing image-reading loop:
for i in range(10):
    yolo_inference_start_time = time.time()
    model_signature(batch_data)
    yolo_inference_end_time = time.time()
    print('Yolo inference ' + str(i) + 'th time: ' + str(yolo_inference_end_time - yolo_inference_start_time))
This is the output:
Yolo inference 0th time: 1.657867670059204
Yolo inference 1th time: 0.009186506271362305
Yolo inference 2th time: 0.008753538131713867
Yolo inference 3th time: 0.008697271347045898
Yolo inference 4th time: 0.008661746978759766
Yolo inference 5th time: 0.008554935455322266
Yolo inference 6th time: 0.008575677871704102
Yolo inference 7th time: 0.008648395538330078
Yolo inference 8th time: 0.008580207824707031
Yolo inference 9th time: 0.008511066436767578
Yolo inference 0th time: 1.6211886405944824
Yolo inference 1th time: 0.00919961929321289
Yolo inference 2th time: 0.008706808090209961
Yolo inference 3th time: 0.008604764938354492
Yolo inference 4th time: 0.00857996940612793
Yolo inference 5th time: 0.008562088012695312
Yolo inference 6th time: 0.008592367172241211
Yolo inference 7th time: 0.008548498153686523
Yolo inference 8th time: 0.008559465408325195
Yolo inference 9th time: 0.008565664291381836
Yolo inference 0th time: 1.9278879165649414
Yolo inference 1th time: 0.009716272354125977
Yolo inference 2th time: 0.008942127227783203
It seems the first inference on a new image is slow, but repeated inference on the same image is fast.
Also, since I will have to work with an RTSP stream at some point, I tried reading frames one by one and running inference on them:
while cap.isOpened():
    ret, frame = cap.read()
    image_data = cv2.resize(frame, (320, 320))
    image_data = image_data / 255.
    images_data = np.asarray([image_data]).astype(np.float32)
    batch_data = tf.constant(images_data)
    yolo_inference_start_time = time.time()
    model_signature(batch_data)
    yolo_inference_end_time = time.time()
    print('Yolo inference time: ', yolo_inference_end_time - yolo_inference_start_time)
    total_times.append(yolo_inference_end_time - yolo_inference_start_time)
Here, the inference times vary from ~2 s down to ~8 ms:
Yolo inference time: 7.396626949310303
Yolo inference time: 1.9400830268859863
Yolo inference time: 0.009636878967285156
Yolo inference time: 0.008914470672607422
Yolo inference time: 0.008938074111938477
Yolo inference time: 0.009112834930419922
Yolo inference time: 0.008854150772094727
Yolo inference time: 1.7614493370056152
Yolo inference time: 0.00947427749633789
Yolo inference time: 1.6145720481872559
Yolo inference time: 0.009255170822143555
Yolo inference time: 0.008889436721801758
Yolo inference time: 0.008978843688964844
Yolo inference time: 0.009033918380737305
Yolo inference time: 0.008960723876953125
Yolo inference time: 1.6211168766021729
Yolo inference time: 1.7184834480285645
Yolo inference time: 0.009260892868041992
Yolo inference time: 0.009219646453857422
Yolo inference time: 0.009096860885620117
Yolo inference time: 0.009061098098754883
Yolo inference time: 0.008889198303222656
Yolo inference time: 0.008873224258422852
Yolo inference time: 0.008953332901000977
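One thing I am not sure about: eager GPU execution can be asynchronous, so timing only the call itself might mis-attribute where the time goes. A sketch that forces the outputs to materialize before stopping the timer (this is an assumption on my part, not a confirmed fix):

yolo_inference_start_time = time.time()
rez = model_signature(batch_data)
# .numpy() blocks until the GPU has actually produced the outputs,
# so the measured interval covers the full inference.
_ = [v.numpy() for v in rez.values()]
yolo_inference_end_time = time.time()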
Hi,
Would you mind profiling the image-based inference with Nsight Systems?
It looks like there is some underlying preprocessing when TensorFlow meets the image data for the first time.
Or is it possible to use the same data buffer for OpenCV to read the image?
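A typical invocation might look like this (the flag set is only a sketch, adjust it for your nsys version; your_script.py is a placeholder):

nsys profile --trace=cuda,cudnn,osrt --output=yolo_report python3 your_script.py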
Thanks.
nikola2 · September 4, 2023, 10:39am · #7
Hi,
I generated a report for my script, but I'm having a hard time identifying the problem. Could you please check it out or give me some guidelines?
The report is at this link.
"Or is it possible to use the same data buffer for OpenCV to read the image?"

I'm not sure I understood your proposal; can you explain it in more detail, or share an example?
Hi,
For example, you could preallocate a buffer and read the image data into that same buffer every time.
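A minimal sketch of that idea, reusing the names from your earlier snippet (the dst argument of cv2.resize and the out argument of np.divide write into preallocated arrays):

import numpy as np

# Allocate the buffers once, outside the loop.
resized = np.empty((320, 320, 3), dtype=np.uint8)
images_data = np.empty((1, 320, 320, 3), dtype=np.float32)

for path in image_paths:
    img_ori = cv2.imread(image_dir + os.path.sep + path)
    cv2.resize(img_ori, (320, 320), dst=resized)   # resize into the same buffer
    np.divide(resized, 255., out=images_data[0])   # normalize in place
    model_signature(tf.constant(images_data))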
Thanks.
nikola2 · September 5, 2023, 10:02am · #9
Hi again,
so I tried creating a buffer and then assigning the preprocessed images to the same object, like this:
buff = np.random.randint(256, size=(320, 320, 3)).astype(np.float32)
buff = tf.constant([buff])

for path in image_paths:
    img_ori = cv2.imread(image_dir + os.path.sep + path)
    image_data = cv2.resize(img_ori, (320, 320))
    image_data = image_data / 255.
    images_data = np.asarray([image_data]).astype(np.float32)
    buff = tf.constant(images_data)
    model_signature(buff)
But there’s no change in the inference speed.
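Looking at it again, tf.constant(images_data) inside the loop creates a brand-new tensor (and a new host-to-device copy) on every iteration, so the snippet above never actually reuses the buffer. A sketch of real reuse with a tf.Variable; whether the serving signature accepts a Variable directly is an assumption, so this is untested:

# One device allocation, updated in place on each frame.
buff = tf.Variable(tf.zeros((1, 320, 320, 3), dtype=tf.float32))

for path in image_paths:
    img_ori = cv2.imread(image_dir + os.path.sep + path)
    image_data = cv2.resize(img_ori, (320, 320)) / 255.
    buff.assign(image_data[np.newaxis].astype(np.float32))  # reuse device memory
    model_signature(buff)  # if this rejects a Variable, wrap with tf.convert_to_tensor(buff)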
nikola2 · September 5, 2023, 12:54pm · #10
Hi,
I tried setting perf_event_paranoid to collect CPU samples, but I haven't noticed anything out of the ordinary. I've also uploaded the new report; could you be so kind as to check if I'm missing something?
newest_nsight_report.nsys-rep (74.3 MB)
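For reference, the change is typically applied like this (illustrative; the exact value Nsight Systems requires can vary by version):

sudo sh -c 'echo 2 > /proc/sys/kernel/perf_event_paranoid'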
Also, I have these warnings:

Analysis [323575] 00:14.920: CUDA profiling might have not been started correctly.
Analysis [323575] 00:14.920: No CUDA events collected. Does the process use CUDA?

and

Analysis [323463] 00:14.920: No cuDNN events collected. Does the process use cuDNN?
Analysis [323486] 00:14.920: cuDNN profiling might have not been started correctly.
Is this expected behavior for this version?
Thanks.
nikola2 · September 8, 2023, 8:15am · #11
Hi,
hope you are doing well.
What further steps can I take to resolve this urgent issue?
And did you manage to take a look at my Nsight report?
Thanks
Hi,
Sorry for the late update.
Do you have reproducible source code you can share with us?
We need to check it further before making a suggestion.
Thanks.
nikola2 · September 13, 2023, 9:07am · #13
Hi,
here's the source code you can try. I've included a couple of Yolo models at this link.
import os
import time
from datetime import datetime

import cv2
import numpy as np
import tensorflow as tf

YOLO_TRT_MODEL_DIR = "weights/tftrt_saved_model_2"
# "weights/tftrt_saved_model_fp16"
# "weights/yolov4-trt-int8-320"
# "weights/yolov4_trt_convert"

WARMUPS = 50
input_size = (320, 320, 3)

physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

model = tf.saved_model.load(YOLO_TRT_MODEL_DIR)
model_signature = model.signatures['serving_default']

# warm up on a random array
crop = np.random.randint(256, size=input_size)
crop = crop / 255.
crop = np.asarray([crop]).astype(np.float32)
batch_data = tf.constant(crop)
for i in range(WARMUPS):
    inference_start_time = datetime.today()
    rez = model_signature(batch_data)
    inference_end_time = datetime.today()
    print('Yolo inference ' + str(i) + 'th time: ' + str(inference_end_time - inference_start_time))

# run on images from disk
image_dir = "./images/"
image_paths = sorted(os.listdir(image_dir))
for_start_time = time.time()
num_frames_processed = 0
for path in image_paths:
    img_ori = cv2.imread(image_dir + os.path.sep + path)
    image_data = cv2.resize(img_ori, (320, 320))
    image_data = image_data / 255.
    images_data = np.asarray([image_data]).astype(np.float32)
    batch_data = tf.constant(images_data)
    num_frames_processed += 1
    yolo_inference_start_time = time.time()
    model_signature(batch_data)
    yolo_inference_end_time = time.time()
    print('Yolo inference time: ', yolo_inference_end_time - yolo_inference_start_time)
for_end_time = time.time()

elapsed = for_end_time - for_start_time
print('Total inference time: ', elapsed)
fps = num_frames_processed / elapsed
print("FPS: {} , Elapsed Time: {} , Frames Processed: {}".format(fps, elapsed, num_frames_processed))
Hi,
Thanks for sharing the source.
We are checking this issue in depth.
We will get back to you later.
Hi,
Thanks for your patience.
We tried to reproduce this issue on Orin with JetPack 5.1.2 and TensorFlow 2.12.0+nv23.06.
But the inference looks stable. Did we miss anything?
Yolo inference 0th time: 0:05:39.952503
Yolo inference 1th time: 0:00:00.021688
Yolo inference 2th time: 0:00:00.021798
Yolo inference 3th time: 0:00:00.020233
Yolo inference 4th time: 0:00:00.019178
Yolo inference 5th time: 0:00:00.028624
Yolo inference 6th time: 0:00:00.020294
Yolo inference 7th time: 0:00:00.019433
Yolo inference 8th time: 0:00:00.019503
Yolo inference 9th time: 0:00:00.019213
Yolo inference 10th time: 0:00:00.019322
Yolo inference 11th time: 0:00:00.018929
Yolo inference 12th time: 0:00:00.019141
Yolo inference 13th time: 0:00:00.019016
Yolo inference 14th time: 0:00:00.018798
Yolo inference 15th time: 0:00:00.017910
Yolo inference 16th time: 0:00:00.017716
Yolo inference 17th time: 0:00:00.017683
Yolo inference 18th time: 0:00:00.017766
Yolo inference 19th time: 0:00:00.017273
Yolo inference 20th time: 0:00:00.017226
Yolo inference 21th time: 0:00:00.017296
Yolo inference 22th time: 0:00:00.017268
Yolo inference 23th time: 0:00:00.017042
Yolo inference 24th time: 0:00:00.017454
...
Thanks.
nikola2 · October 16, 2023, 12:26pm · #16
Hi, thank you for testing my code. However, I resolved the problem by switching to DeepStream and TensorRT; inference time is now stable.
system · Closed · November 6, 2023, 7:55am · #18
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.