Hello,
I’m trying to reproduce NVIDIA’s TensorRT Tiny-YOLOv3 benchmark (which reports 1000 FPS) on a Jetson AGX Xavier target with the parameters below, but I only get 700 FPS:
Power Mode : MAXN
Input resolution : 416x416
Precision Mode : INT8 (Calibration with 1000 images and IInt8EntropyCalibrator2 interface)
batch = 8
JetPack Version : 4.5.1
TensorRT version : 7.1.3
First, I generated the serialized ONNX model by following jkjung-avt’s steps and using the file yolov3_to_onnx.py, which is located in /usr/src/tensorrt/samples/python/yolov3_onnx. Then I edited the script onnx_to_tensorrt.py to build the TensorRT engine and produce the tiny-yolov3.trt file used to run the inference.
Here is the tiny-yolov3.trt I used to run the inference, followed by the edited onnx_to_tensorrt.py script:
#!/usr/bin/env python3
#
# Copyright 1993-2020 NVIDIA Corporation. All rights reserved.
#
# NOTICE TO LICENSEE:
#
# This source code and/or documentation ("Licensed Deliverables") are
# subject to NVIDIA intellectual property rights under U.S. and
# international Copyright laws.
#
# These Licensed Deliverables contained herein is PROPRIETARY and
# CONFIDENTIAL to NVIDIA and is being provided under the terms and
# conditions of a form of NVIDIA software license agreement by and
# between NVIDIA and Licensee ("License Agreement") or electronically
# accepted by Licensee. Notwithstanding any terms or conditions to
# the contrary in the License Agreement, reproduction or disclosure
# of the Licensed Deliverables to any third party without the express
# written consent of NVIDIA is prohibited.
#
# NOTWITHSTANDING ANY TERMS OR CONDITIONS TO THE CONTRARY IN THE
# LICENSE AGREEMENT, NVIDIA MAKES NO REPRESENTATION ABOUT THE
# SUITABILITY OF THESE LICENSED DELIVERABLES FOR ANY PURPOSE. IT IS
# PROVIDED "AS IS" WITHOUT EXPRESS OR IMPLIED WARRANTY OF ANY KIND.
# NVIDIA DISCLAIMS ALL WARRANTIES WITH REGARD TO THESE LICENSED
# DELIVERABLES, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY,
# NONINFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE.
# NOTWITHSTANDING ANY TERMS OR CONDITIONS TO THE CONTRARY IN THE
# LICENSE AGREEMENT, IN NO EVENT SHALL NVIDIA BE LIABLE FOR ANY
# SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, OR ANY
# DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
# WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
# ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
# OF THESE LICENSED DELIVERABLES.
#
# U.S. Government End Users. These Licensed Deliverables are a
# "commercial item" as that term is defined at 48 C.F.R. 2.101 (OCT
# 1995), consisting of "commercial computer software" and "commercial
# computer software documentation" as such terms are used in 48
# C.F.R. 12.212 (SEPT 1995) and is provided to the U.S. Government
# only as a commercial end item. Consistent with 48 C.F.R.12.212 and
# 48 C.F.R. 227.7202-1 through 227.7202-4 (JUNE 1995), all
# U.S. Government End Users acquire the Licensed Deliverables with
# only those rights set forth herein.
#
# Any use of the Licensed Deliverables in individual and commercial
# software must include, in the user documentation and internal
# comments to the code, the above Disclaimer and U.S. Government End
# Users Notice.
#
from __future__ import print_function
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
from PIL import ImageDraw
from yolov3_to_onnx import download_file
from data_processing import PreprocessYOLO, PostprocessYOLO, ALL_CATEGORIES
import sys, os
sys.path.insert(1, os.path.join(sys.path[0], ".."))
import common
#print(sys.modules['common'])
import time
import argparse
import cv2
TRT_LOGGER = trt.Logger()
desc = ('This is an edited NVIDIA sample showing how to run YOLOv3 and Tiny-YOLOv3 with TensorRT. '
        'Before executing this code, run yolov3_to_onnx.py to convert the DarkNet model into an ONNX model. '
        'Once the serialized ONNX model has been generated, run this script and specify parameters such as the model, resolution, precision and batch size. '
        'For example, to run a YOLOv3 model on the image dog.jpg with a 416x416 resolution, FP16 precision mode and batch=1, use this command: '
        'sudo python3 onnx_to_tensorrt.py -i dog -m yolov3 -r 416 -p FP16 -b 1')
parser = argparse.ArgumentParser(description=desc)
parser.add_argument('-i', '-input', '-image', help="Set the name of the input image", type=str)
parser.add_argument('-m', '-model', help="Set the name of the model you want to use \n <<yolov3>> to use YOLOv3 \n <<tiny>> to use Tiny-YOLOv3", type=str)
parser.add_argument('-r', '-resolution', help="Set the resolution of the input [608, 416 or 288]", type=str)
parser.add_argument('-p', '-precision', help="Set the precision mode [FP32, FP16 or INT8]", type=str)
parser.add_argument('-b', '-batch', help="Set The size of the batch", type=int)
args = parser.parse_args()
batch_size = args.b
def _preprocess_yolo(img, input_shape):
    """Resize a BGR image to the network size and return it as CHW float32 in [0, 1].

    ASSUMPTION: get_batch() calls this helper, but it was not defined in the
    script as posted. This version follows the usual YOLO preprocessing and may
    differ from the one actually used to build the calibration cache.
    """
    img = cv2.resize(img, (input_shape[1], input_shape[0]))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = img.transpose((2, 0, 1)).astype(np.float32)
    img /= 255.0
    return img


class YOLOEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """YOLOEntropyCalibrator

    This class implements TensorRT's IInt8EntropyCalibrator2 interface.
    It reads all images from the specified directory and generates INT8
    calibration data for YOLO models accordingly.
    """

    def __init__(self, img_dir, net_hw, cache_file, batch_size=1):
        if not os.path.isdir(img_dir):
            raise FileNotFoundError('%s does not exist' % img_dir)
        if len(net_hw) != 2 or net_hw[0] % 32 or net_hw[1] % 32:
            raise ValueError('bad net shape: %s' % str(net_hw))
        super().__init__()  # trt.IInt8EntropyCalibrator2.__init__(self)
        self.img_dir = img_dir
        self.net_hw = net_hw
        self.cache_file = cache_file
        self.batch_size = batch_size
        self.blob_size = 3 * net_hw[0] * net_hw[1] * np.dtype('float32').itemsize * batch_size
        self.jpgs = [f for f in os.listdir(img_dir) if f.endswith('.jpg')]
        # The number "500" is NVIDIA's suggestion. See here:
        # https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#optimizing_int8_c
        if len(self.jpgs) < 500:
            print('WARNING: found less than 500 images in %s!' % img_dir)
        self.current_index = 0
        # Allocate enough memory for a whole batch.
        self.device_input = cuda.mem_alloc(self.blob_size)

    def __del__(self):
        del self.device_input  # free CUDA memory

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        # Stop calibrating once there are not enough images left for a full batch.
        if self.current_index + self.batch_size > len(self.jpgs):
            return None
        batch = []
        for i in range(self.batch_size):
            img_path = os.path.join(
                self.img_dir, self.jpgs[self.current_index + i])
            img = cv2.imread(img_path)
            assert img is not None, 'failed to read %s' % img_path
            batch.append(_preprocess_yolo(img, self.net_hw))
        batch = np.stack(batch)
        assert batch.nbytes == self.blob_size
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        self.current_index += self.batch_size
        return [self.device_input]

    def read_calibration_cache(self):
        # If there is a cache, use it instead of calibrating again.
        # Otherwise, implicitly return None.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, 'rb') as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, 'wb') as f:
            f.write(cache)


def draw_bboxes(image_raw, bboxes, confidences, categories, all_categories, bbox_color='blue'):
    """Draw the bounding boxes on the original input image and return it.

    Keyword arguments:
    image_raw -- a raw PIL Image
    bboxes -- NumPy array containing the bounding box coordinates of N objects, with shape (N,4).
    categories -- NumPy array containing the corresponding category for each object,
    with shape (N,)
    confidences -- NumPy array containing the corresponding confidence for each object,
    with shape (N,)
    all_categories -- a list of all categories in the correct order (required for looking up
    the category name)
    bbox_color -- an optional string specifying the color of the bounding boxes (default: 'blue')
    """
    draw = ImageDraw.Draw(image_raw)
    print(bboxes, confidences, categories)
    for box, score, category in zip(bboxes, confidences, categories):
        x_coord, y_coord, width, height = box
        # Round to the nearest pixel and clip the box to the image borders.
        left = max(0, np.floor(x_coord + 0.5).astype(int))
        top = max(0, np.floor(y_coord + 0.5).astype(int))
        right = min(image_raw.width, np.floor(x_coord + width + 0.5).astype(int))
        bottom = min(image_raw.height, np.floor(y_coord + height + 0.5).astype(int))
        draw.rectangle(((left, top), (right, bottom)), outline=bbox_color)
        draw.text((left, top - 12), '{0} {1:.2f}'.format(all_categories[category], score), fill=bbox_color)
    return image_raw


def resolution_percisionMode_choice(model, resolution, precision, batch):
    """Return the input resolution, input shape, engine path, ONNX path and output shapes."""
    if resolution not in ("608", "416", "288"):
        # Stop here: none of the values returned below can be defined for another resolution.
        sys.exit("ERROR : The resolution can take only the following values : 608, 416 or 288, try again")
    size = int(resolution)
    input_resolution = (size, size)
    input_shape = [batch, 3, size, size]
    # The output grids are size/32, size/16 and size/8, e.g. 13, 26 and 52 for 416.
    # This also defines the shapes for 288, which the original branch left unset.
    output_shape = [(1, 255, size // 32, size // 32),
                    (1, 255, size // 16, size // 16),
                    (1, 255, size // 8, size // 8)]
    # Any model name other than "yolov3" selects Tiny-YOLOv3, as before.
    prefix = "yolov3" if model == "yolov3" else "yolov3-tiny"
    path_onnx = "{}-{}.onnx".format(prefix, resolution)
    # Any precision other than FP32 or FP16 is treated as INT8, as before.
    precision_tag = precision if precision in ("FP32", "FP16") else "INT8"
    # Engines built for a batch size other than 1 get a "_b<batch>" suffix.
    batch_tag = "" if batch == 1 else "_b" + str(batch)
    path_trt = "{}-{}_{}{}.trt".format(prefix, resolution, precision_tag, batch_tag)
    return input_resolution, input_shape, path_trt, path_onnx, output_shape


input_res, input_shape, path_trt, path_onnx, output_shape = resolution_percisionMode_choice(args.m, args.r, args.p, batch_size)
def get_engine(onnx_file_path, engine_file_path=""):
    """Attempts to load a serialized engine if available, otherwise builds a new TensorRT engine and saves it."""
    def build_engine():
        """Takes an ONNX file and creates a TensorRT engine to run inference with"""
        with trt.Builder(TRT_LOGGER) as builder, builder.create_network(
                common.EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
            trt.init_libnvinfer_plugins(None, "")
            builder.max_workspace_size = 1 << 28  # 256MiB
            builder.max_batch_size = batch_size
            if args.p == "FP16":
                print("Using FP16 precision mode...")
                builder.fp16_mode = True
            if args.p == "INT8":
                print("Using INT8 precision mode...")
                builder.int8_mode = True
                # Note: the calibration resolution and cache file name are hard-coded for Tiny-YOLOv3 416.
                builder.int8_calibrator = YOLOEntropyCalibrator('calib_images', (416, 416), 'calib_yolov3-tiny-int8-416.bin')
            # Parse model file
            if not os.path.exists(onnx_file_path):
                print(
                    'ONNX file {} not found, please run yolov3_to_onnx.py first to generate it.'.format(onnx_file_path))
                exit(0)
            print('Loading ONNX file from path {}...'.format(onnx_file_path))
            with open(onnx_file_path, 'rb') as model:
                print('Beginning ONNX file parsing')
                if not parser.parse(model.read()):
                    print('ERROR: Failed to parse the ONNX file.')
                    for error in range(parser.num_errors):
                        print(parser.get_error(error))
                    return None
            # The actual yolov3.onnx is generated with batch size 64. Reshape the input to the requested batch size.
            network.get_input(0).shape = input_shape
            print('Completed parsing of ONNX file')
            print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))
            engine = builder.build_cuda_engine(network)
            print("Completed creating Engine")
            with open(engine_file_path, "wb") as f:
                f.write(engine.serialize())
            return engine

    if os.path.exists(engine_file_path):
        # If a serialized engine exists, use it instead of building an engine.
        print("Reading engine from file {}".format(engine_file_path))
        with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())
    else:
        return build_engine()


def main():
    """Create a TensorRT engine for the selected ONNX-based YOLOv3 or Tiny-YOLOv3 model and run inference."""
    # Paths of the ONNX model and of the serialized engine selected from the command-line arguments
    onnx_file_path = path_onnx
    engine_file_path = path_trt
    # The input image is expected as <name>.jpg in the current directory
    input_img = args.i + ".jpg"
    input_image_path = input_img
    # Two-dimensional tuple with the target network's (spatial) input resolution in HW order
    input_resolution_yolov3_HW = input_res
    # Create a pre-processor object by specifying the required input resolution for YOLOv3
    preprocessor = PreprocessYOLO(input_resolution_yolov3_HW)
    # Load an image from the specified input path, and return it together with a pre-processed version
    image_raw, image = preprocessor.process(input_image_path)
    # Store the shape of the original input image in WH format, we will need it for later
    shape_orig_WH = image_raw.size
    # Duplicate the image along the batch axis to fill the whole batch
    image = image.repeat(batch_size, axis=0)
    # Output shapes expected by the post-processor
    output_shapes = output_shape
    # Do inference with TensorRT
    trt_outputs = []
    with get_engine(onnx_file_path, engine_file_path) as engine, engine.create_execution_context() as context:
        inputs, outputs, bindings, stream = common.allocate_buffers(engine)
        # Do inference
        print('Running inference on image {}...'.format(input_image_path))
        # Set host input to the image. The common.do_inference_v2 function will copy the input to the GPU before executing.
        inputs[0].host = image
        # First run, not timed
        trt_outputs = common.do_inference_v2(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
        nbr_frame = 100
        counter = 0
        sum_FPS = 0
        while counter < nbr_frame:
            # Each timed call covers whatever common.do_inference_v2 does, including the copies to and from the GPU
            t0 = time.time()
            trt_outputs = common.do_inference_v2(context, bindings=bindings, inputs=inputs, outputs=outputs,
                                                 stream=stream)
            t1 = time.time()
            counter += 1
            FPS = batch_size / (t1 - t0)
            sum_FPS = sum_FPS + FPS
            AVG_FPS = sum_FPS / counter
            print("Latency = {:.2f}ms | FPS = {:.2f} | AVG_FPS = {:.2f}".format(
                (t1 - t0) * 1000, FPS, AVG_FPS))
    # Before doing post-processing, we need to reshape the outputs as common.do_inference_v2 gives us flat arrays.
    trt_outputs = [output.reshape(shape) for output, shape in zip(trt_outputs, output_shapes)]
    postprocessor_args = {"yolo_masks": [(6, 7, 8), (3, 4, 5), (0, 1, 2)],
                          # A list of 3 three-dimensional tuples for the YOLO masks
                          "yolo_anchors": [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                                           # A list of 9 two-dimensional tuples for the YOLO anchors
                                           (59, 119), (116, 90), (156, 198), (373, 326)],
                          "obj_threshold": 0.3,  # Threshold for object coverage, float value between 0 and 1
                          "nms_threshold": 0.5,
                          # Threshold for non-max suppression algorithm, float value between 0 and 1
                          "yolo_input_resolution": input_resolution_yolov3_HW}
    postprocessor = PostprocessYOLO(**postprocessor_args)
    t2 = time.time()
    # Run the post-processing algorithms on the TensorRT outputs and get the bounding box details of detected objects
    boxes, classes, scores = postprocessor.process(trt_outputs, (shape_orig_WH))
    t3 = time.time()
    # Draw the bounding boxes onto the original input image and save it as a PNG file
    obj_detected_img = draw_bboxes(image_raw, boxes, scores, classes, ALL_CATEGORIES)
    t4 = time.time()
    output_image_path = args.i+"_out_"+args.m+"_"+args.r+"_"+args.p+".png"
    obj_detected_img.save(output_image_path, 'PNG')
    print('Saved image with bounding boxes of detected objects to {}.'.format(output_image_path))
    # Summary for the last timed iteration; FPS counts every image in the batch, as in the loop above
    print("Latency = {:.2f}ms | FPS = {:.2f} | post processing = {:.2f}ms | drawing = {:.2f}ms".format((t1 - t0) * 1000,
                                                                                                      batch_size / (t1 - t0),
                                                                                                      (t3 - t2) * 1000,
                                                                                                      (t4 - t3) * 1000))


if __name__ == '__main__':
    main()
To execute the script with a batch size of 1, I use this command:
sudo python3 onnx_to_tensorrt.py -i dog -m tiny -r 416 -p INT8 -b 1
OUTPUT:
Reading engine from file yolov3-tiny-416_INT8.trt
Running inference on image dog.jpg...
Latency = 1.86ms | FPS = 538.91 | AVG_FPS = 538.91
Latency = 1.82ms | FPS = 548.63 | AVG_FPS = 543.77
Latency = 1.80ms | FPS = 555.76 | AVG_FPS = 547.77
Latency = 1.85ms | FPS = 539.11 | AVG_FPS = 545.60
Latency = 1.99ms | FPS = 502.73 | AVG_FPS = 537.03
Latency = 2.24ms | FPS = 447.30 | AVG_FPS = 522.07
Latency = 2.09ms | FPS = 478.42 | AVG_FPS = 515.84
Latency = 1.90ms | FPS = 527.06 | AVG_FPS = 517.24
Latency = 1.81ms | FPS = 551.59 | AVG_FPS = 521.06
Latency = 1.84ms | FPS = 544.64 | AVG_FPS = 523.42
Latency = 1.81ms | FPS = 553.05 | AVG_FPS = 526.11
Latency = 2.02ms | FPS = 494.67 | AVG_FPS = 523.49
Latency = 1.78ms | FPS = 560.89 | AVG_FPS = 526.37
Latency = 2.01ms | FPS = 496.66 | AVG_FPS = 524.24
Latency = 2.62ms | FPS = 382.10 | AVG_FPS = 514.77
Latency = 2.38ms | FPS = 420.06 | AVG_FPS = 508.85
Latency = 2.24ms | FPS = 446.96 | AVG_FPS = 505.21
Latency = 1.83ms | FPS = 545.71 | AVG_FPS = 507.46
Latency = 1.79ms | FPS = 559.54 | AVG_FPS = 510.20
Latency = 1.98ms | FPS = 505.16 | AVG_FPS = 509.95
Latency = 1.80ms | FPS = 555.91 | AVG_FPS = 512.14
Latency = 1.75ms | FPS = 570.65 | AVG_FPS = 514.80
Latency = 1.77ms | FPS = 565.12 | AVG_FPS = 516.98
Latency = 1.78ms | FPS = 562.47 | AVG_FPS = 518.88
Latency = 1.88ms | FPS = 532.20 | AVG_FPS = 519.41
Latency = 1.76ms | FPS = 568.72 | AVG_FPS = 521.31
Latency = 1.76ms | FPS = 568.64 | AVG_FPS = 523.06
Latency = 1.83ms | FPS = 545.78 | AVG_FPS = 523.87
Latency = 2.75ms | FPS = 364.06 | AVG_FPS = 518.36
Latency = 2.44ms | FPS = 410.24 | AVG_FPS = 514.76
Latency = 2.15ms | FPS = 465.00 | AVG_FPS = 513.15
Latency = 1.83ms | FPS = 547.34 | AVG_FPS = 514.22
Latency = 1.82ms | FPS = 550.07 | AVG_FPS = 515.31
Latency = 1.80ms | FPS = 556.86 | AVG_FPS = 516.53
Latency = 1.78ms | FPS = 560.66 | AVG_FPS = 517.79
Latency = 1.85ms | FPS = 540.85 | AVG_FPS = 518.43
Latency = 1.97ms | FPS = 506.93 | AVG_FPS = 518.12
Latency = 1.77ms | FPS = 563.52 | AVG_FPS = 519.31
Latency = 1.81ms | FPS = 551.74 | AVG_FPS = 520.15
Latency = 1.80ms | FPS = 555.76 | AVG_FPS = 521.04
Latency = 1.83ms | FPS = 547.77 | AVG_FPS = 521.69
Latency = 1.77ms | FPS = 563.67 | AVG_FPS = 522.69
Latency = 1.85ms | FPS = 541.62 | AVG_FPS = 523.13
Latency = 2.17ms | FPS = 460.20 | AVG_FPS = 521.70
Latency = 2.81ms | FPS = 355.30 | AVG_FPS = 518.00
Latency = 2.16ms | FPS = 462.28 | AVG_FPS = 516.79
Latency = 1.87ms | FPS = 535.81 | AVG_FPS = 517.19
Latency = 1.81ms | FPS = 552.83 | AVG_FPS = 517.94
Latency = 1.81ms | FPS = 551.59 | AVG_FPS = 518.62
Latency = 1.81ms | FPS = 551.88 | AVG_FPS = 519.29
Latency = 1.78ms | FPS = 560.59 | AVG_FPS = 520.10
Latency = 1.79ms | FPS = 558.94 | AVG_FPS = 520.85
Latency = 1.96ms | FPS = 510.63 | AVG_FPS = 520.65
Latency = 1.82ms | FPS = 550.22 | AVG_FPS = 521.20
Latency = 1.79ms | FPS = 559.54 | AVG_FPS = 521.90
Latency = 1.78ms | FPS = 560.59 | AVG_FPS = 522.59
Latency = 1.83ms | FPS = 547.70 | AVG_FPS = 523.03
Latency = 1.76ms | FPS = 566.72 | AVG_FPS = 523.78
Latency = 1.84ms | FPS = 543.02 | AVG_FPS = 524.11
Latency = 2.64ms | FPS = 378.51 | AVG_FPS = 521.68
Latency = 2.40ms | FPS = 417.14 | AVG_FPS = 519.97
Latency = 2.14ms | FPS = 467.49 | AVG_FPS = 519.12
Latency = 1.82ms | FPS = 550.22 | AVG_FPS = 519.61
Latency = 1.80ms | FPS = 554.44 | AVG_FPS = 520.16
Latency = 1.81ms | FPS = 552.61 | AVG_FPS = 520.66
Latency = 1.78ms | FPS = 561.79 | AVG_FPS = 521.28
Latency = 1.80ms | FPS = 556.79 | AVG_FPS = 521.81
Latency = 1.79ms | FPS = 559.54 | AVG_FPS = 522.37
Latency = 1.79ms | FPS = 559.84 | AVG_FPS = 522.91
Latency = 2.11ms | FPS = 474.42 | AVG_FPS = 522.22
Latency = 1.79ms | FPS = 558.05 | AVG_FPS = 522.72
Latency = 1.82ms | FPS = 550.36 | AVG_FPS = 523.10
Latency = 1.76ms | FPS = 568.95 | AVG_FPS = 523.73
Latency = 1.77ms | FPS = 564.59 | AVG_FPS = 524.28
Latency = 2.22ms | FPS = 450.42 | AVG_FPS = 523.30
Latency = 2.69ms | FPS = 371.44 | AVG_FPS = 521.30
Latency = 2.23ms | FPS = 448.88 | AVG_FPS = 520.36
Latency = 2.27ms | FPS = 439.79 | AVG_FPS = 519.33
Latency = 1.92ms | FPS = 520.90 | AVG_FPS = 519.35
Latency = 1.95ms | FPS = 512.44 | AVG_FPS = 519.26
Latency = 1.82ms | FPS = 548.63 | AVG_FPS = 519.62
Latency = 1.88ms | FPS = 530.66 | AVG_FPS = 519.76
Latency = 1.83ms | FPS = 546.70 | AVG_FPS = 520.08
Latency = 1.83ms | FPS = 547.56 | AVG_FPS = 520.41
Latency = 1.83ms | FPS = 545.35 | AVG_FPS = 520.70
Latency = 1.80ms | FPS = 556.20 | AVG_FPS = 521.12
Latency = 1.80ms | FPS = 555.39 | AVG_FPS = 521.51
Latency = 1.84ms | FPS = 544.86 | AVG_FPS = 521.78
Latency = 1.80ms | FPS = 556.86 | AVG_FPS = 522.17
Latency = 1.84ms | FPS = 543.73 | AVG_FPS = 522.41
Latency = 2.68ms | FPS = 373.19 | AVG_FPS = 520.77
Latency = 2.48ms | FPS = 402.60 | AVG_FPS = 519.49
Latency = 2.25ms | FPS = 444.92 | AVG_FPS = 518.68
Latency = 2.10ms | FPS = 475.81 | AVG_FPS = 518.23
Latency = 1.86ms | FPS = 536.22 | AVG_FPS = 518.42
Latency = 1.78ms | FPS = 561.56 | AVG_FPS = 518.87
Latency = 1.81ms | FPS = 551.74 | AVG_FPS = 519.21
Latency = 1.86ms | FPS = 538.77 | AVG_FPS = 519.41
Latency = 1.82ms | FPS = 550.22 | AVG_FPS = 519.72
Latency = 1.83ms | FPS = 547.56 | AVG_FPS = 519.99
[[108.28476079 188.27050212 286.63875325 355.7010887 ]
 [202.18680365 171.43988028 386.73458023 301.09193915]
 [429.43957671 76.93088856 286.18962092 93.58978557]
 [504.3033231 62.065803 146.39240185 126.5069848 ]] [0.81173893 0.3637773 0.79401759 0.74956349] [16 1 2 2]
Saved image with bounding boxes of detected objects to dog_out_tiny_416_INT8.png.
Latency = 1.83ms | FPS = 547.56 | post processing = 184.86ms | drawing = 8.53ms
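A side note on how AVG_FPS is computed in the log above: the script averages the per-iteration FPS values, which is not the same as dividing the total number of frames by the total elapsed time. The sketch below is only an illustration of that difference. The function names are hypothetical (they are not part of the script), and the latencies are a handful of values copied from the log rather than a new measurement.

```python
def mean_of_per_iteration_fps(latencies_s, batch_size=1):
    """Average the FPS of each iteration, which is what the script's AVG_FPS does."""
    rates = [batch_size / t for t in latencies_s]
    return sum(rates) / len(rates)


def overall_fps(latencies_s, batch_size=1):
    """Total number of frames processed divided by the total elapsed time."""
    return batch_size * len(latencies_s) / sum(latencies_s)


# A few latencies (in ms) taken from the log, including some slower iterations.
latencies_ms = [1.86, 1.82, 2.62, 2.38, 1.75]
latencies_s = [ms / 1000.0 for ms in latencies_ms]

print("mean of per-iteration FPS: {:.2f}".format(mean_of_per_iteration_fps(latencies_s)))
print("frames / total time:       {:.2f}".format(overall_fps(latencies_s)))
```

The mean of per-iteration rates is never lower than frames divided by total time, so the second figure is the more conservative one when the latency fluctuates as it does here.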
OUTPUT Image:
IMPORTANT: With a batch size of 8 I get 700 FPS (latency = 11.32 ms), but with a batch size of 16 or more the FPS decreases. I did not include the post-processing and box-drawing latency in the inference latency; I only took the do_inference_v2() latency into account.
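To make the gap concrete, here is the arithmetic behind these numbers as a small sketch. The helper names are hypothetical and the only inputs are the batch size and latency quoted above; FPS is computed the same way as in the script, as batch size divided by latency.

```python
def fps_from_latency(batch_size, latency_ms):
    """Frames per second when one batch of `batch_size` images takes `latency_ms`."""
    return batch_size * 1000.0 / latency_ms


def latency_budget_ms(batch_size, target_fps):
    """Largest per-batch latency (in ms) that still reaches `target_fps`."""
    return batch_size * 1000.0 / target_fps


# Batch of 8 at the measured 11.32 ms per batch.
print("measured FPS: {:.1f}".format(fps_from_latency(8, 11.32)))
# Per-batch latency needed to reach 1000 FPS with a batch of 8.
print("latency budget for 1000 FPS: {:.1f} ms".format(latency_budget_ms(8, 1000)))
```

In other words, with a batch of 8 the do_inference_v2() call would have to take about 8 ms instead of 11.32 ms to reach 1000 FPS.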
Have I done something wrong? How can I increase the FPS to 1000?