Using transfer learning as described in the tutorial of the TensorFlow Object Detection API, I trained an object detector. The model is the same one used in the tutorial (SSD ResNet50 V1 FPN 640x640 from the TF model zoo).
The detector itself works fine, but after converting it to TensorRT on the Jetson AGX Xavier, it runs at the same frame rate as before the conversion (18 FPS for FP16 and 8 FPS for FP32). This seems unexpectedly slow, since I have read about speed-up factors of about 2-3, and the model zoo states 46 ms for this model. Here is what I did:
First, I trained the ResNet50 model with TensorFlow 2.3 on another machine and saved it in the usual SavedModel format. Then I copied the SavedModel to the Xavier, where I converted it to TensorRT with TensorFlow 2.4 and the code below.
Are there any further steps I could take?
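To put the numbers above on one scale, here is a small sketch that converts between frames per second and per-frame latency. The helper names are mine, and it assumes one image per inference call (batch size 1). The model zoo figure was presumably measured on different hardware, so the comparison is only rough.

```python
def fps_to_latency_ms(fps):
    """Per-frame latency in milliseconds for a given frame rate (batch size 1 assumed)."""
    return 1000.0 / fps


def latency_ms_to_fps(latency_ms):
    """Frame rate for a given per-frame latency in milliseconds (batch size 1 assumed)."""
    return 1000.0 / latency_ms


# Frame rates measured on the Xavier, as quoted above
for label, fps in [("FP16", 18), ("FP32", 8)]:
    print("{}: {} FPS = {:.1f} ms per frame".format(label, fps, fps_to_latency_ms(fps)))

# Latency quoted by the model zoo
print("Model zoo: 46 ms per frame = {:.1f} FPS".format(latency_ms_to_fps(46)))
```

So the FP16 result corresponds to roughly 56 ms per frame, compared with the 46 ms (about 22 FPS) quoted by the model zoo.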
import os
import time
import numpy as np
import tensorflow as tf  # needed below for tf.convert_to_tensor and tf.saved_model.load
from tensorflow.python.compiler.tensorrt import trt_convert as trt
precision_mode = trt.TrtPrecisionMode.FP16
# Prepare paths
path_models = 'models'
path_load = os.path.join(path_models, 'exported-models/detector_ssd-resnet50', 'saved_model')
path_save_unbuilt = os.path.join(path_models, 'tensorrt/unbuilt', 'detector')
path_save_prebuilt = os.path.join(path_models, 'tensorrt/prebuilt', 'detector')
os.makedirs(path_save_unbuilt, exist_ok=True)
os.makedirs(path_save_prebuilt, exist_ok=True)
def generator_expected_shapes():
    """
    As understood from the docs:
    This generator should yield tensors in all variations as expected regarding the shape.
    Here we expect a batch of one sample (the image) with size 640x640.
    """
    shape_variations = [[(1, 640, 640, 3)],]
    for shapes in shape_variations:
        yield [np.zeros(shape).astype(np.uint8) for shape in shapes]
# Conversion Parameters
conversion_params = trt.TrtConversionParams(
    precision_mode=precision_mode)
# Convert
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=path_load,
    conversion_params=conversion_params)
converter.convert()
# Save one version that is not built yet and to be built at runtime and one that is prebuilt for current GPU
converter.save(path_save_unbuilt)
converter.build(input_fn=generator_expected_shapes)
converter.save(path_save_prebuilt)
# Create pseudo-data
data = tf.convert_to_tensor(np.zeros((1, 640, 640, 3), dtype=np.uint8))
# Load converted model
root = tf.saved_model.load(path_save_prebuilt)
# Measure FPS
t = []
for _ in range(100):
    _ = root.signatures['serving_default'](data)
    t.append(time.time())
fps_per_interval = 1 / np.diff(t)
print('FPS: {} +/- {}'.format(np.mean(fps_per_interval), np.std(fps_per_interval)))
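The measurement above includes the very first calls, which may carry one-off costs (for example, engines that are built at runtime, as with the unbuilt model). A minimal sketch of a timing helper that discards a few warm-up calls follows. The names `time_calls` and `summarize` and the default of 10 warm-up calls are my own choices, not part of TensorFlow or TF-TRT.

```python
import statistics
import time


def time_calls(infer, n_runs=100, n_warmup=10):
    """Call `infer` repeatedly and return the latency in seconds of each timed call.

    The first `n_warmup` calls are executed but not timed.
    """
    for _ in range(n_warmup):
        infer()
    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        infer()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Return mean and standard deviation of the latency in ms, plus the FPS implied by the mean."""
    mean_s = statistics.mean(latencies)
    return {
        "mean_ms": 1000.0 * mean_s,
        "std_ms": 1000.0 * statistics.pstdev(latencies),
        "fps": 1.0 / mean_s,
    }
```

With the names from the script above, this could be used as `summarize(time_calls(lambda: root.signatures['serving_default'](data)))`. Running it once on the original SavedModel and once on the converted one gives a like-for-like comparison.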