TF-TRT does not speed up the model

Using transfer learning as described in the tutorial of the TensorFlow Object Detection API, I trained an object detector. I used the same model as the tutorial (SSD ResNet50 V1 FPN 640x640 from the TF model zoo).

The detector itself works fine, but after converting it to TensorRT on the Jetson AGX Xavier, it runs at the same frame rate as before the conversion: 18 FPS (about 56 ms per frame) for fp16 and 8 FPS (125 ms per frame) for fp32. This seems unexpectedly slow, since I have read about speed-up factors of about 2-3, and the model zoo states 46 ms for this model. Here is what I did:

First, I trained the ResNet50 model using TensorFlow 2.3 on another machine and saved it in the usual SavedModel format. Then I copied the SavedModel to the Xavier, where I converted it to TensorRT using TensorFlow 2.4 and the code below.

Are there any further steps I could take?

import os
import time
import numpy as np
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

precision_mode = trt.TrtPrecisionMode.FP16

# Prepare paths
path_models = 'models'
path_load = os.path.join(path_models, 'exported-models/detector_ssd-resnet50', 'saved_model')
path_save_unbuilt = os.path.join(path_models, 'tensorrt/unbuilt', 'detector')
path_save_prebuilt = os.path.join(path_models, 'tensorrt/prebuilt', 'detector')
os.makedirs(path_save_unbuilt, exist_ok=True)
os.makedirs(path_save_prebuilt, exist_ok=True)

def generator_expected_shapes():
    """
    As understood from the docs:
    This generator should yield tensors in all variations as expected regarding the shape.
    Here we expect a batch of one sample (the image) with size 640x640.
    """
    shape_variations = [[(1, 640, 640, 3)],]
    for shapes in shape_variations:
        yield [np.zeros(shape).astype(np.uint8) for shape in shapes]

# Conversion Parameters
conversion_params = trt.TrtConversionParams(

# Convert
converter = trt.TrtGraphConverterV2(

# Save one version that is not built yet and to be built at runtime and one that is prebuilt for current GPU

# Create pseudo-data
data = tf.convert_to_tensor(np.zeros((1, 640, 640, 3), dtype=np.uint8))

# Load converted model
root = tf.saved_model.load(path_save_prebuilt)

# Measure FPS
t = []
for _ in range(100):
    _ = root.signatures['serving_default'](data)
    t.append(time.time())  # timestamp after each inference, so np.diff(t) gives per-frame intervals
fps_per_interval = 1 / np.diff(t)
print('FPS: {} +/- {}'.format(np.mean(fps_per_interval), np.std(fps_per_interval)))
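
A note on the measurement itself: the first few calls of a freshly loaded model usually include one-off setup costs, so they can distort the average. Below is a minimal sketch of a timing helper that skips some warm-up calls before measuring. The names `benchmark`, `summarize`, `n_warmup` and `n_runs` are made up for illustration, and the `time.sleep` workload is only a stand-in for the real model call.

```python
import statistics
import time


def benchmark(fn, n_warmup=10, n_runs=100):
    """Call fn() n_warmup times untimed, then return n_runs per-call latencies in seconds."""
    for _ in range(n_warmup):
        fn()  # untimed: lets one-off setup costs finish first
    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Return (mean latency in ms, throughput in FPS) computed from the mean latency."""
    mean_s = statistics.mean(latencies)
    return mean_s * 1000.0, 1.0 / mean_s


# Stand-in workload (assumption). With the model loaded as above this would be:
#   lambda: root.signatures['serving_default'](data)
latencies = benchmark(lambda: time.sleep(0.001), n_warmup=2, n_runs=5)
mean_ms, fps = summarize(latencies)
print('latency: {:.2f} ms, FPS: {:.1f}'.format(mean_ms, fps))
```

Here the FPS is derived from the mean latency rather than from the mean of the per-interval FPS values. The two differ when the intervals vary, and the latency-based number is the one that compares directly with a figure given in ms.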


We recommend deploying your model with pure TensorRT rather than TF-TRT.
For a TensorFlow 2.x model, please convert it via TensorFlow -> ONNX -> TensorRT.
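
As a rough sketch of that route, assuming the `tf2onnx` package and the `trtexec` tool that ships with TensorRT are both installed: the snippet below only builds the two command lines and prints them. The file names and the opset value are placeholders, not values from this thread, and whether the exported graph parses in TensorRT depends on the operators it contains.

```python
def tf2onnx_command(saved_model_dir, onnx_path, opset=11):
    """Build the tf2onnx command that exports a SavedModel to ONNX."""
    return ['python3', '-m', 'tf2onnx.convert',
            '--saved-model', saved_model_dir,
            '--output', onnx_path,
            '--opset', str(opset)]


def trtexec_command(onnx_path, engine_path, fp16=True):
    """Build the trtexec command that parses an ONNX file and saves a TensorRT engine."""
    cmd = ['trtexec', '--onnx=' + onnx_path, '--saveEngine=' + engine_path]
    if fp16:
        cmd.append('--fp16')
    return cmd


# Placeholder paths (assumptions); replace them with your own.
export_cmd = tf2onnx_command('saved_model', 'detector.onnx')
build_cmd = trtexec_command('detector.onnx', 'detector.engine')
print(' '.join(export_cmd))
print(' '.join(build_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to actually run the step.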


Thanks for the reply :)

The step from ONNX to TensorRT still fails, because TensorRT does not implement the Non-Max-Suppression layer.

I would like to split the ONNX model before the NMS into a model base and a postprocessing part, then convert only the model base to TensorRT. Are there any tools you can recommend for that? So far I have looked into the Python packages onnx, onnxruntime and sclblonnx, but I am not sure how to extract parts of a graph.
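
One option that may fit here is `onnx.utils.extract_model` (available in recent versions of the onnx package), which writes the sub-graph between given input and output tensor names to a new file. A minimal sketch follows. The helper names `boxes_and_scores_of_nms` and `split_before_nms` are made up for illustration, the helper only looks at top-level nodes (not at nodes nested inside `Loop` or `If` sub-graphs), and the stand-in nodes at the end merely imitate `model.graph.node` so the lookup can be tried without a model file.

```python
from types import SimpleNamespace


def boxes_and_scores_of_nms(nodes, op_type='NonMaxSuppression'):
    """Return the boxes/scores tensor names (the first two inputs) of every node of op_type."""
    names = []
    for node in nodes:
        if node.op_type == op_type:
            for name in list(node.input)[:2]:
                if name and name not in names:
                    names.append(name)
    return names


def split_before_nms(model_path, base_path, input_names, boundary_names):
    """Write the part of the graph between input_names and boundary_names to base_path."""
    from onnx.utils import extract_model  # lazy import: the lookup above works without onnx
    extract_model(model_path, base_path, input_names, boundary_names)


# Stand-in nodes (assumption); with a real model use onnx.load(path).graph.node instead.
demo_nodes = [
    SimpleNamespace(op_type='Conv', input=['image', 'weights']),
    SimpleNamespace(op_type='NonMaxSuppression',
                    input=['boxes', 'scores', 'max_boxes', 'iou_threshold', '']),
]
boundary = boxes_and_scores_of_nms(demo_nodes)
print(boundary)  # ['boxes', 'scores']
```

With a real model, the names returned by the lookup would be passed as `boundary_names` to `split_before_nms`, and the resulting base model could then be tried with TensorRT while the NMS runs separately.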