TF-TRT model very slow to load, with poor performance

Hi!

I am trying to run the latest models from the TensorFlow 2 Detection Model Zoo (models/ at master · tensorflow/models · GitHub) on a Jetson Xavier NX.

I tried to adapt the approach from this guide (Accelerating Inference In TF-TRT User Guide :: NVIDIA Deep Learning Frameworks Documentation) to run a MobileNetV2.

I have 2 issues:

  • it takes about 25 min before the model is ready to run in the inference script; I’d like to get this load time under 5 min
  • the FPS is really low (around 6 FPS)

I downloaded ssd_mobilenet_v2_320x320_coco17_tpu-8 from models/ at master · tensorflow/models · GitHub and unzipped it into ./coco_models.

Here is my script to convert from TF to TF-TRT:

```python
from pathlib import Path

from numpy import uint8
from numpy.random import normal
from tensorflow.python.compiler.tensorrt.trt_convert import (
    TrtConversionParams,
    TrtGraphConverterV2,
    TrtPrecisionMode,
)


def convert(model_directory: Path):
    converter = TrtGraphConverterV2(
        input_saved_model_dir=str(model_directory / "saved_model"),
        conversion_params=TrtConversionParams(
            precision_mode=TrtPrecisionMode.FP16,
            max_workspace_size_bytes=1 << 32,
        ),
    )
    converter.convert()

    # Dummy input used to pre-build the TensorRT engines
    def fake_inputs():
        yield (normal(size=(1, 1_280, 720, 3)).astype(uint8),)

    converter.build(input_fn=fake_inputs)
    converter.save(str(model_directory / "trt"))


if __name__ == "__main__":
    convert(Path("coco_models/ssd_mobilenet_v2_320x320_coco17_tpu-8"))
```
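Note that `fake_inputs` above yields batches of shape (1, 1280, 720, 3), while a 1280x720 frame returned by OpenCV’s `cap.read()` has shape (720, 1280, 3), i.e. (height, width, channels), so the inference script feeds (1, 720, 1280, 3). `build()` pre-builds engines for the shapes that `input_fn` produces; when the runtime shape differs, TF-TRT may have to build another engine at runtime, which can be slow on a Jetson. A minimal sketch of an input function matching the camera frames, assuming the 1280x720 capture from the GStreamer pipeline; `representative_inputs` and the constants are hypothetical names:

```python
import numpy as np

# Assumption: frames come from the 1280x720 GStreamer pipeline in the inference script.
FRAME_HEIGHT = 720
FRAME_WIDTH = 1280


def representative_inputs(num_batches: int = 1):
    """Yield input tuples shaped like the frames the inference script feeds the model.

    OpenCV frames are (height, width, channels), so after expand_dims the
    batch shape is (1, FRAME_HEIGHT, FRAME_WIDTH, 3).
    """
    rng = np.random.default_rng(0)
    for _ in range(num_batches):
        # Random 0-255 pixel values; only the shape and dtype matter for engine building.
        frame = rng.integers(0, 256, size=(1, FRAME_HEIGHT, FRAME_WIDTH, 3), dtype=np.uint8)
        yield (frame,)
```

It would be passed as `converter.build(input_fn=representative_inputs)`; `build()` calls the function without arguments, so the default of one batch is used.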

And here is the inference script:

```python
from time import time

import cv2
from numpy import expand_dims
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2
from tensorflow.python.framework.ops import convert_to_tensor
from tensorflow.python.saved_model import saved_model
from tensorflow.python.saved_model.signature_constants import DEFAULT_SERVING_SIGNATURE_DEF_KEY
from tensorflow.python.saved_model.tag_constants import SERVING

saved_model_loaded = saved_model.load("coco_models/ssd_mobilenet_v2_320x320_coco17_tpu-8/trt", tags=[SERVING])
graph_func = saved_model_loaded.signatures[DEFAULT_SERVING_SIGNATURE_DEF_KEY]
frozen_func = convert_variables_to_constants_v2(graph_func)


def demo_object_detection():
    # Jetson CSI camera through GStreamer, 1280x720 at 60 FPS
    cap = cv2.VideoCapture(
        "nvarguscamerasrc ! "
        "video/x-raw(memory:NVMM), "
        "width=(int)1280, height=(int)720, "
        "format=(string)NV12, framerate=(fraction)60/1 ! "
        "nvvidconv flip-method=0 ! "
        "video/x-raw, width=(int)1280, height=(int)720, "
        "format=(string)BGRx ! "
        "videoconvert ! appsink",
        cv2.CAP_GSTREAMER,
    )

    while True:
        ret, image = cap.read()

        # cap.read() returns (False, None) when no frame is available
        if not ret:
            break

        t = time()

        # Detection
        _, [boxes], [ids], _, [scores], *_ = (
            x.numpy() for x in frozen_func(convert_to_tensor(expand_dims(image, axis=0)))
        )
        print(f"FPS: {1 / (time() - t):.1f}")

        # Display: boxes are normalized [ymin, xmin, ymax, xmax]
        for box, class_id, score in zip(boxes, ids, scores):
            if score > 0.5:
                cv2.rectangle(
                    image,
                    (int(box[1] * image.shape[1]), int(box[0] * image.shape[0])),
                    (int(box[3] * image.shape[1]), int(box[2] * image.shape[0])),
                    (255, 255, 255),  # white, in 0-255 BGR for a uint8 frame
                )
        cv2.imshow("object detection", image)

        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    # When everything is done, release the capture
    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    demo_object_detection()
```
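The inference script prints `1 / (time() - t)` for every single frame, which is noisy and also reports the slow first calls. A minimal sketch of a rolling-average meter over the last N inference calls; `FpsMeter` is a hypothetical helper, not part of OpenCV or TensorFlow. It uses `time.perf_counter`, which is intended for measuring short intervals, instead of `time.time`:

```python
from collections import deque
from time import perf_counter
from typing import Deque, Optional


class FpsMeter:
    """Rolling-average frames per second over the last `window` timed sections."""

    def __init__(self, window: int = 30) -> None:
        self._durations: Deque[float] = deque(maxlen=window)
        self._start: Optional[float] = None

    def start(self, now: Optional[float] = None) -> None:
        # Mark the beginning of one timed section (e.g. a single inference call).
        self._start = perf_counter() if now is None else now

    def stop(self, now: Optional[float] = None) -> float:
        # Record the elapsed time since start() and return the current rolling FPS.
        if self._start is None:
            raise RuntimeError("stop() called before start()")
        end = perf_counter() if now is None else now
        self._durations.append(end - self._start)
        self._start = None
        return self.fps

    @property
    def fps(self) -> float:
        total = sum(self._durations)
        return len(self._durations) / total if total > 0 else 0.0
```

In the loop, `meter.start()` would go right before the `frozen_func(...)` call and `print(f"FPS: {meter.stop():.1f}")` right after it. Like the original print, this measures inference only, not capture or display.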

Is there something I’m doing wrong?

Many thanks!


You can serialize the TF-TRT model after the first launch, so that next time you can load it directly from the file and save the conversion time.
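With `TrtGraphConverterV2`, calling `build()` before `save()` stores the pre-built engines in the saved model, so a script only needs to run the conversion once. A minimal sketch of that "convert once, then reuse" check, assuming the converted model goes to a directory like `.../trt`; `ensure_converted` is a hypothetical helper and `convert_fn` stands for a wrapper around your own convert/build/save code:

```python
from pathlib import Path
from typing import Callable


def ensure_converted(trt_dir: Path, convert_fn: Callable[[Path], None]) -> bool:
    """Run the slow TF-TRT conversion only if no SavedModel exists in trt_dir yet.

    Returns True if the conversion ran, False if a saved model was already there.
    """
    # A TF SavedModel directory contains saved_model.pb (or saved_model.pbtxt).
    if (trt_dir / "saved_model.pb").exists() or (trt_dir / "saved_model.pbtxt").exists():
        return False
    convert_fn(trt_dir)
    return True
```

The inference script would then load `trt_dir` as before. If the runtime input shape does not match the shapes used in `build()`, engines may still be built at load or first-inference time.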

Inference performance depends on the model architecture itself.
Is your model a modified version of ssd_mobilenet_v2?

If so, you can run inference with pure TensorRT as below.
This will give you much better performance.


Thanks a lot for your answer, AastaLLL!

If I understand correctly, serializing the model would only improve the loading time, not the inference time. Is that correct?

If so, I’d like to go with pure TensorRT. I managed to convert an ssd_mobilenet_v2 from TensorFlow 1 into a TensorRT model and to run it at 60+ FPS.

However, I’m trying to run some models from the TensorFlow 2 Model Zoo (models/ at master · tensorflow/models · GitHub). My example uses ssd_mobilenet_v2, but some of the other models (CenterNet and EfficientNet, for example) look very promising, and I’d like to try them on the Jetson eventually. These models are only available pretrained for TF2, so I need a way to convert a TF2 model to TensorRT.

Do you have any links or resources on how to do that?

Many thanks!


Launching TensorRT from a serialized engine only saves the conversion time.
The inference time will be the same.

For a TensorFlow 2.x based model, please first convert it into ONNX format.
There are public tools that can do this, e.g. tf2onnx.
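A minimal sketch of driving the tf2onnx command-line converter from Python, assuming tf2onnx is installed (`pip install tf2onnx`) and the model is a TF2 SavedModel directory such as `.../saved_model`. The helper names and the default opset of 13 are assumptions; pick an opset your TensorRT version supports:

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def tf2onnx_command(saved_model_dir: Path, onnx_path: Path, opset: int = 13) -> List[str]:
    """Build the tf2onnx command line that converts a TF2 SavedModel to ONNX."""
    return [
        sys.executable, "-m", "tf2onnx.convert",
        "--saved-model", str(saved_model_dir),
        "--output", str(onnx_path),
        "--opset", str(opset),
    ]


def convert_to_onnx(saved_model_dir: Path, onnx_path: Path, opset: int = 13) -> None:
    # Raises subprocess.CalledProcessError if the conversion fails.
    subprocess.run(tf2onnx_command(saved_model_dir, onnx_path, opset), check=True)
```

For example, `convert_to_onnx(Path("coco_models/ssd_mobilenet_v2_320x320_coco17_tpu-8/saved_model"), Path("model.onnx"))`.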

Once you have an ONNX model, you can run it with TensorRT directly:

/usr/src/tensorrt/bin/trtexec --onnx=[your/file/path]
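Combining this with the earlier point about serialized engines: trtexec can write the built engine with `--saveEngine` and reuse it later with `--loadEngine`, which skips the build but not the inference cost. A minimal sketch, assuming the trtexec path from the command above; `trtexec_command` and `run_trtexec` are hypothetical helpers, and FP16 is an assumed default:

```python
import subprocess
from pathlib import Path
from typing import List

TRTEXEC = "/usr/src/tensorrt/bin/trtexec"  # path used in the command above


def trtexec_command(onnx_path: Path, engine_path: Path, fp16: bool = True) -> List[str]:
    """Build the engine from ONNX on the first run; reuse the serialized engine afterwards."""
    if engine_path.exists():
        # Deserializing a saved engine skips the slow build step.
        return [TRTEXEC, f"--loadEngine={engine_path}"]
    cmd = [TRTEXEC, f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")
    return cmd


def run_trtexec(onnx_path: Path, engine_path: Path, fp16: bool = True) -> None:
    subprocess.run(trtexec_command(onnx_path, engine_path, fp16), check=True)
```

A serialized engine is tied to the GPU and TensorRT version it was built with, so it should be rebuilt on the Jetson itself and after a TensorRT upgrade.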


Thank you for your response! I have the same problem. I converted my model in trt.TrtPrecisionMode.FP32 mode, and when I later load it with tf.keras.models.load_model, loading is very slow. Could you explain how I can serialize the TF-TRT model after the first launch? Do I need to save it with some special command or utility? For example, if I convert my TF-TRT model to ONNX format, will I be able to load it quickly in my script?

Hi alex283hl,

Please open a new topic for your issue. Thanks.