Save serialized TF-TRT engine to reuse in DeepStream


Following the TF-TRT documentation, I converted my custom model with TensorRT (code below). The conversion succeeded, and inference now runs nearly 2x faster. The next step is to serialize the converted graph and save it as a TRT engine so I can include it in my DeepStream application later. I cannot find any information about this step in the documentation, and I don't want to build my TensorRT engine at runtime.
Simply put, I want to produce the same output I would get by running trtexec on an ONNX model with the --saveEngine flag.
Is this possible using TF-TRT?

trtexec --onnx=/model.onnx --saveEngine=trt.engine


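import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# output_nodes, frozen_graph_name, precision_mode come from the rest of the script; converter args are assumed
with tf.Session() as sess:
    print(f'output nodes: {output_nodes}')
    with tf.gfile.GFile(frozen_graph_name, 'rb') as f:
        frozen_graph = tf.GraphDef()
        frozen_graph.ParseFromString(f.read())

    if precision_mode is not None:
        converter = trt.TrtGraphConverter(
            input_graph_def=frozen_graph,
            nodes_blacklist=output_nodes,  # keep the output nodes in TensorFlow
            precision_mode=precision_mode)
        frozen_graph = converter.convert()
        # save this frozen graph to disk to reuse later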
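For context on why no engine file shows up: TF-TRT's convert() returns a TensorFlow GraphDef in which supported subgraphs are replaced by TRTEngineOp nodes. The result is still a TensorFlow graph that needs the TensorFlow runtime to execute, not a standalone TensorRT engine like the one trtexec --saveEngine writes. A minimal sketch for checking this on the converted graph; summarize_trt_conversion is a hypothetical helper that only relies on the GraphDef's node list and each node's op field.

```python
from collections import Counter


def summarize_trt_conversion(graph_def):
    """Count TRTEngineOp nodes vs. nodes left as ordinary TensorFlow ops.

    Works on a tf.GraphDef, or on any object exposing .node items that have
    an .op attribute. Hypothetical helper, not part of the TF-TRT API.
    """
    ops = Counter(node.op for node in graph_def.node)
    trt_ops = ops.pop("TRTEngineOp", 0)
    return {"trt_engine_ops": trt_ops, "tf_ops": sum(ops.values())}
```

Calling summarize_trt_conversion(frozen_graph) right after converter.convert() should report at least one TRTEngineOp; the remaining TF ops (inputs, outputs, and any unsupported layers) still require TensorFlow at inference time.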


TensorRT Version: 7.1.3
GPU Type: Jetson AGX Xavier
CUDA Version: 10.2
cuDNN Version: 8.0
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): 1.15
Baremetal or Container (if container which image + tag): baremetal

Hi @blubthefish,
I may have misunderstood your question, but you are already using the --saveEngine flag to save your TRT engine, which you can load later with the --loadEngine flag,
something like: trtexec --loadEngine=yourEngine.engine --batch=1

Thanks for the reply @AakankshaS, but I meant that I wanted to do something similar to the trtexec command using TF-TRT.

In the end, I managed to convert my DeepLabv3+/MobileNetV3 model by using tf2onnx and then trtexec to convert the ONNX model to a TRT engine.
It needed 3 adaptations, in case somebody is trying to do the same:

  1. Fixed input dimensions for the neural network
  2. An opset of at least 10 in the TF->ONNX step to support the MobileNetV3 layers
  3. Changing uint8 layers to another datatype using graphsurgeon, because uint8/16 is not supported in the ONNX->TensorRT conversion
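The three adaptations can be turned into a quick pre-flight check before running the conversion. A minimal sketch, assuming you read the input shape, input dtype, and opset from your own export script or ONNX model; find_export_issues is a hypothetical helper, it only looks at the network input (while point 3 applies to any uint8 layer), and its thresholds come from this post rather than from the TensorRT documentation.

```python
# Hypothetical pre-flight check mirroring the three adaptations above.
UNSUPPORTED_DTYPES = {"uint8", "uint16"}  # rejected by the ONNX -> TensorRT step here
MIN_OPSET = 10                            # needed for the MobileNetV3 layers


def find_export_issues(input_shape, input_dtype, opset):
    """Return a list of problems that would block the ONNX -> TensorRT step."""
    issues = []
    # 1. Every dimension must be a fixed positive integer (no None, -1 or names).
    if any(not isinstance(d, int) or d <= 0 for d in input_shape):
        issues.append(f"dynamic input shape {list(input_shape)}")
    # 2. The TF -> ONNX export must use opset 10 or newer.
    if opset < MIN_OPSET:
        issues.append(f"opset {opset} < {MIN_OPSET}")
    # 3. uint8/uint16 inputs must be changed to a supported type first.
    if input_dtype in UNSUPPORTED_DTYPES:
        issues.append(f"unsupported dtype {input_dtype}")
    return issues
```

For example, the research repo's original placeholder ([1, None, None, 3], uint8) exported with opset 9 would trip all three checks, while [1, 3, 1024, 1024] float32 with opset 11 passes.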


Could you please share the .pb and .onnx files if possible, or explain exactly how you managed to create the TensorRT engine?

I appreciate your help.

I can't share my model files, but I'm using these two commands to convert my models:

python3 -m tf2onnx.convert --graphdef frozen_model.pb --output model.onnx --inputs ImageTensor:0 --outputs SemanticPredictions:0 --opset 11 --fold_const

/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --explicitBatch --saveEngine=model.engine --workspace=28000 --fp16
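If you run these two steps repeatedly, it can help to script them so the flags live in one place. A minimal sketch, assuming tf2onnx is installed for python3 and trtexec sits at the path used above; build_commands and run_pipeline are hypothetical helpers whose defaults simply mirror the flags from this post (it avoids shlex.join so it also runs on Python 3.6).

```python
import shlex
import subprocess


def build_commands(frozen_pb, onnx_path, engine_path, inputs, outputs,
                   opset=11, workspace_mb=28000, fp16=True,
                   trtexec="/usr/src/tensorrt/bin/trtexec"):
    """Return the tf2onnx and trtexec argument lists used in this thread."""
    tf2onnx_cmd = [
        "python3", "-m", "tf2onnx.convert",
        "--graphdef", frozen_pb, "--output", onnx_path,
        "--inputs", inputs, "--outputs", outputs,
        "--opset", str(opset), "--fold_const",
    ]
    trtexec_cmd = [
        trtexec, f"--onnx={onnx_path}", "--explicitBatch",
        f"--saveEngine={engine_path}",  # value attached directly after '='
        f"--workspace={workspace_mb}",  # megabytes in TensorRT 7.x trtexec
    ]
    if fp16:
        trtexec_cmd.append("--fp16")
    return tf2onnx_cmd, trtexec_cmd


def run_pipeline(*args, **kwargs):
    """Run both steps in order; raises CalledProcessError on the first failure."""
    for cmd in build_commands(*args, **kwargs):
        print("+", " ".join(shlex.quote(a) for a in cmd))
        subprocess.run(cmd, check=True)
```

Usage: run_pipeline("frozen_model.pb", "model.onnx", "model.engine", "ImageTensor:0", "SemanticPredictions:0").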

If you get errors about an unsupported datatype (in my case, my model had some uint8 layers), you need to manually change those layers to a supported type such as int32.

Thank you for the fast response.

I followed your steps, but got an error when I tried to build the engine.

I converted uint8 to int32 using the following code:

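import onnx_graphsurgeon as gs
import onnx
import numpy as np

# Load the ONNX model and switch every graph input to int32
graph = gs.import_onnx(onnx.load("deeplabv3.onnx"))
for inp in graph.inputs:
    inp.dtype = np.int32

# Export the modified graph back to an ONNX file
onnx.save(gs.export_onnx(graph), "deeplabv3_int32.onnx")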

Did you try the ONNX model checker to see if your model is valid?
I see from your model filename that you are using DeepLabv3. Are you using the TensorFlow research repository?
Instead of using graphsurgeon, you could define your inputs directly in the export script from here:

If you check line 91:

input_image = tf.placeholder(tf.uint8, [1, None, None, 3], name=_INPUT_NAME)

You could change to something like this:

input_image = tf.placeholder(
    tf.float32, [1, 3, 1024, 1024], name=_INPUT_NAME)

-> fixed dimensions + supported datatype

You might wonder why I changed the placeholder from [1, height, width, 3] (NHWC) to [1, 3, height, width] (NCHW).
It's because DeepStream expects that input format.
To keep it compatible with a model trained with the TensorFlow research repo, which works in NHWC, you should add a transpose right after it. The changes look something like this:

input_image = tf.placeholder(
    tf.float32, [1, 3, 1024, 1024], name=_INPUT_NAME)

original_image_size = tf.shape(input_image)[2:4]

# deepstream compatibility edit
input_image = tf.transpose(input_image, (0, 2, 3, 1))

# Squeeze the dimension in axis=0 since `preprocess_image_and_label` assumes
# image to be 3-D.
image = tf.squeeze(input_image, axis=0)
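The transpose only reorders axes, so its effect on the shape can be checked without TensorFlow. A small pure-Python sketch: permute_shape is a hypothetical helper that follows the same axis convention as tf.transpose (output axis i takes input axis perm[i]), and 512 x 1024 is an arbitrary non-square size chosen only so height and width stay distinguishable. It shows that perm (0, 2, 3, 1) turns DeepStream's NCHW input into NHWC, and that slicing [2:4] of the NCHW shape yields (height, width), as original_image_size does in the edited script.

```python
def permute_shape(shape, perm):
    """Shape produced by transposing a tensor with the given axis permutation.

    Output axis i takes input axis perm[i], the convention tf.transpose uses.
    """
    return tuple(shape[axis] for axis in perm)


# Illustrative sizes only: batch 1, 3 channels, height 512, width 1024.
NCHW = (1, 3, 512, 1024)                  # layout DeepStream feeds the network
NHWC = permute_shape(NCHW, (0, 2, 3, 1))  # layout the research-repo model uses
HEIGHT_WIDTH = NCHW[2:4]                  # what original_image_size selects
BACK_TO_NCHW = permute_shape(NHWC, (0, 3, 1, 2))  # inverse permutation
```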

You can compare with the original code from the link above.
Hope this helps; it took me some time to make it work :)

I followed your steps, but got an error when I tried to build the model.

Please find the modified export script attached.

Hmm, it looks like you are converting the model on an RTX 2080 Ti; 11 GB should be enough memory… not sure what's happening. I'm running the TensorRT conversion on a Jetson AGX Xavier with 32 GB of memory, though. I don't think the error is related to the export script. Googling, I found some posts suggesting it might be related to the TensorRT version. Maybe try a newer or older version, or search for more information yourself.
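One thing that may be worth checking here (a guess, since the actual error isn't shown): the trtexec command earlier in the thread passes --workspace=28000, and trtexec in TensorRT 7.x reads that value in megabytes, so it allows roughly 27 GiB of builder workspace. That value was sized for the 32 GB Xavier, not an 11 GB RTX 2080 Ti. A minimal sketch of the arithmetic; workspace_mb and its 50% default are a hypothetical rule of thumb, not a TensorRT recommendation.

```python
def workspace_mb(gpu_memory_gb, fraction=0.5):
    """Hypothetical rule of thumb: give the builder a fraction of GPU memory.

    Returns a value in MiB for trtexec's --workspace flag (TensorRT 7.x).
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return int(gpu_memory_gb * 1024 * fraction)


# --workspace=28000 from the command above, expressed in GiB
requested_gib = 28000 / 1024
```

On an 11 GB card this suggests something like --workspace=5632 instead of 28000.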