Deserializing TensorRT plan with the C++ API


I have a TensorFlow model saved as checkpoint files. My goal is to optimize the model into a TensorRT plan file, load it in my C++ program, and from there feed images to the network and obtain the outputs (masks).

With the help of the TF-TRT user guide, I created a frozen graph from my checkpoint files (user guide section 2.2.3). I then used create_inference_graph() with default parameters to obtain a TensorRT graph.

After that, I serialized the TRT graph to a plan file as indicated in section 2.10. One problem I noticed is that using that code "as is" creates an individual plan file for every optimized layer of my network, and non-optimized nodes are ignored. I therefore modified the loop to include all nodes and produce a single plan file. This is my modification:

with tf.gfile.GFile("my_network.plan", 'wb') as f:
    for n in trt_graph.node:
        print("Node: %s, %s" % (n.op, n.name.replace("/", "_")))
        f.write(n.attr["serialized_segment"].s)

Once this is done, I now move to my C++ program where I can deserialize the plan:

IRuntime* runtime = createInferRuntime(gLogger);
ICudaEngine* engine = runtime->deserializeCudaEngine(modelData, modelSize, nullptr);
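For completeness, the modelData / modelSize pair in the snippet above is filled by reading the plan file from disk; a minimal sketch of that step (the readPlanFile name is my own):

```cpp
#include <fstream>
#include <string>
#include <vector>

// Read the serialized engine ("plan") file into a byte buffer.
// buffer.data() / buffer.size() are then the modelData / modelSize
// arguments expected by runtime->deserializeCudaEngine().
std::vector<char> readPlanFile(const std::string& path) {
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) {
        return {};  // caller should treat an empty buffer as an error
    }
    std::streamsize size = file.tellg();
    file.seekg(0, std::ios::beg);
    std::vector<char> buffer(static_cast<size_t>(size));
    file.read(buffer.data(), size);
    return buffer;
}
```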

I have 3 questions:

  1. Can I put all nodes of my neural network into a single plan file, as shown in the first code snippet? Or should the plan file contain only optimized operations?
  2. Why does the ICudaEngine (obtained in the 2nd code snippet) only show 2 layers with getNbLayers()? I understand layers can be fused during optimization, but my original model has 9 convolutions plus input/output layers. Can this be expected?
  3. Since I am working on a Jetson TX2, which supports TensorRT 5 at most, I don't have ICudaEngine::getBindingBytesPerComponent() or ICudaEngine::getBindingVectorizedDim() to characterize the neural network once it is loaded. Is there some way to understand what the layers and bindings of my neural network are once they're loaded in TensorRT 5?

TF-TRT optimizes and executes compatible subgraphs, allowing TensorFlow to execute the remaining graph. TensorRT will parse the model and apply optimizations to the portions of the graph wherever possible.

In order to generate a serialized standalone TRT engine, the following are the prerequisites:

You can use a visualization tool like TensorBoard to understand, debug, and optimize the model.

Please refer to the link below for more debugging options:


Hello, and thank you for answering me.

We can see from the sample code in your first link that it creates individual plan files, in fact one file for each TensorRT-optimized node. It also excludes non-optimized nodes from the plan files. Since the rest of the documentation assumes we are working from a single plan file that contains a complete neural network, I modified the sample code as shown in my original post.

Also, it is not clear to me what is meant by the first requirement about the entire model converting to TensorRT. How does one verify that? Does it mean that every node has to be optimized by TensorRT, or am I good as long as a valid TRT graph can be obtained from create_inference_graph()?


Yes, every node has to be optimized or supported by TensorRT to create a serialized TRT engine. In the case of TF-TRT, any node that is not supported by TRT is handled by TensorFlow.
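One way to verify this from the converted graph is to look at the op types: after conversion, the optimized subgraphs show up as TRTEngineOp nodes, and every other op type will fall back to TensorFlow at runtime. A small sketch of that check (countTrtCoverage is my own name; it takes the op-type strings collected from trt_graph.node):

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Given the op types collected from trt_graph.node, report how many
// TRT-optimized subgraphs exist and which op types would still run in
// native TensorFlow. If the fallback set contains anything beyond
// graph plumbing (e.g. Placeholder/Identity), the model did not
// convert entirely and a standalone engine cannot be built from it.
std::pair<int, std::set<std::string>>
countTrtCoverage(const std::vector<std::string>& ops) {
    int trtNodes = 0;
    std::set<std::string> fallbackOps;
    for (const std::string& op : ops) {
        if (op == "TRTEngineOp") {
            ++trtNodes;
        } else {
            fallbackOps.insert(op);
        }
    }
    return {trtNodes, fallbackOps};
}
```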

Another alternative is to convert your model to ONNX using tf2onnx, and then convert it to TensorRT using the ONNX parser. Any layers that are not supported need to be replaced by a custom plugin.