Hello,
I have a TensorFlow model saved as checkpoint files. My goal is to optimize the model and save it as a TensorRT plan file, load that plan in my C++ program, and from there feed images to the network and obtain the outputs (masks).
With the help of the TF-TRT user guide, I created a frozen graph from my checkpoint files (user guide section 2.2.3). I then used create_inference_graph() with default parameters to obtain a TensorRT graph.
After that, I serialized the TRT graph to a plan file as indicated in section 2.10. One problem I notice there is that using the code from section 2.10 “as is” creates individual plan files for every optimized layer of my network. Also, non-optimized nodes are ignored. Therefore I modified the loop from section 2.10 to include all nodes and produce a single plan. This is my modification:
# Write the serialized segment of every node into one plan file.
with tf.gfile.GFile("my_network.plan", 'wb') as f:
    for n in trt_graph.node:
        print("Node: %s, %s" % (n.op, n.name.replace("/", "_")))
        f.write(n.attr["serialized_segment"].s)
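For comparison, here is a minimal sketch of the per-engine behaviour I described above, as I understand the section 2.10 loop: only nodes whose op is "TRTEngineOp" are written, one plan file each. The helper name write_engine_plans and the out_dir parameter are my own (hypothetical), and I use plain open() instead of tf.gfile so the sketch does not depend on TensorFlow. It assumes each node exposes op, name and attr["serialized_segment"].s the way the nodes in trt_graph.node do.

```python
import os


def write_engine_plans(nodes, out_dir="."):
    """Write one .plan file per TRTEngineOp node and return the paths written.

    `nodes` is assumed to be an iterable like trt_graph.node, where each
    node has .op, .name and .attr["serialized_segment"].s (bytes).
    """
    written = []
    for n in nodes:
        # Skip nodes that were not converted into a TensorRT engine.
        if n.op != "TRTEngineOp":
            continue
        segment = n.attr["serialized_segment"].s
        # Node names contain "/" which is not valid inside a file name.
        path = os.path.join(out_dir, n.name.replace("/", "_") + ".plan")
        with open(path, "wb") as f:
            f.write(segment)
        written.append(path)
    return written
```

With the graph from create_inference_graph() this would be called as write_engine_plans(trt_graph.node), producing one file per optimized segment rather than the single file my modification writes.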
Once this is done, I move to my C++ program, where I deserialize the plan:
IRuntime* runtime = createInferRuntime(gLogger);
ICudaEngine* engine = runtime->deserializeCudaEngine(modelData, modelSize, nullptr);
I have 3 questions:
- Can I put all nodes of my neural network into a single plan file, as shown in the first code snippet? Or should the plan file contain only optimized operations?
- Why does the ICudaEngine (obtained in the 2nd code snippet) only show 2 layers with getNbLayers()? I understand layers can be fused during optimization, but my original model has 9 convolutions and one input/output layer. Can this be expected?
- Since I am working on a Jetson TX2, which supports TensorRT 5 at most, I don't have ICudaEngine::getBindingBytesPerComponent() or ICudaEngine::getBindingVectorizedDim() to characterize the neural network once it is loaded. Is there a way in TensorRT 5 to inspect the layers and bindings of my neural network once the engine is loaded?