TensorRT help?


I am trying to optimize my FaceNet network with TensorRT using the following code:

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # TF 1.x TF-TRT module (imports were not shown in the original post)

with tf.compat.v1.Session() as sess:
    # Deserialize the frozen graph
    with tf.io.gfile.GFile("./gdrive/My Drive/face_net.pb", "rb") as f:
        frozen_graph = tf.compat.v1.GraphDef()
        frozen_graph.ParseFromString(f.read())

    # Build the TF-TRT optimized graph
    trt_graph = trt.create_inference_graph(
        input_graph_def=frozen_graph,
        outputs=["embeddings", "label_batch"],
        max_batch_size=1,
        max_workspace_size_bytes=1 << 30,
        precision_mode="FP32",
        minimum_segment_size=5)

    # Write the TensorRT model to be used later for inference
    with tf.io.gfile.GFile("./TensorRT_model.pb", "wb") as f:
        f.write(trt_graph.SerializeToString())
    print("TensorRT model is successfully stored!")

I have the following questions:

  1. The TensorRT model I created is actually larger (180 MB) than the original .pb model (91 MB). Is that common?

  2. I have no idea how to use this new TRT model. Some help would be appreciated.

  3. Should I even be using a .pb TensorRT model file (using the trt graph) or should I just use the trt graph?

  4. Which brings me to my final question: for FaceNet, how do I use the trt graph to perform recognition?

Please consider me a complete beginner in the deep learning and NVIDIA domain.

Apologies for any inconvenience.


1. Yes. TensorRT serializes all the data it needs for fast loading, so the model does not have to be re-parsed the next time you load it.

2. This is a TF-TRT model, so you use it the same way you would run a regular TensorFlow model.

3. The TensorRT graph should be enough.
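
As a rough illustration of points 2 and 3, here is a minimal sketch of loading the saved graph and running it like any other frozen TensorFlow graph. It assumes the TF 1.x-style API and the tensor names input:0, phase_train:0 and embeddings:0, none of which are confirmed in this thread, so check them against your own graph. The helper names and the distance threshold are hypothetical, and the threshold in particular has to be tuned on your own data.

```python
import math


def load_trt_graph(pb_path):
    """Load a frozen (TF-TRT) GraphDef from disk into a fresh tf.Graph."""
    import tensorflow as tf  # imported lazily; TF 1.x-style API assumed
    # Assumption: on TF 1.x the TF-TRT ops have to be registered before the
    # graph is imported, e.g. with `import tensorflow.contrib.tensorrt`.
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    graph = tf.Graph()
    with graph.as_default():
        tf.import_graph_def(graph_def, name="")
    return graph


def embed(graph, images):
    """Run a batch of preprocessed face crops and return their embeddings."""
    import tensorflow as tf
    with tf.compat.v1.Session(graph=graph) as sess:
        # Assumed tensor names; replace them with the names in your graph.
        feed = {"input:0": images, "phase_train:0": False}
        return sess.run("embeddings:0", feed_dict=feed)


def l2_distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_person(a, b, threshold=1.0):
    """Treat two faces as a match when their embeddings are close.

    The default threshold is a placeholder, not a recommended value.
    """
    return l2_distance(a, b) < threshold
```

For recognition, you would call embed on a reference face and on a query face (one image per batch, since the graph above was built with max_batch_size = 1), take the first row of each result, and compare the two vectors with same_person.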

4. In general, we recommend using pure TensorRT, outside of TensorFlow, to get optimized performance on Nano.
The workflow looks like this: pb -> onnx -> plan.
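
The pb -> onnx -> plan workflow could look roughly like the sketch below. It assumes the tf2onnx converter for the first step and the trtexec tool that ships with TensorRT for the second; neither is named in this thread, so treat them as one possible choice. The file names, the tensor names and the helper functions are hypothetical and need to be adapted to your model.

```python
import subprocess
import sys


def build_conversion_commands(pb_path, onnx_path, plan_path,
                              inputs, outputs, trtexec="trtexec"):
    """Return the two command lines for pb -> onnx and onnx -> plan."""
    # Step 1: frozen TensorFlow graph -> ONNX (assumes tf2onnx is installed).
    to_onnx = [
        sys.executable, "-m", "tf2onnx.convert",
        "--graphdef", pb_path,
        "--inputs", ",".join(inputs),
        "--outputs", ",".join(outputs),
        "--output", onnx_path,
    ]
    # Step 2: ONNX -> serialized TensorRT engine (the "plan" file).
    # Assumes trtexec is on PATH; pass its full path otherwise.
    to_plan = [trtexec, "--onnx=" + onnx_path, "--saveEngine=" + plan_path]
    return [to_onnx, to_plan]


def convert(commands):
    """Run each conversion step in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Hypothetical file and tensor names; nothing is executed here.
commands = build_conversion_commands(
    "face_net.pb", "face_net.onnx", "face_net.plan",
    inputs=["input:0", "phase_train:0"],
    outputs=["embeddings:0"],
)
for cmd in commands:
    print(" ".join(cmd))
```

Calling convert(commands) on the device runs both steps. The plan file is specific to the GPU and TensorRT version it was built with, so it is best generated on the Nano itself.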

You can find some information in the below document:


Thank you so much.