Hi,
So I am trying to optimize my FaceNet network using TensorRT (TF-TRT) with the following code:
import tensorflow as tf
# TF-TRT converter (TensorFlow 1.14+ API; older releases exposed it as tensorflow.contrib.tensorrt)
from tensorflow.python.compiler.tensorrt import trt_convert as trt

with tf.compat.v1.Session() as sess:
    # deserialize the frozen graph
    with tf.io.gfile.GFile("./gdrive/My Drive/face_net.pb", "rb") as f:
        print("done...1")
        frozen_graph = tf.compat.v1.GraphDef()
        print("done...2")
        frozen_graph.ParseFromString(f.read())
        print("done...3")

    trt_graph = trt.create_inference_graph(
        input_graph_def=frozen_graph,
        outputs=["embeddings", "label_batch"],
        max_batch_size=1,
        max_workspace_size_bytes=1 << 30,
        precision_mode="FP32",
        minimum_segment_size=5)

    # write the TensorRT-optimized graph to disk for later inference
    with tf.io.gfile.GFile("./TensorRT_model.pb", "wb") as f:
        f.write(trt_graph.SerializeToString())
        print("TensorRT model is successfully stored!")
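For completeness, here is how I am guessing the saved TensorRT_model.pb would be loaded back and run later. This is just a minimal sketch on my part, assuming the usual FaceNet frozen-graph tensor names ("input:0", "phase_train:0", "embeddings:0"), which may not match my graph:

import numpy as np
import tensorflow as tf

# load the serialized TF-TRT graph back into a GraphDef
trt_graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("./TensorRT_model.pb", "rb") as f:
    trt_graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.compat.v1.import_graph_def(trt_graph_def, name="")

with tf.compat.v1.Session(graph=graph) as sess:
    # tensor names below are assumptions based on the standard FaceNet frozen graph
    images_in = graph.get_tensor_by_name("input:0")
    phase_train = graph.get_tensor_by_name("phase_train:0")
    embeddings_out = graph.get_tensor_by_name("embeddings:0")

    # dummy batch of one 160x160 RGB face crop (a real crop should be prewhitened first)
    face = np.random.rand(1, 160, 160, 3).astype(np.float32)
    emb = sess.run(embeddings_out, feed_dict={images_in: face, phase_train: False})
    print(emb.shape)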
I have the following questions:

The TensorRT model I created is actually larger (180 MB) than the original .pb model (91 MB). Is that common?

Beyond the rough loading sketch above, I have no idea how to actually use this new TRT model. Some help would be appreciated.

Should I even be serializing the TensorRT-optimized graph to a .pb file, or should I just use the returned trt_graph directly?

Which brings me to my final question: for FaceNet, how do I actually use the trt graph to perform recognition? My rough guess is sketched below.
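My current understanding of the recognition step (please correct me if I am wrong) is: compute an embedding for the query face with the graph above, then compare it against stored embeddings of known people by Euclidean distance. Something like the sketch below, where the file names and the 1.1 threshold are placeholders I made up:

import numpy as np

# hypothetical gallery: name -> embedding previously computed with the same graph
known_embeddings = {
    "alice": np.load("alice_embedding.npy"),
    "bob": np.load("bob_embedding.npy"),
}

def identify(query_emb, gallery, threshold=1.1):
    # return the closest identity by Euclidean distance, or None if nothing is close enough
    best_name, best_dist = None, float("inf")
    for name, emb in gallery.items():
        dist = np.linalg.norm(query_emb - emb)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else None

# query_emb would be emb[0] from the inference sketch above
# print(identify(emb[0], known_embeddings))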
Please consider me a complete beginner in the deep learning and NVIDIA domain.
Apologies for any inconvenience.