I am trying to convert my frozen_inference_graph.pb (ssd_inception_v2_coco) to a TensorRT "FP16" model with the two methods shown below. The original graph is 61 MB. With the first method I get a graph of the same size (61 MB), while with the second method the graph doubles in disk size to 122 MB.

Environment:

- Ubuntu 16.04 LTS
- NVIDIA driver version: 440.33.01
- CUDA version: 10
- cuDNN version: 7.4
- TensorFlow: 1.14.1 (built from source with CUDA and TensorRT support)
- GPU: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB]
- Bazel version: 0.25

```
import tensorflow as tf
from tensorflow.python.platform import gfile
# In TF 1.14 tensorflow.contrib.tensorrt and
# tensorflow.python.compiler.tensorrt expose the same converter;
# a single import is enough (the original duplicate `tr` aliases
# shadowed each other).
from tensorflow.python.compiler.tensorrt import trt_convert as trt

outputs = ['resnet_v1_50/predictions/Reshape_1']

with tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(
        per_process_gpu_memory_fraction=0.75))) as sess:
    # Method 1: strip training nodes and re-serialize the frozen graph.
    with gfile.FastGFile('/home/ubuntu/resnetV150_frozen.pb', 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    # The graph is already frozen, so this call is effectively a no-op.
    graph_def = tf.graph_util.convert_variables_to_constants(sess, graph_def, outputs)
    graph_def = tf.graph_util.remove_training_nodes(graph_def)
    with gfile.FastGFile('/home/ubuntu/new.pb', 'wb') as f:
        f.write(graph_def.SerializeToString())

    # Method 2: TF-TRT FP16 conversion of the same frozen graph.
    with gfile.FastGFile('/home/ubuntu/resnetV150_frozen.pb', 'rb') as f:
        graph_def1 = tf.GraphDef()
        graph_def1.ParseFromString(f.read())
    trt_graph = trt.create_inference_graph(
        input_graph_def=graph_def1,  # frozen model
        outputs=outputs,
        max_batch_size=128,
        max_workspace_size_bytes=1 << 30,
        precision_mode="FP16")
    with gfile.FastGFile('/home/ubuntu/new1.pb', 'wb') as f:
        f.write(trt_graph.SerializeToString())

print("TensorRT model is successfully stored!")
```
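
As far as I understand, the size difference can be inspected directly: with the default `is_dynamic_op=False`, TF-TRT serializes the built FP16 engine (which embeds its own copy of the weights) into the `TRTEngineOp` node, while the original segment is also kept in the graph for fallback, which can roughly double the file size. Below is a minimal sketch, assuming the two output paths from the script above, that counts `TRTEngineOp` nodes and reports each graph's serialized size:

```
# Minimal sketch: compare node counts, TRTEngineOp counts, and serialized
# size of the two graphs written above (paths assumed from the script).
import tensorflow as tf
from tensorflow.python.platform import gfile

for path in ['/home/ubuntu/new.pb', '/home/ubuntu/new1.pb']:
    graph_def = tf.GraphDef()
    with gfile.FastGFile(path, 'rb') as f:
        graph_def.ParseFromString(f.read())
    n_engines = sum(1 for n in graph_def.node if n.op == 'TRTEngineOp')
    size_mb = graph_def.ByteSize() / (1024.0 * 1024.0)
    print('%s: %d nodes, %d TRTEngineOp(s), %.1f MB'
          % (path, len(graph_def.node), n_engines, size_mb))
```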

#########

NOTE:

I also tried precision mode "INT8" with the following call:

```
trt_graph = trt.calib_graph_to_infer_graph(trt_graph)
```

but got an error saying there is no module named calib_graph_to_infer_graph.
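
That error is consistent with `calib_graph_to_infer_graph` having been removed from the TF 1.14 converter module; INT8 calibration appears to have moved to the `TrtGraphConverter` class instead. Below is a minimal sketch of that workflow, assuming the frozen `graph_def1` and `outputs` from the script above; the input tensor name `'input:0'` and the random calibration batch are hypothetical placeholders you would replace with your real input name and preprocessing:

```
# Sketch of INT8 calibration via TrtGraphConverter (TF 1.14 API).
# Assumes graph_def1 and outputs from the script above; the input tensor
# name and calibration data below are placeholders, not from the model.
import numpy as np
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverter(
    input_graph_def=graph_def1,
    nodes_blacklist=outputs,           # keep output nodes out of TRT segments
    max_batch_size=128,
    max_workspace_size_bytes=1 << 30,
    precision_mode="INT8",
    is_dynamic_op=True,                # required for INT8 calibration
    use_calibration=True)
calib_graph_def = converter.convert()  # graph with calibration resources

# Feed representative data to collect INT8 ranges (placeholder batch here).
calib_batch = np.random.random((128, 224, 224, 3)).astype(np.float32)
trt_graph = converter.calibrate(
    fetch_names=[n + ':0' for n in outputs],
    num_runs=10,
    feed_dict_fn=lambda: {'input:0': calib_batch})  # hypothetical input name

with tf.gfile.GFile('/home/ubuntu/new_int8.pb', 'wb') as f:
    f.write(trt_graph.SerializeToString())
```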