TensorRT-converted graph doubled in size. Why?

I am trying to convert my frozen_inference_graph.pb (ssd_inception_v2_coco) to a TensorRT FP16 model with the two methods shown below. The original graph is 61 MB. With the first method I get a graph of the same size, 61 MB, while with the second method the graph doubles in size on disk to 122 MB.

OS features:
Ubuntu 16.04 LTS
NVIDIA driver version = 440.33.01
CUDA version = 10
cuDNN version = 7.4
TensorFlow = 1.14.1 (built from source with CUDA and TensorRT support)
GPU = NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB]
Bazel version = 0.25

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
from tensorflow.python.platform import gfile

outputs = ['resnet_v1_50/predictions/Reshape_1']

with tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.75))) as sess:
    # Method 1: fold variables into constants and strip training nodes.
    with gfile.FastGFile('/home/ubuntu/resnetV150_frozen.pb', 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    graph_def = tf.graph_util.convert_variables_to_constants(sess, graph_def, outputs)
    graph_def = tf.graph_util.remove_training_nodes(graph_def)

    with gfile.FastGFile('/home/ubuntu/new.pb', 'wb') as f:
        f.write(graph_def.SerializeToString())

    # Method 2: TF-TRT FP16 conversion of the frozen graph.
    with gfile.FastGFile('/home/ubuntu/resnetV150_frozen.pb', 'rb') as f:
        graph_def1 = tf.GraphDef()
        graph_def1.ParseFromString(f.read())

    trt_graph = trt.create_inference_graph(input_graph_def=graph_def1,  # frozen model
                                           outputs=outputs,
                                           max_batch_size=128,
                                           max_workspace_size_bytes=1 << 30,
                                           precision_mode="FP16")

    with gfile.FastGFile('/home/ubuntu/new1.pb', 'wb') as f:
        f.write(trt_graph.SerializeToString())

print("TensorRT model is successfully stored!")
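
For reference, the original and converted graphs can be compared with a quick inspection sketch like the one below (same paths as above, purely illustrative). In TF-TRT 1.x the converted graph carries the serialized engines in TRTEngineOp nodes while keeping the original TensorFlow segments as fallback functions in the graph's function library, which is worth checking as a source of the extra bytes.

# Illustrative inspection of the original vs. converted graph (not part of
# the conversion itself): counts TRTEngineOp nodes and fallback functions
# and reports the serialized size of each graph.
import tensorflow as tf
from tensorflow.python.platform import gfile

def summarize(pb_path):
    graph_def = tf.GraphDef()
    with gfile.FastGFile(pb_path, 'rb') as f:
        graph_def.ParseFromString(f.read())
    trt_ops = [n for n in graph_def.node if n.op == 'TRTEngineOp']
    print(pb_path)
    print('  nodes             :', len(graph_def.node))
    print('  TRTEngineOp nodes :', len(trt_ops))
    print('  library functions :', len(graph_def.library.function))
    print('  serialized bytes  :', len(graph_def.SerializeToString()))

summarize('/home/ubuntu/resnetV150_frozen.pb')
summarize('/home/ubuntu/new1.pb')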

#########
NOTE:

I also tried precision mode "INT8" with the following function:

trt_graph = trt.calib_graph_to_infer_graph(trt_graph)

but I got an error saying there is no module named calib_graph_to_infer_graph.
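
For context, the contrib INT8 flow I was aiming for looks roughly like the sketch below (it assumes tensorflow.contrib.tensorrt is available in this TensorFlow build; the output path new_int8.pb is just a placeholder, and the calibration step is only described in a comment):

# Sketch of the TF 1.x contrib INT8 flow (paths and output names reused
# from the script above for illustration).
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
from tensorflow.python.platform import gfile

outputs = ['resnet_v1_50/predictions/Reshape_1']

graph_def = tf.GraphDef()
with gfile.FastGFile('/home/ubuntu/resnetV150_frozen.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Step 1: build a calibration graph.
calib_graph = trt.create_inference_graph(input_graph_def=graph_def,
                                         outputs=outputs,
                                         max_batch_size=128,
                                         max_workspace_size_bytes=1 << 30,
                                         precision_mode="INT8")

# Step 2: run the calibration graph on representative input batches so
# TensorRT can collect activation ranges (omitted here; without it the
# conversion below will fail).

# Step 3: convert the calibrated graph into the final INT8 inference graph.
int8_graph = trt.calib_graph_to_infer_graph(calib_graph)

with gfile.FastGFile('/home/ubuntu/new_int8.pb', 'wb') as f:
    f.write(int8_graph.SerializeToString())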

Hi,

In the above code the FP16 model is generated only once; please let me know if I missed anything:

trt_graph = trt.create_inference_graph(input_graph_def=graph_def1,  # frozen model
                                       outputs=your_outputs,
                                       max_batch_size=128,
                                       max_workspace_size_bytes=1 << 30,
                                       precision_mode="FP16")

"new.pb" is just a simplified version of the original model:

graph_def = tf.graph_util.convert_variables_to_constants(sess, graph_def, your_outputs)
graph_def = tf.graph_util.remove_training_nodes(graph_def)

with gfile.FastGFile('/home/ubuntu/test/new.pb', 'wb') as f:
    f.write(graph_def.SerializeToString())

The generated model file doubling in size seems similar to https://github.com/tensorflow/tensorflow/issues/24789

Could you please share your model file so we can help better?

Thanks

Thank you for replying.
Yes, new.pb is just a simplified version, but it also didn't lead to any improvement.

And FP16 is used only once, where you mentioned.
Yes, this is similar to https://github.com/tensorflow/tensorflow/issues/24789, where I gathered there was a bug, but I didn't find an answer on how to correct it or get the desired optimized graph. Could it be a problem with the TensorFlow version I am using?

Below is a link where you can download the frozen graph:

https://drive.google.com/file/d/13SRSyyIG-XHEIJwGGmuM4XB9AHJrgHIY/view?usp=sharing

Hi,

Thanks for sharing the model; we will look into it and update you.

Thank you.

Any update?

It looks like you are using quite old library versions.
Can you try with updated packages (at least TensorFlow 1.15, cuDNN 7.6, CUDA 10.1) and let us know if you are still facing the issue?
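
If you move to TensorFlow 1.15, you can also try the TrtGraphConverter API from tensorflow.python.compiler.tensorrt; a minimal sketch, reusing the frozen-graph path and output name from your script (the output path new_trt.pb is just a placeholder):

# Minimal sketch of the TF 1.15 TrtGraphConverter API for FP16 conversion.
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

graph_def = tf.GraphDef()
with tf.gfile.GFile('/home/ubuntu/resnetV150_frozen.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

converter = trt.TrtGraphConverter(input_graph_def=graph_def,
                                  nodes_blacklist=['resnet_v1_50/predictions/Reshape_1'],
                                  max_batch_size=128,
                                  max_workspace_size_bytes=1 << 30,
                                  precision_mode='FP16')
trt_graph = converter.convert()

with tf.gfile.GFile('/home/ubuntu/new_trt.pb', 'wb') as f:
    f.write(trt_graph.SerializeToString())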

Thanks