TensorFlow DeepLab to TensorRT conversion

Converting a DeepLab TensorFlow model to a TensorRT model increases inference time dramatically. What am I doing wrong in my code?

Here I convert the TensorFlow graph to a TensorRT graph and save the new TRT model:

import tensorflow
import tensorflow.contrib.tensorrt as trt
from tensorflow.python.platform import gfile

OUTPUT_NAME = ["SemanticPredictions"]

# read the TensorFlow frozen graph
with gfile.FastGFile('frozen_inference_graph.pb', 'rb') as tf_model:
   tf_graphf = tensorflow.GraphDef()
   tf_graphf.ParseFromString(tf_model.read())

# convert (optimize) the frozen model to a TensorRT model
trt_graph = trt.create_inference_graph(
   input_graph_def=tf_graphf, outputs=OUTPUT_NAME,
   max_batch_size=2, max_workspace_size_bytes=2 * (10 ** 9),
   precision_mode="INT8")

# write the TensorRT model to be used later for inference
with gfile.FastGFile("TensorRT_model.pb", 'wb') as f:
   f.write(trt_graph.SerializeToString())
print("TensorRT model is successfully stored!")
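
Before timing anything, it may be worth checking how much of the graph was actually converted. The sketch below counts node types in a GraphDef-like object; in TF-TRT, converted segments show up as nodes whose op is TRTEngineOp. The helper names count_ops and trt_summary are illustrative and not part of any library. They only assume that the object has a node list whose items carry an op string, as a GraphDef does.

```python
from collections import Counter


def count_ops(graph_def):
    """Return a Counter mapping op type -> number of nodes in graph_def.

    Works with any object exposing `.node`, where each node has an `.op`
    string (a tensorflow.GraphDef satisfies this).
    """
    return Counter(node.op for node in graph_def.node)


def trt_summary(graph_def):
    """Return (number of TRTEngineOp nodes, total number of nodes)."""
    ops = count_ops(graph_def)
    return ops.get("TRTEngineOp", 0), sum(ops.values())
```

Calling trt_summary(trt_graph) right after the conversion would give the two counts. Zero or very few engine nodes would suggest that most of the graph still runs as plain TensorFlow ops.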

In another script, I load this TRT model and run semantic segmentation with it, but inference is about 7 to 8 times slower! Here is the second script:

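import cv2
import numpy as np
import tensorflow
import tensorflow.contrib.tensorrt  # registers the TRTEngineOp used by the converted graph
from tensorflow.python.platform import gfile

gpu_options = tensorflow.GPUOptions(per_process_gpu_memory_fraction=0.50)
with tensorflow.Session(config=tensorflow.ConfigProto(gpu_options=gpu_options)) as sess:
   img_array = cv2.imread('test.png', 1)

   # read the TensorRT frozen graph
   with gfile.FastGFile('TensorRT_model.pb', 'rb') as trt_model:
      trt_graph = tensorflow.GraphDef()
      trt_graph.ParseFromString(trt_model.read())

   # obtain the corresponding input and output tensors
   tensorflow.import_graph_def(trt_graph, name='')
   input_tensor = sess.graph.get_tensor_by_name('ImageTensor:0')
   output_tensor = sess.graph.get_tensor_by_name('SemanticPredictions:0')

   # perform inference
   batch_seg_map = sess.run(output_tensor, feed_dict={input_tensor: [img_array]})
   seg_map = batch_seg_map[0]
   # label_to_color_image is defined elsewhere (e.g. in the DeepLab demo code)
   seg_img = label_to_color_image(seg_map).astype(np.uint8)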
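
To put a number on the slowdown, a small timing helper can be useful. The sketch below is generic Python and not specific to TensorFlow; time_call and its parameters are illustrative names. It runs a few untimed warm-up calls first, on the assumption that the first calls to sess.run include one-off setup cost, and then reports the mean wall-clock time per call.

```python
from timeit import default_timer


def time_call(fn, warmup=3, runs=10):
    """Return the mean wall-clock seconds per call of fn().

    The first `warmup` calls are executed but not timed, so one-off
    setup cost does not distort the average.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()
    start = default_timer()
    for _ in range(runs):
        fn()
    return (default_timer() - start) / runs
```

Inside the session, this could wrap the sess.run call in a lambda, using the same fetch and feed as in the script, once for the original frozen graph and once for the converted one.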

Any ideas on how I should perform the conversion so that it actually speeds up inference?

Hello, can you provide details on the platforms you are using?

Linux distro and version
GPU type
NVIDIA driver version
CUDA version
cuDNN version
Python version [if using Python]
TensorFlow version
TensorRT version

Also, to help us debug, can you share a small repro containing the full conversion and inference code, the .pb file, and dataset that demonstrate the performance issue you are seeing?
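
Most of the details requested above can be gathered with a short script. This is only a sketch: the helper names (run_quiet, module_version, collect_env) are made up for illustration, and it assumes nvidia-smi and nvcc are on the PATH. Anything that cannot be found is reported as None. The cuDNN and TensorRT versions are left out because the way to query them depends on how they were installed.

```python
import platform
import subprocess


def run_quiet(cmd):
    """Return the stripped output of cmd, or None if it cannot be run."""
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.decode("utf-8", "replace").strip()


def module_version(name):
    """Return module.__version__ for an importable module, else None."""
    try:
        module = __import__(name)
    except Exception:  # missing package or broken native libraries
        return None
    return getattr(module, "__version__", None)


def collect_env():
    """Gather the platform details as a plain dict."""
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "tensorflow": module_version("tensorflow"),
        "gpu_and_driver": run_quiet(
            ["nvidia-smi", "--query-gpu=name,driver_version",
             "--format=csv,noheader"]),
        "cuda": run_quiet(["nvcc", "--version"]),
    }


if __name__ == "__main__":
    for key, value in sorted(collect_env().items()):
        print("{}: {}".format(key, value))
```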

I am using:
Ubuntu 16.04
GPU: NVIDIA GTX 1050 Ti
NVIDIA driver version: 384.130
CUDA: 9.0
cuDNN: 7
Python: 2.7
TensorFlow version: 1.13.0rc
TensorRT version:

The conversion code is almost the same as what I shared above, and this is the model I am using: http://download.tensorflow.org/models/deeplabv3_mnv2_cityscapes_train_2018_02_05.tar.gz

The test image could be something like this: https://i.ibb.co/rwRZMvq/rsz-louvl.png

In the previous comment I tried to answer your questions, but I forgot to quote your comment :)