No performance improvement with TF-TRT optimization (ResNet50, DenseNet121)

Hey Guys,

I don’t see any performance improvement when using a TF-TRT optimized graph for inference. I’ve tried both ResNet50 and DenseNet121. Inference on the validation dataset takes 161 s with the unoptimized TF frozen graph and 160 s with the TF-TRT optimized frozen graph. I’m using the TensorFlow NGC container, and the converted graphs are FP16. I’m not sure whether I’m doing the TF-TRT conversion incorrectly.
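One thing worth ruling out before comparing numbers: with TF-TRT (especially in dynamic-op mode) the first inference call can include TensorRT engine construction, so timing that includes the first run can hide a speedup. Below is a minimal, framework-agnostic timing helper; the function name and parameters are my own, not part of any TF-TRT API:

```python
import time

def benchmark(run_once, warmup=5, iters=50):
    """Time a callable, excluding warm-up runs.

    With TF-TRT, the first few calls may include TensorRT engine
    construction, so they are run but excluded from the average.
    """
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    return (time.perf_counter() - start) / iters

# Example with a dummy workload standing in for sess.run(...):
avg = benchmark(lambda: sum(range(1000)), warmup=2, iters=10)
print("avg seconds per run:", avg)
```

In a real measurement, `run_once` would be a closure around your `sess.run(...)` call with a fixed input batch.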

System Information:

Container Image:

OS Platform and Distribution: Ubuntu 18.04
TensorFlow Version: tensorflow-gpu 1.14.0+nv
Python Version: 3.6.8
CUDA/cuDNN version: 10.1 / 7.6.4
GPU Model and Memory: T4 / 16 GB

Code to convert and save the optimized TRT model:

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

def get_frozen_graph(graph_file):
    """Read a frozen GraphDef file from disk."""
    with tf.gfile.FastGFile(graph_file, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())  # parse the serialized graph bytes
    return graph_def

print("Load frozen graph from disk")

frozen_graph = get_frozen_graph("/workspace/saved_models/resnet_trained.pb")

# Assume the last node in the graph is the output node
for node in frozen_graph.node:
    final_node_name = node.name

print("Optimize the model with TensorRT")

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=[final_node_name],
    max_batch_size=32,              # placeholder; set to your actual batch size
    max_workspace_size_bytes=1 << 30,
    precision_mode="FP16")

print("Write optimized model to the file")
with open("/workspace/saved_models/resnet_fp16_trt_test.pb", 'wb') as f:
    f.write(trt_graph.SerializeToString())

I’ve also tried using TrtGraphConverter instead of create_inference_graph, but it only makes the inference time worse.

converter = trt.TrtGraphConverter(
    input_graph_def=frozen_graph,
    nodes_blacklist=[final_node_name],  # output nodes
    precision_mode="FP16")  # use is_dynamic_op=True if the graph has undefined shapes
trt_graph = converter.convert()

Part of log while converting the graph:

2019-11-08 19:53:57.289698: I tensorflow/compiler/tf2tensorrt/convert/] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 555 nodes succeeded.
2019-11-08 19:53:57.363554: I tensorflow/core/grappler/optimizers/] Optimization results for grappler item: tf_graph
2019-11-08 19:53:57.363608: I tensorflow/core/grappler/optimizers/] constant folding: Graph size after: 555 nodes (-320), 570 edges (-320), time = 400.515ms.
2019-11-08 19:53:57.363618: I tensorflow/core/grappler/optimizers/] layout: Graph size after: 560 nodes (5), 572 edges (2), time = 101.55ms.
2019-11-08 19:53:57.363626: I tensorflow/core/grappler/optimizers/] constant folding: Graph size after: 557 nodes (-3), 572 edges (0), time = 299.556ms.
2019-11-08 19:53:57.363635: I tensorflow/core/grappler/optimizers/] TensorRTOptimizer: Graph size after: 3 nodes (-554), 2 edges (-570), time = 520.169ms.
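The log shows a single TRTEngineOp covering 555 nodes, which looks like a successful conversion. A quick sanity check is to count the TRTEngineOp nodes in the converted graph; the helper below is my own sketch, written against any object that exposes a `.node` list of nodes with an `.op` field, as a GraphDef does:

```python
def count_trt_engine_ops(graph_def):
    """Count TensorRT engine nodes in a (frozen) GraphDef.

    A converted graph should contain at least one 'TRTEngineOp';
    zero means TF-TRT did not replace any segment.
    """
    return sum(1 for node in graph_def.node if node.op == "TRTEngineOp")

# Usage with a real converted graph would be:
#   n = count_trt_engine_ops(trt_graph)
#   print("TRTEngineOp nodes:", n)
```

If this returns 0, the graph fell back to native TensorFlow ops and no speedup should be expected.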

Part of the log while running inference:

No. of Nodes are 3

predicting using your model:…


Did anyone get a chance to look at this?

Could you please let us know if you are still facing this issue?



Hey Sunil,

Yes, it’s still the same.
Any insights would be great, or pointers to any updated documentation to follow, since it’s been over 8 months.

Can you try using TF 2.x?
Please refer to the sample below for reference:
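For reference, the TF 2.x flow uses TrtGraphConverterV2 on a SavedModel rather than a frozen graph. A minimal sketch follows; the SavedModel paths are placeholders of mine, and the exact conversion parameters may differ between TF 2.x releases, so check the TF-TRT user guide for your version:

```python
def convert_saved_model_fp16(input_dir, output_dir):
    """Convert a TF 2.x SavedModel with TF-TRT at FP16 precision.

    Sketch only: the import is kept inside the function so this file
    can be loaded for inspection without TensorFlow installed.
    """
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        precision_mode=trt.TrtPrecisionMode.FP16)
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=input_dir,
        conversion_params=params)
    converter.convert()          # replaces supported subgraphs with TRT ops
    converter.save(output_dir)   # writes the converted SavedModel

# Hypothetical usage (paths are placeholders):
# convert_saved_model_fp16("/workspace/saved_models/resnet_tf2",
#                          "/workspace/saved_models/resnet_tf2_trt_fp16")
```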