No performance improvement with TF-TRT optimization (ResNet50, DenseNet121)

Hey Guys,

I don’t see any performance improvement when using a TF-TRT optimized graph for inference. I’ve tried both ResNet50 and DenseNet121. Inference on the validation dataset takes 161 seconds with the unoptimized TF frozen graph and 160 seconds with the TF-TRT optimized frozen graph. I’m using the TensorFlow NGC container, and the converted graphs are FP16. I’m not sure whether I’m doing the TF-TRT conversion incorrectly.

System Information:

Container Image: nvcr.io/nvidia/tensorflow:19.10-py3

OS Platform and Distribution: Ubuntu 18.04
TensorFlow Version: tensorflow-gpu 1.14.0+nv
Python Version: 3.6.8
CUDA/cuDNN version: 10.1 / 7.6.4
GPU Model and Memory: T4 / 16 GB

Code to convert and save the optimized TRT model:

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

def get_frozen_graph(graph_file):
    """Read a frozen graph file from disk and return its GraphDef."""
    # tf.gfile.GFile replaces the deprecated tf.gfile.FastGFile in TF 1.x.
    with tf.gfile.GFile(graph_file, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    return graph_def


print("Load frozen graph from disk")

frozen_graph = get_frozen_graph("/workspace/saved_models/resnet_trained.pb")

#output_names = model.output.op.name

# Take the last node in the GraphDef as the output node
# (this assumes the output op is serialized last).
final_node_name = frozen_graph.node[-1].name

print("Optimize the model with TensorRT")

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=[final_node_name],
    max_batch_size=128,
    is_dynamic_op=True,
    precision_mode='FP16',
    minimum_segment_size=3
)

print("Write optimized model to the file")
with open("/workspace/saved_models/resnet_fp16_trt_test.pb", 'wb') as f:
    f.write(trt_graph.SerializeToString())

I’ve also tried to use TrtGraphConverter instead of create_inference_graph, but it just makes the inference time worse.

converter = trt.TrtGraphConverter(
    input_graph_def=frozen_graph,
    nodes_blacklist=[final_node_name],  # output nodes
    max_batch_size=128,
    is_dynamic_op=True,  # use dynamic mode if the graph has undefined shapes
    precision_mode="FP16")
trt_graph = converter.convert()
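
Since both conversions above set is_dynamic_op=True, the TensorRT engines are built at run time, the first time each new input shape is seen, so the first few batches can be much slower than the rest. A total wall-clock number over the whole validation run can also be dominated by data loading and preprocessing rather than by the graph itself. Below is a minimal sketch of a timing helper that separates warm-up from steady-state time, assuming the batches are already loaded in memory. The names `time_inference`, `run_batch`, and `warmup` are hypothetical and not part of TF-TRT.

```python
import time


def time_inference(run_batch, batches, warmup=5):
    """Return (warmup_seconds, steady_state_seconds_per_batch).

    run_batch is any callable that runs inference on one batch.
    batches should already be in memory so that disk I/O and
    preprocessing are not included in the measurement.
    """
    batches = list(batches)
    if len(batches) <= warmup:
        raise ValueError("need more batches than warm-up iterations")

    # Warm-up: with is_dynamic_op=True this is where engines get built.
    start = time.perf_counter()
    for batch in batches[:warmup]:
        run_batch(batch)
    warmup_s = time.perf_counter() - start

    # Steady state: only these iterations are averaged.
    start = time.perf_counter()
    for batch in batches[warmup:]:
        run_batch(batch)
    steady_s = time.perf_counter() - start

    return warmup_s, steady_s / (len(batches) - warmup)


# Hypothetical usage with a TF 1.x session (sess, output, input_tensor
# are placeholders for your own objects):
# warmup_s, per_batch_s = time_inference(
#     lambda b: sess.run(output, feed_dict={input_tensor: b}), batches)
```

Comparing the per-batch steady-state number for the unoptimized and the TF-TRT graph should show whether the 161 s vs. 160 s totals are hiding a difference in the graph itself.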

Part of log while converting the graph:

2019-11-08 19:53:57.289698: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:837] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 555 nodes succeeded.
2019-11-08 19:53:57.363554: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:752] Optimization results for grappler item: tf_graph
2019-11-08 19:53:57.363608: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] constant folding: Graph size after: 555 nodes (-320), 570 edges (-320), time = 400.515ms.
2019-11-08 19:53:57.363618: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] layout: Graph size after: 560 nodes (5), 572 edges (2), time = 101.55ms.
2019-11-08 19:53:57.363626: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] constant folding: Graph size after: 557 nodes (-3), 572 edges (0), time = 299.556ms.
2019-11-08 19:53:57.363635: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] TensorRTOptimizer: Graph size after: 3 nodes (-554), 2 edges (-570), time = 520.169ms.
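
To double-check how much of the graph each grappler stage left behind, the "Graph size after" lines can be pulled out of the log with a small parser. This is only a sketch: the regular expression is written against the four lines shown above, and `parse_grappler_log` is a hypothetical helper name, not a TensorFlow API.

```python
import re

# Matches e.g. "...] layout: Graph size after: 560 nodes (5), 572 edges (2), ..."
STAGE_RE = re.compile(
    r"\]\s*(?P<stage>[\w ]+?): Graph size after: "
    r"(?P<nodes>\d+) nodes \((?P<delta>-?\d+)\)"
)


def parse_grappler_log(text):
    """Return a list of (stage, nodes_after, node_delta) tuples."""
    results = []
    for line in text.splitlines():
        match = STAGE_RE.search(line)
        if match:
            results.append((match.group("stage"),
                            int(match.group("nodes")),
                            int(match.group("delta"))))
    return results


LOG = """\
2019-11-08 19:53:57.363608: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] constant folding: Graph size after: 555 nodes (-320), 570 edges (-320), time = 400.515ms.
2019-11-08 19:53:57.363618: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] layout: Graph size after: 560 nodes (5), 572 edges (2), time = 101.55ms.
2019-11-08 19:53:57.363626: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] constant folding: Graph size after: 557 nodes (-3), 572 edges (0), time = 299.556ms.
2019-11-08 19:53:57.363635: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] TensorRTOptimizer: Graph size after: 3 nodes (-554), 2 edges (-570), time = 520.169ms.
"""

for stage, nodes, delta in parse_grappler_log(LOG):
    print(stage, nodes, delta)
```

In this log the final TensorRTOptimizer stage leaves 3 nodes, which matches the single TRTEngineOp_0 plus the input and output nodes listed in the inference output.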

Part of log while running inference:


No. of Nodes are 3


prefix/input_1
prefix/TRTEngineOp_0
prefix/dense/Sigmoid
predicting using your model:…

Thanks

Did anyone get a chance to look at this?

Could you please let us know if you are still facing this issue?

Thanks

Hey Sunil,

Yes, it’s still the same.
Any insights, or pointers to updated documentation to follow since it’s been over 8 months, would be great.

Can you try using TF 2.x?
Please refer to the sample below for reference:

Thanks