Hey Guys,
I don’t see any performance improvement when using a TF-TRT optimized graph for inference. I’ve tried both ResNet50 and DenseNet121. Running inference on the validation dataset takes 161 s with the unoptimized TF frozen graph and 160 s with the TF-TRT optimized frozen graph. I’m using the TensorFlow NGC container. The converted graphs are FP16. I’m not sure whether I’m doing the TF-TRT conversion incorrectly.
System Information:
Container Image: nvcr.io/nvidia/tensorflow:19.10-py3
OS Platform and Distribution: Ubuntu 18.04
TensorFlow Version: tensorflow-gpu 1.14.0+nv
Python Version: 3.6.8
CUDA/cuDNN version: 10.1 / 7.6.4
GPU Model and Memory: T4 / 16 GB
Code to convert and save the optimized TRT model:
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
def get_frozen_graph(graph_file):
    """Read a frozen graph file from disk and return its GraphDef."""
    # tf.gfile.GFile replaces the deprecated tf.gfile.FastGFile.
    with tf.gfile.GFile(graph_file, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    return graph_def
print("Load frozen graph from disk")
frozen_graph = get_frozen_graph("/workspace/saved_models/resnet_trained.pb")
#output_names = model.output.op.name
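# Take the last node in the GraphDef as the output node
# (this assumes the model output is the final node in the file).
for node in frozen_graph.node:
    final_node_name = node.name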
print("Optimize the model with TensorRT")
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=[final_node_name],
    max_batch_size=128,
    is_dynamic_op=True,
    precision_mode='FP16',
    minimum_segment_size=3
)
print("Write optimized model to the file")
with open("/workspace/saved_models/resnet_fp16_trt_test.pb", 'wb') as f:
    f.write(trt_graph.SerializeToString())
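Side note on the output node: the loop above just keeps the name of whichever node comes last in the GraphDef, which only works if the model output happens to be serialized last. Below is a minimal sketch of a more explicit check, assuming that an output is any node that no other node consumes. `find_output_node_names` and the toy graph are hypothetical names for illustration, not TensorFlow APIs; the function only reads the `.node`, `.name` and `.input` fields, which a `tf.GraphDef` also has.

```python
from types import SimpleNamespace  # only needed for the toy graph below


def find_output_node_names(graph_def):
    """Return the names of nodes that no other node uses as an input."""
    consumed = set()
    for node in graph_def.node:
        for inp in node.input:
            # Strip the control-dependency marker ("^name") and the
            # output index ("name:0") to get the producing node's name.
            consumed.add(inp.lstrip("^").split(":")[0])
    return [node.name for node in graph_def.node if node.name not in consumed]


# Toy stand-in for a GraphDef (hypothetical node names).
toy_graph = SimpleNamespace(node=[
    SimpleNamespace(name="input_1", input=[]),
    SimpleNamespace(name="conv1/kernel", input=[]),
    SimpleNamespace(name="conv1/Conv2D", input=["input_1", "conv1/kernel"]),
    SimpleNamespace(name="dense/Sigmoid", input=["conv1/Conv2D:0"]),
])
print(find_output_node_names(toy_graph))
```

With the real graph this would be called as `find_output_node_names(frozen_graph)`. It may also list leftover nodes that nothing consumes, so the result is worth checking by eye before passing it as `outputs`.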
I’ve also tried to use TrtGraphConverter instead of create_inference_graph, but it just makes the inference time worse.
converter = trt.TrtGraphConverter(
    input_graph_def=frozen_graph,
    nodes_blacklist=[final_node_name],  # output nodes
    max_batch_size=128,
    is_dynamic_op=True,  # use dynamic mode if the graph has undefined shapes
    precision_mode="FP16")
trt_graph = converter.convert()
Part of log while converting the graph:
2019-11-08 19:53:57.289698: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:837] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 555 nodes succeeded.
2019-11-08 19:53:57.363554: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:752] Optimization results for grappler item: tf_graph
2019-11-08 19:53:57.363608: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] constant folding: Graph size after: 555 nodes (-320), 570 edges (-320), time = 400.515ms.
2019-11-08 19:53:57.363618: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] layout: Graph size after: 560 nodes (5), 572 edges (2), time = 101.55ms.
2019-11-08 19:53:57.363626: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] constant folding: Graph size after: 557 nodes (-3), 572 edges (0), time = 299.556ms.
2019-11-08 19:53:57.363635: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] TensorRTOptimizer: Graph size after: 3 nodes (-554), 2 edges (-570), time = 520.169ms.
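To double-check what the log reports, the converted GraphDef can be summarized by op type. Below is a minimal sketch, assuming the converted segments show up as nodes with op type `TRTEngineOp`. `summarize_ops` is a hypothetical helper, and the toy graph is a made-up three-node stand-in whose op types for the input and output nodes are assumptions.

```python
from collections import Counter
from types import SimpleNamespace  # only needed for the toy graph below


def summarize_ops(graph_def):
    """Count how many nodes of each op type the graph contains."""
    return Counter(node.op for node in graph_def.node)


# Toy stand-in for the converted GraphDef (op types are assumed).
toy_trt_graph = SimpleNamespace(node=[
    SimpleNamespace(name="input_1", op="Placeholder"),
    SimpleNamespace(name="TRTEngineOp_0", op="TRTEngineOp"),
    SimpleNamespace(name="dense/Sigmoid", op="Sigmoid"),
])
counts = summarize_ops(toy_trt_graph)
print(counts["TRTEngineOp"], "of", sum(counts.values()), "nodes are TRT engines")
```

On the real graph this would be `summarize_ops(trt_graph)`. A single `TRTEngineOp` covering almost the whole network would suggest the conversion itself worked.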
Part of log while running inference:
No. of Nodes are 3
prefix/input_1
prefix/TRTEngineOp_0
prefix/dense/Sigmoid
predicting using your model:…
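In case it helps narrow this down, here is a minimal sketch of how the total time could be split into warm-up and steady-state parts. With `is_dynamic_op=True` the TensorRT engine is built at run time, the first time a given input shape is seen, so the first batches can be much slower than the rest. `benchmark`, `run_batch` and the dummy workload are hypothetical names for illustration; nothing here calls TensorFlow.

```python
import time


def benchmark(run_batch, batches, warmup=5):
    """Run run_batch on every batch, timing the warm-up batches separately."""
    batches = list(batches)
    t0 = time.perf_counter()
    for batch in batches[:warmup]:   # engine build and caches warm up here
        run_batch(batch)
    t1 = time.perf_counter()
    for batch in batches[warmup:]:   # steady-state inference only
        run_batch(batch)
    t2 = time.perf_counter()
    timed = len(batches) - min(warmup, len(batches))
    return {
        "warmup_s": t1 - t0,
        "steady_s": t2 - t1,
        "timed_batches": timed,
        "ms_per_batch": 1000.0 * (t2 - t1) / timed if timed else float("nan"),
    }


# Dummy workload standing in for a real inference call.
calls = []
stats = benchmark(lambda b: calls.append(sum(b)), [[1, 2, 3]] * 20, warmup=5)
print(stats["timed_batches"], "batches timed after warm-up")
```

In the real script, `run_batch` would wrap the `sess.run(...)` call on batches that are already loaded in memory, so that image decoding and preprocessing are not counted. If the steady-state numbers for both graphs are still the same, the bottleneck is probably outside the GPU compute.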
Thanks