I am trying to build a simple TensorFlow model and run it on a Jetson Nano using TensorRT. I am using the following script to export the frozen graph and create the TF-TRT graph.
```python
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert the Keras model to a ConcreteFunction
# (`model` is the Keras model built earlier)
full_model = tf.function(lambda x: model(x))
full_model = full_model.get_concrete_function(
    x=tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype))

# Get the frozen ConcreteFunction
frozen_func = convert_variables_to_constants_v2(full_model)
frozen_func.graph.as_graph_def()

# Inspect the operations inside the frozen graph definition
# and see the names of its input and output tensors
layers = [op.name for op in frozen_func.graph.get_operations()]
print("-" * 50)
print("Frozen model layers: ")
for layer in layers:
    print(layer)
print("-" * 50)
print("Frozen model inputs: ")
print(frozen_func.inputs)
print("Frozen model outputs: ")
print(frozen_func.outputs)

# Save the frozen graph from the frozen ConcreteFunction to disk:
# serialize the frozen graph and its text representation
tf.io.write_graph(graph_or_graph_def=frozen_func.graph,
                  logdir="./frozen_models",
                  name="simple_frozen_graph.pb",
                  as_text=False)

# Optional: text representation
tf.io.write_graph(graph_or_graph_def=frozen_func.graph,
                  logdir="./frozen_models",
                  name="simple_frozen_graph.pbtxt",
                  as_text=True)

frozen_graph = "./frozen_models/simple_frozen_graph.pb"
input_names = ['x']
output_names = ['Identity']

# Parse the text representation back into a GraphDef
import google.protobuf.text_format
with open('./frozen_models/simple_frozen_graph.pbtxt', 'r') as f:
    frozen_graph_gd = google.protobuf.text_format.Parse(
        f.read(), tf.compat.v1.GraphDef())

# Build the TF-TRT inference graph
# (pass the parsed GraphDef, not the path string)
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph_gd,
    outputs=output_names,
    max_batch_size=10,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
)
```
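For reference, my timing comparison looks roughly like this. This is a simplified sketch, not my exact harness: the `wrap_frozen_graph` helper follows the usual TF2 pattern for loading a frozen GraphDef as a callable, and the dummy input shape and the 100-iteration loop are illustrative assumptions.

```python
import time
import numpy as np
import tensorflow as tf

def wrap_frozen_graph(graph_def, inputs, outputs):
    # Import a GraphDef and prune it into a callable ConcreteFunction.
    def _imports_graph_def():
        tf.compat.v1.import_graph_def(graph_def, name="")
    wrapped_import = tf.compat.v1.wrap_function(_imports_graph_def, [])
    import_graph = wrapped_import.graph
    return wrapped_import.prune(
        tf.nest.map_structure(import_graph.as_graph_element, inputs),
        tf.nest.map_structure(import_graph.as_graph_element, outputs))

# Illustrative input; my real model has its own shape/dtype.
dummy = tf.constant(np.random.rand(1, 28, 28, 1).astype(np.float32))

for name, graph_def in [("frozen", frozen_graph_gd), ("tf-trt", trt_graph)]:
    func = wrap_frozen_graph(graph_def, inputs="x:0", outputs="Identity:0")
    func(dummy)  # warm-up, so one-time setup cost is excluded
    start = time.time()
    for _ in range(100):
        func(dummy)
    print(name, "avg latency:", (time.time() - start) / 100, "s")
```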
I cannot see any difference in execution time between the TRT-optimized model and the original frozen graph. During conversion I get the following log output:
```
There are 5 ops of 3 different types in the graph that are not converted to TensorRT: Identity, NoOp, Placeholder, (For more information see Accelerating Inference In TF-TRT User Guide :: NVIDIA Deep Learning Frameworks Documentation).
2022-04-05 12:01:58.382227: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:647] Number of TensorRT candidate segments: 0
2022-04-05 12:01:58.385966: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:822] Optimization results for grappler item: tf_graph
```
What concerns me is that the number of TensorRT candidate segments is 0. Does this mean that no optimization occurs?
Is it possible to achieve TRT optimization and acceleration given that my network contains these ops?
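In case the deprecated `create_inference_graph` path is part of the problem: the TF2-native conversion I could switch to would look roughly like this. This is a sketch based on the TF-TRT user guide, assuming the model is first exported as a SavedModel; the directory paths are placeholder names.

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Export the Keras model as a SavedModel first (placeholder path).
model.save("./saved_model")

# Same precision/workspace settings as above, via the TF2 API.
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP16,
    max_workspace_size_bytes=1 << 25)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="./saved_model",
    conversion_params=params)
converter.convert()
converter.save("./trt_saved_model")
```

Would the candidate-segment count be any different with this path, or does it not matter which API performs the conversion?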