Optimize/Prune the Graph to prevent UFF conversion failure

Description

I am getting a TensorRT conversion failure because the graph contains ops that TensorRT does not support. My conclusion so far is that I need to optimize/prune the original TensorFlow graph so that the frozen graph contains only ops supported by TensorRT and then converts cleanly to UFF.

To do this, I want to use TransformGraph to remove the training nodes, which I think are the ones unsupported in UFF conversion. The options I used are below. Other than the input shape change and the removal of the "Identity" node, no other nodes were removed, e.g. 'Switch', 'Exit', 'Add'.

  1. Which options should I pass to TransformGraph() to remove the 'Switch', 'Exit' and 'Add' nodes?
  2. We use 'from tensorflow.python.tools import freeze_graph' to freeze our graph. Is there a direct way to generate an optimized graph from the SavedModel format? I see that OpenVINO (Intel GPU toolkit) has an optimize_for_inferencing() method; is there a similar function for NVIDIA?
  3. How do I remove training nodes from a frozen graph?
  4. How do I remove the prefix "layer6" from all nodes (e.g. "layer6/Conv2D", "layer6/final_dense")?
  5. Is there a TF API to prune the graph directly?

import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

def optimize_graph(model_dir, graph_filename, transforms, output_node):
    with tf.compat.v1.Session() as sess:

        # shape=[1, ?, ?, 3] -> shape=[1, 128, 128, 3]
        # name='input_image_tensor' is the placeholder name in the converted model
        inputs = tf.compat.v1.placeholder(tf.uint8, shape=[1, 128, 128, 3], name='input_image_tensor')
        with tf.io.gfile.GFile(graph_filename, 'rb') as f:
            graph_def = tf.compat.v1.GraphDef()
            # Read while the file is still open
            graph_def.ParseFromString(f.read())

        # 'image_tensor:0' is the placeholder name in the model before conversion
        tf.graph_util.import_graph_def(graph_def, input_map={'image_tensor:0': inputs}, name='')
        # Note: remove_training_nodes returns a new GraphDef; the result is not used here
        tf.graph_util.remove_training_nodes(graph_def, protected_nodes=None)
        print([n for n in tf.compat.v1.get_default_graph().as_graph_def().node if n.name == 'input_image_tensor'])

        optimized_graph_def = TransformGraph(
            tf.compat.v1.get_default_graph().as_graph_def(),
            ['input_image_tensor'],  # TransformGraph expects a list of input names
            [output_node],
            transforms)
        # describe_graph is a helper defined elsewhere (not shown in this post)
        describe_graph(optimized_graph_def)
        tf.io.write_graph(optimized_graph_def, model_dir, 'optimized_model.pb', as_text=False)

transforms = [
    'remove_nodes(op=Identity, op=CheckNumerics, op=TensorArray)',
    'merge_duplicate_nodes',
    'strip_unused_nodes(type=uint8, shape="1,128,128,3")',
    'fold_constants(ignore_errors=true)',
    'fold_batch_norms',
    'fold_old_batch_norms',
    'obfuscate_names',
    'quantize_weights',
    'quantize_nodes',
    'strip_unused_nodes',
    'sort_by_execution_order'
]

optimize_graph("./", "freeze_graph.pb", transforms, "layer6/final_dense")

==============================================================================
import os
from tensorflow.python.saved_model import tag_constants
from tensorflow.python.tools import freeze_graph

def freeze_model(saved_model_dir, output_node_names, output_filename):
    # Write the frozen graph next to the SavedModel
    output_graph_filename = os.path.join(saved_model_dir, output_filename)
    initializer_nodes = ''
    freeze_graph.freeze_graph(
        input_saved_model_dir=saved_model_dir,
        output_graph=output_graph_filename,
        saved_model_tags=tag_constants.SERVING,
        output_node_names=output_node_names,
        initializer_nodes=initializer_nodes,
        input_graph=None,
        input_saver=False,
        input_binary=False,
        input_checkpoint=None,
        restore_op_name=None,
        filename_tensor_name=None,
        clear_devices=False,
        input_meta_graph=False,
    )
    print('graph frozen!')

Environment

TensorRT Version : 7.0.0.11
GPU Type : Tesla K80
Nvidia Driver Version : 440.64.00
CUDA Version : CUDA Version: 10.2
CUDNN Version : 7.6
Operating System + Version : Ubuntu x86_64
Python Version (if applicable) : 3.6
TensorFlow Version (if applicable) : 1.15.2
PyTorch Version (if applicable) :
Baremetal or Container (if container which image + tag) : Baremetal

Relevant Files


Steps To Reproduce

~/notebook/head-pose-estimation$ convert-to-uff frozen_graph.pb
Warning: No conversion function registered for layer: TensorArrayGatherV3 yet.
Converting map/TensorArrayStack/TensorArrayGatherV3 as custom op: TensorArrayGatherV3
W0529 23:11:49.407045 139740564793152 module_wrapper.py:139] From /home/ubuntu/anaconda3/lib/python3.6/site-packages/uff/converters/tensorflow/converter.py:179: The name tf.AttrValue is deprecated. Please use tf.compat.v1.AttrValue instead.

Warning: No conversion function registered for layer: Exit yet.
Converting map/while/Exit_2 as custom op: Exit
Warning: No conversion function registered for layer: Switch yet.
Converting map/while/Switch_2 as custom op: Switch
Warning: No conversion function registered for layer: LoopCond yet.
Converting map/while/LoopCond as custom op: LoopCond
Warning: No conversion function registered for layer: LogicalAnd yet.
Converting map/while/LogicalAnd as custom op: LogicalAnd
Warning: No conversion function registered for layer: Less yet.
Converting map/while/Less_1 as custom op: Less
Warning: No conversion function registered for layer: Enter yet.
Converting map/while/Less/Enter as custom op: Enter
Warning: No conversion function registered for layer: Merge yet.
Converting map/while/Merge_1 as custom op: Merge
Warning: No conversion function registered for layer: NextIteration yet.
Converting map/while/NextIteration_1 as custom op: NextIteration
Warning: No conversion function registered for layer: Switch yet.
Converting map/while/Switch as custom op: Switch
Warning: No conversion function registered for layer: Merge yet.
Converting map/while/Merge as custom op: Merge
Warning: No conversion function registered for layer: NextIteration yet.
Converting map/while/NextIteration as custom op: NextIteration
Warning: No conversion function registered for layer: Enter yet.
Converting map/while/Enter as custom op: Enter
Warning: No conversion function registered for layer: Switch yet.
Converting map/while/Switch_1 as custom op: Switch
Warning: No conversion function registered for layer: Enter yet.
Converting map/while/Enter_1 as custom op: Enter
Warning: No conversion function registered for layer: Less yet.
Converting map/while/Less as custom op: Less
Warning: No conversion function registered for layer: Merge yet.
Converting map/while/Merge_2 as custom op: Merge
Warning: No conversion function registered for layer: NextIteration yet.
Converting map/while/NextIteration_2 as custom op: NextIteration
Warning: No conversion function registered for layer: TensorArrayWriteV3 yet.
Converting map/while/TensorArrayWrite/TensorArrayWriteV3 as custom op: TensorArrayWriteV3
Warning: No conversion function registered for layer: ResizeBilinear yet.
Converting map/while/resize/ResizeBilinear as custom op: ResizeBilinear
Warning: No conversion function registered for layer: TensorArrayReadV3 yet.
Converting map/while/TensorArrayReadV3 as custom op: TensorArrayReadV3
Warning: No conversion function registered for layer: Enter yet.
Converting map/while/TensorArrayReadV3/Enter_1 as custom op: Enter
Warning: No conversion function registered for layer: TensorArrayScatterV3 yet.
Converting map/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3 as custom op: TensorArrayScatterV3
Warning: No conversion function registered for layer: TensorArrayV3 yet.
Converting map/TensorArray as custom op: TensorArrayV3
Warning: No conversion function registered for layer: Range yet.
Converting map/TensorArrayUnstack/range as custom op: Range
Warning: No conversion function registered for layer: Enter yet.
Converting map/while/TensorArrayReadV3/Enter as custom op: Enter
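
The warnings above can be condensed into a per-op count to see which op types the converter does not handle. A minimal sketch using only the Python standard library; the function name `unsupported_ops` and the shortened sample log are illustrative, and the regular expression assumes the warning wording shown in this log:

```python
import re
from collections import Counter

# Matches the warning lines printed by convert-to-uff in the log above.
_WARNING = re.compile(r"No conversion function registered for layer: (\S+) yet\.")

def unsupported_ops(log_text):
    """Count how often each unsupported op type is reported in the log."""
    return Counter(_WARNING.findall(log_text))

# Shortened excerpt of the log above, used only as sample input.
sample_log = """\
Warning: No conversion function registered for layer: Exit yet.
Converting map/while/Exit_2 as custom op: Exit
Warning: No conversion function registered for layer: Switch yet.
Converting map/while/Switch_2 as custom op: Switch
Warning: No conversion function registered for layer: Switch yet.
Converting map/while/Switch as custom op: Switch
"""

counts = unsupported_ops(sample_log)
for op, n in counts.most_common():
    print(op, n)
```

Running this over the full log would show that almost every unsupported op sits under the "map/" namespace, which narrows the problem to that one subgraph.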

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

You can use graphsurgeon; please refer to the link below:

You can try TF-TRT to optimize the node.
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html

Please refer to the API below in case it helps:
https://www.tensorflow.org/api_docs/python

I am not sure why this is required for TRT conversion. Could you please elaborate?

I found the link below, which might be useful. For more details or issues, you can raise an issue in the TensorFlow GitHub repo:
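
If the prefix does need to go, the key point is that every node name and every input reference to it must change together. A minimal sketch on a simplified stand-in for a GraphDef (plain dicts with `name`, `op` and `input` keys, an assumption for illustration and not the TensorFlow API); `strip_prefix` is a hypothetical helper, and the node names are taken from this thread:

```python
def strip_prefix(nodes, prefix):
    """Return copies of `nodes` with `prefix` removed from names and inputs."""
    def rename(ref):
        # Control-dependency inputs are written as "^node_name".
        ctrl = "^" if ref.startswith("^") else ""
        name = ref[len(ctrl):]
        if name.startswith(prefix):
            name = name[len(prefix):]
        return ctrl + name

    renamed = [
        {"name": rename(n["name"]), "op": n["op"],
         "input": [rename(ref) for ref in n["input"]]}
        for n in nodes
    ]
    names = [n["name"] for n in renamed]
    if len(set(names)) != len(names):
        raise ValueError("prefix removal produced duplicate node names")
    return renamed

toy_nodes = [
    {"name": "layer5/conv2d/Relu", "op": "Relu", "input": []},
    {"name": "layer6/logits/kernel", "op": "Const", "input": []},
    {"name": "layer6/logits/MatMul", "op": "MatMul",
     "input": ["layer5/conv2d/Relu", "layer6/logits/kernel"]},
]

renamed = strip_prefix(toy_nodes, "layer6/")
print([n["name"] for n in renamed])
```

On a real frozen graph the same idea would apply to each node's `name` and `input` fields, and the output node name passed to later tools would change accordingly.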
https://www.tensorflow.org/model_optimization/guide/pruning/comprehensive_guide

Thanks

@SunilJB I want to remove the training "map/*" nodes (control-flow ops) from frozen_graph.pb and replace them with Reshape/Shape and Cast (uint8 to float32). As you said in another post, convert-to-uff is deprecated and ONNX should be used instead (which fails at engine load); graphsurgeon is only used for UFF conversion, so it is no longer a valid solution. I also tried tf2onnx.convert, but the engine load failed. The "map/*" nodes (20+ nodes, as you can see) are not required at inference time, so I want to remove them from the original TF frozen_graph.pb. As you suggested using the TF APIs, I tried "optimize_for_inference", "remove_training_nodes" and "TransformGraph", but the training "map/*" nodes are still not removed.
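
The replacement described here amounts to three steps: drop every node under the "map/" namespace, add the new preprocessing node, and rewire the consumer of the old namespace output to the new node. A rough sketch on a simplified stand-in for the graph (plain dicts, not TensorFlow's GraphDef); the wiring of the toy graph, in particular that layer1/conv2d/Conv2D consumes map/TensorArrayStack/TensorArrayGatherV3, is an assumption for illustration, and `replace_namespace` is a hypothetical helper:

```python
def replace_namespace(nodes, prefix, new_node, old_output):
    """Drop all nodes under `prefix`, add `new_node`, and point every
    consumer of `old_output` at `new_node` instead."""
    kept = [n for n in nodes if not n["name"].startswith(prefix)]
    rewired = [
        {**n, "input": [new_node["name"] if ref == old_output else ref
                        for ref in n["input"]]}
        for n in kept
    ]
    # Fail loudly if anything still depends on a removed node.
    dangling = [ref for n in rewired for ref in n["input"]
                if ref.lstrip("^").startswith(prefix)]
    if dangling:
        raise ValueError("inputs still reference removed nodes: %s" % dangling)
    # Node order is not meaningful in this stand-in.
    return rewired + [new_node]

toy_graph = [
    {"name": "image_tensor", "op": "Placeholder", "input": []},
    {"name": "map/TensorArrayStack/TensorArrayGatherV3",
     "op": "TensorArrayGatherV3", "input": ["image_tensor"]},
    {"name": "layer1/conv2d/kernel", "op": "Const", "input": []},
    {"name": "layer1/conv2d/Conv2D", "op": "Conv2D",
     "input": ["map/TensorArrayStack/TensorArrayGatherV3", "layer1/conv2d/kernel"]},
]
cast = {"name": "Cast", "op": "Cast", "input": ["image_tensor"]}

pruned = replace_namespace(
    toy_graph, "map/", cast, "map/TensorArrayStack/TensorArrayGatherV3")
print([n["name"] for n in pruned])
```

On the real graph, the new node would also need its dtype attributes set correctly, and the edit would be made on the GraphDef protobuf or with a graph-editing tool rather than on dicts.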

============================================================================
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph
from tensorflow.python.tools.optimize_for_inference_lib import optimize_for_inference

def optimize_graph(model_dir, graph_filename, transforms, output_node):
    input_node = 'image_tensor'
    with tf.compat.v1.Session() as sess:

        # shape=[1, ?, ?, 3] -> shape=[1, 128, 128, 3]
        # name=input_node is the placeholder name in the converted model
        inputs = tf.compat.v1.placeholder(tf.uint8, shape=[1, 128, 128, 3], name=input_node)
        with tf.io.gfile.GFile(graph_filename, 'rb') as f:
            graph_def = tf.compat.v1.GraphDef()
            # Read while the file is still open
            graph_def.ParseFromString(f.read())

        input_nodes = [inputs]
        # _get_node_name is a helper defined elsewhere (not shown in this post)
        graph_def = optimize_for_inference(
            input_graph_def=graph_def,
            input_node_names=[_get_node_name(t) for t in input_nodes],
            output_node_names=[output_node],
            placeholder_type_enum=[node.dtype.as_datatype_enum for node in input_nodes]
        )

        print("Before:")
        print([n.name for n in graph_def.node])

        # 'image_tensor:0' is the placeholder name in the model before conversion
        tf.graph_util.import_graph_def(graph_def, input_map={'image_tensor:0': inputs}, name='')
        graph_def = tf.graph_util.remove_training_nodes(graph_def, protected_nodes=None)
        print([n for n in tf.compat.v1.get_default_graph().as_graph_def().node if n.name == input_node])

        optimized_graph_def = TransformGraph(
            graph_def,
            [input_node],
            [output_node],
            transforms)

        # describe_graph is a helper defined elsewhere (not shown in this post)
        describe_graph(graph_def)
        describe_graph(optimized_graph_def)
        tf.io.write_graph(optimized_graph_def, model_dir, 'optimized_model.pb', as_text=False)

transforms = [
    'remove_nodes(op=Identity, op=CheckNumerics)',
    'merge_duplicate_nodes',
    'rename_attribute(old_attribute_name="layer6/logits/BiasAdd",new_attribute_name="logits/BiasAdd")',
    'strip_unused_nodes(type=uint8, shape="1,128,128,3")',
    'fold_constants(ignore_errors=true)',
    'fold_batch_norms',
    'fold_old_batch_norms',
    'obfuscate_names',
    'quantize_weights',
    'quantize_nodes',
    'strip_unused_nodes',
    'sort_by_execution_order'
]

optimize_graph("/home/ubuntu/notebook/head-pose-estimation/", "frozen_graph.pb", transforms, "layer6/logits/BiasAdd")

Output:

After:
[‘map/Shape’, ‘map/strided_slice/stack’, ‘map/strided_slice/stack_1’, ‘map/strided_slice/stack_2’, ‘map/strided_slice’, ‘map/TensorArray’, ‘map/TensorArrayUnstack/Shape’, ‘map/TensorArrayUnstack/strided_slice/stack’, ‘map/TensorArrayUnstack/strided_slice/stack_1’, ‘map/TensorArrayUnstack/strided_slice/stack_2’, ‘map/TensorArrayUnstack/strided_slice’, ‘map/TensorArrayUnstack/range/start’, ‘map/TensorArrayUnstack/range/delta’, ‘map/TensorArrayUnstack/range’, ‘map/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3’, ‘map/Const’, ‘map/TensorArray_1’, ‘map/while/iteration_counter’, ‘map/while/Enter’, ‘map/while/Enter_1’, ‘map/while/Enter_2’, ‘map/while/Merge’, ‘map/while/Merge_1’, ‘map/while/Merge_2’, ‘map/while/Less/Enter’, ‘map/while/Less’, ‘map/while/Less_1’, ‘map/while/LogicalAnd’, ‘map/while/LoopCond’, ‘map/while/Switch_1’, ‘map/while/Switch_2’, ‘map/while/Switch’, ‘map/while/add/y’, ‘map/while/add’, ‘map/while/TensorArrayReadV3/Enter’, ‘map/while/TensorArrayReadV3/Enter_1’, ‘map/while/TensorArrayReadV3’, ‘map/while/resize/ExpandDims/dim’, ‘map/while/resize/ExpandDims’, ‘map/while/resize/size’, ‘map/while/resize/ResizeBilinear’, ‘map/while/resize/Squeeze’, ‘map/while/TensorArrayWrite/TensorArrayWriteV3/Enter’, ‘map/while/TensorArrayWrite/TensorArrayWriteV3’, ‘map/while/add_1/y’, ‘map/while/add_1’, ‘map/while/NextIteration’, ‘map/while/NextIteration_1’, ‘map/while/NextIteration_2’, ‘map/while/Exit_2’, ‘map/TensorArrayStack/TensorArraySizeV3’, ‘map/TensorArrayStack/range/start’, ‘map/TensorArrayStack/range/delta’, ‘map/TensorArrayStack/range’, ‘map/TensorArrayStack/TensorArrayGatherV3’, ‘layer1/conv2d/kernel’, ‘layer1/conv2d/bias’, ‘layer1/conv2d/Conv2D’, ‘layer1/conv2d/BiasAdd’, ‘layer1/conv2d/Relu’, ‘layer1/max_pooling2d/MaxPool’, ‘layer2/conv2d/kernel’, ‘layer2/conv2d/bias’, ‘layer2/conv2d/Conv2D’, ‘layer2/conv2d/BiasAdd’, ‘layer2/conv2d/Relu’, ‘layer2/conv2d_1/kernel’, ‘layer2/conv2d_1/bias’, ‘layer2/conv2d_1/Conv2D’, ‘layer2/conv2d_1/BiasAdd’, 
‘layer2/conv2d_1/Relu’, ‘layer2/max_pooling2d/MaxPool’, ‘layer3/conv2d/kernel’, ‘layer3/conv2d/bias’, ‘layer3/conv2d/Conv2D’, ‘layer3/conv2d/BiasAdd’, ‘layer3/conv2d/Relu’, ‘layer3/conv2d_1/kernel’, ‘layer3/conv2d_1/bias’, ‘layer3/conv2d_1/Conv2D’, ‘layer3/conv2d_1/BiasAdd’, ‘layer3/conv2d_1/Relu’, ‘layer3/max_pooling2d/MaxPool’, ‘layer4/conv2d/kernel’, ‘layer4/conv2d/bias’, ‘layer4/conv2d/Conv2D’, ‘layer4/conv2d/BiasAdd’, ‘layer4/conv2d/Relu’, ‘layer4/conv2d_1/kernel’, ‘layer4/conv2d_1/bias’, ‘layer4/conv2d_1/Conv2D’, ‘layer4/conv2d_1/BiasAdd’, ‘layer4/conv2d_1/Relu’, ‘layer4/max_pooling2d/MaxPool’, ‘layer5/conv2d/kernel’, ‘layer5/conv2d/bias’, ‘layer5/conv2d/Conv2D’, ‘layer5/conv2d/BiasAdd’, ‘layer5/conv2d/Relu’, ‘layer6/flatten/Shape’, ‘layer6/flatten/strided_slice/stack’, ‘layer6/flatten/strided_slice/stack_1’, ‘layer6/flatten/strided_slice/stack_2’, ‘layer6/flatten/strided_slice’, ‘layer6/flatten/Reshape/shape/1’, ‘layer6/flatten/Reshape/shape’, ‘layer6/flatten/Reshape’, ‘layer6/dense/kernel’, ‘layer6/dense/bias’, ‘layer6/dense/MatMul’, ‘layer6/dense/BiasAdd’, ‘layer6/dense/Relu’, ‘layer6/logits/kernel’, ‘layer6/logits/bias’, ‘layer6/logits/MatMul’, ‘layer6/logits/BiasAdd’, ‘image_tensor’]
Input Feature Nodes: [‘image_tensor’]

Unused Nodes:

Output Nodes:

Quantization Nodes:

Constant Count: 40

Variable Count: 0

Identity Count: 1
Total nodes: 118
Input Feature Nodes: [‘image_tensor’]

Unused Nodes:

Output Nodes:

Quantization Nodes:

Constant Count: 40

Variable Count: 0

Identity Count: 0
Total nodes: 117

=========================================================================

Please let me know if any step is missing in my code for removing the training "map/" nodes.

@SunilJB Here is the link to the Jupyter notebook.
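
A possible explanation, assuming the documented behavior of these tools: strip_unused_nodes only drops nodes that are not needed to compute the requested outputs, and remove_training_nodes targets pass-through nodes such as Identity and CheckNumerics. If the output still depends on the "map/" subgraph, those nodes count as used and stay. The sketch below checks that dependency on a simplified stand-in for the graph (plain dicts, with toy wiring that is an assumption and not the real model); `ancestors` is a hypothetical helper:

```python
def ancestors(nodes, output_name):
    """Return the names of all nodes the output depends on (including itself)."""
    by_name = {n["name"]: n for n in nodes}
    seen, stack = set(), [output_name]
    while stack:
        name = stack.pop()
        if name in seen or name not in by_name:
            continue
        seen.add(name)
        for ref in by_name[name]["input"]:
            # Strip the control-dependency marker "^" and any ":k" output index.
            stack.append(ref.lstrip("^").split(":")[0])
    return seen

toy_graph = [
    {"name": "image_tensor", "op": "Placeholder", "input": []},
    {"name": "map/while/Switch", "op": "Switch", "input": ["image_tensor"]},
    {"name": "map/TensorArrayStack/TensorArrayGatherV3",
     "op": "TensorArrayGatherV3", "input": ["map/while/Switch:1"]},
    {"name": "layer6/logits/BiasAdd", "op": "BiasAdd",
     "input": ["map/TensorArrayStack/TensorArrayGatherV3"]},
    {"name": "debug/summary", "op": "Identity", "input": ["image_tensor"]},
]

used = ancestors(toy_graph, "layer6/logits/BiasAdd")
print(sorted(n for n in used if n.startswith("map/")))
```

If the "map/" nodes show up among the ancestors of the output, pruning by reachability cannot remove them; the subgraph would have to be replaced and its consumers rewired instead.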

TransformGraph notebook:

I want to remove the training nodes (map/*) using TransformGraph. The options I used are below. Other than the input shape change and the removal of the "Identity" node, no other nodes were removed, e.g. 'Switch', 'Exit', 'Add'.

  1. Which options should I pass to TransformGraph() to remove the 'Switch', 'Exit' and 'Add' nodes?
  2. I used optimize_for_inference() and remove_training_nodes() without success.

Saved_model:

Freeze_graph.pb:

This issue doesn't seem to be related to TensorRT.
I would request that you raise an issue in the TensorFlow GitHub issues section:

Thanks