TensorFlow TensorRT AddN batch support

Description

Converting a graph using TF-TRT with batch size > 1 does not convert "AddN"

Environment

TensorRT Version:
7.1
GPU Type:
Jetson Xavier
Nvidia Driver Version:
Jetpack 4.2.4
CUDA Version:
10.2
CUDNN Version:
8.0.0
Operating System + Version:
Ubuntu 18.04
Python Version (if applicable):
Python 3.6
TensorFlow Version (if applicable):
tensorflow-2.1.0+nv20.4-cp36-cp36m-linux_aarch64.whl

Baremetal or Container (if container which image + tag):
Baremetal

Steps To Reproduce

Convert graph with batch 1: AddN is inside the engine:

tensorflow/compiler/tf2tensorrt/segment/segment.cc:460] There are 32 ops of 16 different types in the graph that are not converted to TensorRT: Identity, GatherV2, Reshape, Size, Pack, NonMaxSuppressionV5, Tile, AddV2, ArgMax, Placeholder, NoOp, Cast, StridedSlice, ExpandDims, Mul, Squeeze,

Convert graph with batch > 1: AddN is not converted and the engine is split:

tensorflow/compiler/tf2tensorrt/segment/segment.cc:460] There are 61 ops of 17 different types in the graph that are not converted to TensorRT: ConcatV2, Squeeze, Mul, ExpandDims, StridedSlice, Cast, Placeholder, Pack, NonMaxSuppressionV5, AddN, Tile, AddV2, ArgMax, NoOp, GatherV2, Size, Reshape,

What can cause this?

Subgraphs with fewer nodes than minimum_segment_size are not converted to TensorRT.
Can you try modifying the minimum_segment_size value?

Please refer to the "minimum_segment_size" section in the link below:
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#best-practices
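As a sketch of the suggestion above, these are the conversion parameters that could be tuned (TF 1.x-style `TrtGraphConverter` API; the `GraphDef` placeholder and exact values are hypothetical, not taken from your model):

```python
# Hypothetical TF-TRT conversion parameters; minimum_segment_size controls
# how many consecutive TF ops a subgraph must contain before it is offloaded
# to a TensorRT engine (subgraphs smaller than this stay in TensorFlow).
converter_kwargs = dict(
    input_graph_def=None,          # replace with your frozen GraphDef
    precision_mode="FP32",
    minimum_segment_size=3,        # try lowering this if segments are skipped
    max_batch_size=2,
    max_workspace_size_bytes=1 << 30,
)

# Then (requires TensorFlow with TensorRT support, not executed here):
# from tensorflow.python.compiler.tensorrt import trt_convert as trt
# converter = trt.TrtGraphConverter(**converter_kwargs)
# trt_graph = converter.convert()
```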

If the issue persists, could you please share the script and model files so we can help better?

Thanks

minimum_segment_size is already very small and is not the issue here.

I added reproducible code. An example where add_n is not converted with batching:

    import time

    import numpy as np
    import tensorflow.compat.v1 as tf
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    batch = 2
    shape = (batch, 768, 960, 3)
    const = tf.constant(4, dtype=tf.float32, shape=shape, name='Const')
    sess = _build_session()  # helper that returns a tf.Session (not shown)
    with sess.graph.as_default():
        sess.run(tf.initialize_all_variables())
        image = np.random.uniform(low=0, high=255, size=shape)
        input1 = tf.placeholder(tf.float32, shape=shape, name='input1')
        for i in range(5):
            input1 = tf.add_n([input1, const])

        ident = tf.identity(input1, name="output")

    graph_def = sess.graph_def
    with tf.Graph().as_default():
        tf.import_graph_def(graph_def, name='')
        with tf.Session(graph=tf.get_default_graph(), config=tf.ConfigProto()) as session:
            b = tf.saved_model.Builder(f"/tmp/d{str(time.time())}")
            signature_def_map = {
                'serving_default':
                    tf.saved_model.predict_signature_def(
                        {"input1": input1},
                        {"output": ident}),
            }
            b.add_meta_graph_and_variables(
                session,
                tags=['serve'],
                signature_def_map=signature_def_map,
                assets_collection=tf.get_collection(tf.GraphKeys.ASSET_FILEPATHS),
                clear_devices=True)
            b.save()
            input_tensors = [
                tf.get_default_graph().get_tensor_by_name("input1:0"),
            ]
            output_tensors = [
                tf.get_default_graph().get_tensor_by_name("output:0"),
            ]
            converter = trt.TrtGraphConverter(
                nodes_blacklist=[i.name for i in input_tensors] + [i.name for i in output_tensors],
                max_workspace_size_bytes=1 << 30,
                input_graph_def=graph_def,
                precision_mode="FP32",
                minimum_segment_size=0,
                maximum_cached_engines=100,
                max_batch_size=batch
            )
            infer_graph = converter.convert()

            out = session.run(output_tensors, feed_dict={
                input_tensors[0]: image,
                # input_tensors[1]: image
            })

This code produces the log:
2020-05-26 11:58:37.443945: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:439] Not a TF-TRT candidate, (Op type: AddN), (Op name: AddN_4), (Reason: Invalid argument: Weights input to AddN is required to have batch dimension 1.)

If an input to AddN is a constant (weights), TensorRT requires it to have batch dimension 1.
For non-constant input tensors the batch size can be larger than one, but TF requires that all arguments of add_n have the same shape, so if one argument is a constant, all arguments are effectively limited to batch size 1.
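To illustrate the shape constraint: unlike tf.add (AddV2), which broadcasts, tf.add_n requires all inputs to have identical shapes, so a batch-1 constant cannot be paired with a batch-N tensor. A small NumPy sketch of what AddN computes for two same-shape inputs (shapes chosen arbitrarily for illustration):

```python
import numpy as np

# AddN requires all inputs to share one shape; element-wise sum of
# same-shape arrays models AddN([a, b]) for two inputs.
a = np.ones((2, 4, 4, 3), dtype=np.float32)          # batch-2 "activation"
b = np.full((2, 4, 4, 3), 4.0, dtype=np.float32)     # same-shape "constant"
s = a + b                                            # AddN([a, b]) equivalent

# A (1, 4, 4, 3) constant would broadcast under np.add / tf.add,
# but AddN would reject the shape mismatch outright.
```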

A workaround is to feed the constant argument into the network as a placeholder instead.
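A minimal sketch of that workaround, assuming the repro above (the placeholder name "input2" is made up): the constant's value is materialized once on the host and fed at run time, so the graph no longer contains a weight input to AddN.

```python
import numpy as np

batch = 2
shape = (batch, 768, 960, 3)

# Value that used to be baked into the graph as tf.constant(4, ...):
const_value = np.full(shape, 4.0, dtype=np.float32)

# TF 1.x sketch (hypothetical names, not executed here):
# input1 = tf.placeholder(tf.float32, shape=shape, name="input1")
# input2 = tf.placeholder(tf.float32, shape=shape, name="input2")  # replaces const
# out = tf.add_n([input1, input2])
# sess.run(out, feed_dict={input1: image, input2: const_value})
```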

Thanks