Implementing Fill layer as custom plugin

Hello,

InceptionV3, DenseNet and InceptionResNetV2 from keras.applications use layers.concatenate in their design. Since this layer uses tf.ones_like under the hood, the UFF converter throws errors stating that the Fill layer is not supported.

I decided to implement this layer using the plugin API. My base code is uff_custom_plugin from TensorRT samples (https://docs.nvidia.com/deeplearning/sdk/tensorrt-sample-support-guide/index.html#uff_custom_plugin).

Before running graph surgeon, the unsupported nodes look like this:

.
.
.
, 'batch_normalization_4/ones_like/Shape': name: "batch_normalization_4/ones_like/Shape"
op: "Const"
attr { 
  key: "dtype"
  value { 
    type: DT_INT32
  }
}
attr {
  key: "value"
  value {
    tensor {
      dtype: DT_INT32
      tensor_shape {
        dim {
          size: 1
        }
      }
      int_val: 80
    }
  }
}

, 'batch_normalization_4/ones_like/Const': name: "batch_normalization_4/ones_like/Const"
op: "Const"
attr {
  key: "dtype"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "value"
  value {
    tensor {
      dtype: DT_FLOAT
      tensor_shape {
      }
      float_val: 1.0
    }
  }
}
, 'batch_normalization_4/ones_like': name: "batch_normalization_4/ones_like"
op: "Fill"
input: "batch_normalization_4/ones_like/Shape"
input: "batch_normalization_4/ones_like/Const"
attr {
  key: "T"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "index_type"
  value {
    type: DT_INT32
  }
}
.
.
.

This is expected, since the tf.fill operation takes two input arguments: shape and value (https://www.tensorflow.org/api_docs/python/tf/fill).
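
In TF terms, the two const inputs above correspond to a call like this (a minimal illustration, using the shape and value from the dump):

import tensorflow as tf

# Equivalent to the Fill node above: the shape comes from ones_like/Shape ([80])
# and the scalar value from ones_like/Const (1.0)
ones = tf.fill([80], 1.0)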

So, using the graph surgeon API, I search for the nodes with the Fill op, collapse their namespaces and replace them with CustomClipPlugin (a sketch of that preprocessing follows the node dump below):

, 'batch_normalization_4/ones_like': name: "batch_normalization_4/ones_like"
op: "CustomClipPlugin"
attr {
  key: "dims_u_int"
  value {
    i: 80
  }
}
attr {
  key: "value_u_float"
  value {
    f: 1.0
  }
}
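
Roughly, the graph surgeon preprocessing that produces the node above looks like this (a sketch: the frozen-graph filename is a placeholder, and the shape/value are hard-coded to the values from the dump, whereas in practice they should be read from each Fill node's const inputs):

import graphsurgeon as gs

# Load the frozen TensorFlow graph (placeholder filename)
dynamic_graph = gs.DynamicGraph("frozen_model.pb")

# One plugin node per Fill node, carrying the shape and fill value as plugin fields
namespace_plugin_map = {}
for node in dynamic_graph.find_nodes_by_op("Fill"):
    namespace_plugin_map[node.name] = gs.create_plugin_node(
        name=node.name,
        op="CustomClipPlugin",
        dims_u_int=80,       # from .../ones_like/Shape
        value_u_float=1.0)   # from .../ones_like/Const

# Collapse each Fill namespace into its plugin node and write the result out
dynamic_graph.collapse_namespaces(namespace_plugin_map)
dynamic_graph.write("modified_model.pb")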

Although I haven’t implemented the actual CUDA kernel yet, I wanted to make sure things work at a high level by only converting to UFF and building the engine, without running inference. However, I get the following error (I am also printing each function call in customClipPlugin.cpp):

ClipPluginCreator::getPluginName
	ClipPluginCreator::getFieldNames
	ClipPluginCreator::createPlugin
	ClipPlugin::ClipPlugin
	ClipPlugin::setPluginNamespace
	ClipPlugin::getNbOutputs
	ClipPlugin::clone
	ClipPlugin::ClipPlugin
	ClipPlugin::getNbOutputs
[libprotobuf FATAL /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/externals/protobuf/aarch64/10.0/include/google/protobuf/repeated_field.h:1408] CHECK failed: (index) < (current_size_): 
	ClipPlugin::destroy
	ClipPlugin::destroy
Traceback (most recent call last):
  File "../hands_uff_custom_plugin.py", line 225, in <module>
    main()
  File "../hands_uff_custom_plugin.py", line 211, in main
    engine = build_engine(MODEL_PATH)
  File "../hands_uff_custom_plugin.py", line 176, in build_engine
    parser.parse(uff_path, network)
RuntimeError: CHECK failed: (index) < (current_size_):
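
For context, the build_engine step follows the uff_custom_plugin sample; a rough sketch (the plugin library path, input/output names and input shape are placeholders, and the Keras-to-UFF conversion step is omitted):

import ctypes
import tensorrt as trt

# The compiled plugin library must be loaded before parsing, as in the sample
ctypes.CDLL("build/libclipplugin.so")
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(TRT_LOGGER, '')

def build_engine(uff_path):
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.UffParser() as parser:
        builder.max_workspace_size = 1 << 30
        parser.register_input("input_1", (3, 299, 299))   # placeholder name/shape
        parser.register_output("predictions/Softmax")     # placeholder output node
        parser.parse(uff_path, network)                    # this is the call that fails
        return builder.build_cuda_engine(network)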

So do you have any thoughts on why I get the error "RuntimeError: CHECK failed: (index) < (current_size_): "?

Have you solved this issue?

I’m running into the same issue while attempting to implement Fill. The error seems to come from somewhere in libnvparsers, but it’s unclear what in the library is causing it. The number of outputs is correct according to the graph definition output by graphsurgeon.

Any help with this would be greatly appreciated!

I don’t know if this has been fixed in the newest release of TensorRT, but I was able to fix it a while ago in Keras itself. Hope this helps others struggling with the same problem. Please let me know if this also works for you, so I can accept this as the answer and close this issue.

=====================================

The problem with Inception is that it uses BatchNormalization with scale=False, so Keras uses tf.ones_like to substitute ones for gamma. And TRT does not support this operation! Hence we need to replace it: in the conversion script below, the ones are generated once with tf.fill and stored as ordinary weights, so the exported graph no longer contains the unsupported op.

To do this, without any need to retrain your model, you can replace each BatchNorm layer with a new one that reuses the weights learned during training plus an explicit gamma of ones (built with tf.fill rather than tf.ones_like).
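
The core of the trick, stripped down to a toy example (the full convert.py below does the same thing while also rewiring the graph):

import numpy as np
import keras

# Stand-in for one of Inception's scale=False BatchNorm layers
old_bn = keras.layers.BatchNormalization(scale=False)
old_bn.build((None, 8, 8, 80))

# With scale=False the layer only stores [beta, moving_mean, moving_variance].
# Prepending an explicit gamma of ones gives the [gamma, beta, mean, variance]
# layout of a default (scale=True) layer, so no ones_like/Fill op is needed
# at inference time.
old_weights = old_bn.get_weights()
gamma = np.ones_like(old_weights[0])
new_bn = keras.layers.BatchNormalization(weights=[gamma] + old_weights)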

You can use the following code to convert the unsupported layers into supported ones:

python3 convert.py model.hdf5

What you get is model.hdf5.supported, which can be converted directly to UFF for real-time inference.

The contents of convert.py are as follows:

import re
import sys
import keras
import tensorflow as tf
from keras.models import Model, load_model
import cv2
import numpy as np
from keras.applications.inception_v3 import preprocess_input
import keras.backend as K

try:
    from tensorflow.compat.v1 import ConfigProto
    from tensorflow.compat.v1 import InteractiveSession
    config = ConfigProto()
    config.gpu_options.allow_growth = True
    session = InteractiveSession(config=config)
except ImportError:
    pass

# Copied from https://stackoverflow.com/questions/49492255/how-to-replace-or-insert-intermediate-layer-in-keras-model
def insert_layer_nonseq(model, layer_regex, insert_layer_factory,
                        insert_layer_name=None, position='after'):

    # Auxiliary dictionary to describe the network graph
    network_dict = {'input_layers_of': {}, 'new_output_tensor_of': {}}

    # Set the input layers of each layer
    for layer in model.layers:
        for node in layer._outbound_nodes:
            layer_name = node.outbound_layer.name
            if layer_name not in network_dict['input_layers_of']:
                network_dict['input_layers_of'].update(
                        {layer_name: [layer.name]})
            else:
                network_dict['input_layers_of'][layer_name].append(layer.name)

    # Set the output tensor of the input layer
    network_dict['new_output_tensor_of'].update(
            {model.layers[0].name: model.input})

    # Iterate over all layers after the input
    for layer in model.layers[1:]:

        # Determine input tensors
        layer_input = [network_dict['new_output_tensor_of'][layer_aux] 
                for layer_aux in network_dict['input_layers_of'][layer.name]]
        if len(layer_input) == 1:
            layer_input = layer_input[0]

        # Insert layer if name matches the regular expression
        if re.match(layer_regex, layer.name):
            if position == 'replace':
                x = layer_input
            elif position == 'after':
                x = layer(layer_input)
            elif position == 'before':
                pass
            else:
                raise ValueError('position must be: before, after or replace')

            new_layer = insert_layer_factory(layer)
            if insert_layer_name:
                new_layer.name = insert_layer_name
            else:
                new_layer.name = '{}_{}'.format(layer.name, 
                                                new_layer.name)
            x = new_layer(x)
            print('Layer {} inserted after layer {}'.format(new_layer.name,
                                                            layer.name))
            if position == 'before':
                x = layer(x)
        else:
            x = layer(layer_input)

        network_dict['new_output_tensor_of'].update({layer.name: x})

    new_model = Model(inputs=model.inputs, outputs=x)

    return new_model

# (Not used below) Helper for expressing a concat with tf.concat
def MyConcat(input, axis=3):
    return tf.concat(values=input, axis=axis)

# (Not used below) Alternative helper that swaps a single layer by index
def replace_intermediate_layer_in_keras(model, layer_id, new_layer):
    layers = [l for l in model.layers]

    x = layers[0].output
    for i in range(1, len(layers)):
        if i == layer_id:
            x = new_layer(x)
        else:
            x = layers[i](x)

    new_model = Model(inputs=layers[0].input, outputs=x)
    return new_model


fname = sys.argv[1] # your input model
olist = 'dense_3/Softmax' # output node name (not used in this script)


K.set_learning_phase(0)
model = load_model(fname)
K.set_learning_phase(0)

# Model before conversion
model.summary()

def layer_factory(layer):
    # Build an explicit gamma of ones (same shape as beta), so the new layer
    # stores it as a regular weight instead of generating it with ones_like
    weights = layer.weights
    ones = tf.fill(tf.shape(layer.weights[0]), 1.0)
    weights.insert(0, ones)

    # Evaluate everything to numpy arrays: [gamma, beta, moving_mean, moving_variance]
    arr_weights = []
    for weight in weights:
        arr_weights.append(K.get_session().run(weight))

    # A default BatchNormalization (scale=True) initialized with the learned weights
    new_layer = keras.layers.BatchNormalization(weights=arr_weights)
    return new_layer

# Replace every BatchNormalization layer with a scale=True equivalent
new_model = insert_layer_nonseq(model, 'batch_normalization.*', layer_factory, position='replace')

# Model after conversion
new_model.summary()

# Save the new supported model
new_model.save('{}.supported'.format(fname))

# Do an inference test! 
# Optional: all the rest can be commented out
img = cv2.imread('image.jpg')
img = cv2.resize(img, (229, 229)) # adjust this to your network's input size
img = np.expand_dims(img, axis=0)
img = preprocess_input(img)
res = new_model.predict(img)

#print(res)
idx = np.argmax(res[0])
print(idx, res[0][idx])

print("end")

Cheers!