DepthwiseConv2D layer support in TensorRT for Mobilenet Encoder

Hi,

I am trying to port a MobileNet model to TensorRT 5. The UFF conversion completes without any warnings, so the conversion looks correct to me. However, when I load the file and parse it to build the network, I get an error about a mismatch in the dimensions of two inputs.

Error:
ERROR: add_1/add: elementwise inputs must have same dimensions or follow broadcast rules (input dimensions were [13,40,63] and [13,40,62])

Has anyone managed to convert MobileNet models on TensorRT? How can I resolve this error?

Hello,

To help us debug, can you please share a small repro package, including the MobileNet model, the UFF conversion, and the loading source code, that demonstrates the error you are seeing?

regards,
NVIDIA Enterprise Support

I am sharing a sample model that reproduces the exact same dimension-mismatch error for the Add layer when the UFF file is parsed:

ERROR: add_1/add: elementwise inputs must have same dimensions or follow broadcast rules (input dimensions were [13,160,255] and [13,160,254])

Below is the Keras summary of the sample model, followed by the entire UFF conversion output.

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 320, 512, 3)  0                                            
__________________________________________________________________________________________________
conv1_pad (ZeroPadding2D)       (None, 321, 513, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 160, 256, 32) 864         conv1_pad[0][0]                  
__________________________________________________________________________________________________
conv1_bn (BatchNormalization)   (None, 160, 256, 32) 128         conv1[0][0]                      
__________________________________________________________________________________________________
conv1_relu (ReLU)               (None, 160, 256, 32) 0           conv1_bn[0][0]                   
__________________________________________________________________________________________________
conv_dw_1 (DepthwiseConv2D)     (None, 160, 256, 32) 288         conv1_relu[0][0]                 
__________________________________________________________________________________________________
conv_dw_1_bn (BatchNormalizatio (None, 160, 256, 32) 128         conv_dw_1[0][0]                  
__________________________________________________________________________________________________
conv_dw_1_relu (ReLU)           (None, 160, 256, 32) 0           conv_dw_1_bn[0][0]               
__________________________________________________________________________________________________
conv_pw_1 (Conv2D)              (None, 160, 256, 64) 2048        conv_dw_1_relu[0][0]             
__________________________________________________________________________________________________
conv_pw_1_bn (BatchNormalizatio (None, 160, 256, 64) 256         conv_pw_1[0][0]                  
__________________________________________________________________________________________________
conv_pw_1_relu (ReLU)           (None, 160, 256, 64) 0           conv_pw_1_bn[0][0]               
__________________________________________________________________________________________________
conv_pad_2 (ZeroPadding2D)      (None, 161, 257, 64) 0           conv_pw_1_relu[0][0]             
__________________________________________________________________________________________________
conv_dw_2 (DepthwiseConv2D)     (None, 80, 128, 64)  576         conv_pad_2[0][0]                 
__________________________________________________________________________________________________
conv_dw_2_bn (BatchNormalizatio (None, 80, 128, 64)  256         conv_dw_2[0][0]                  
__________________________________________________________________________________________________
conv_dw_2_relu (ReLU)           (None, 80, 128, 64)  0           conv_dw_2_bn[0][0]               
__________________________________________________________________________________________________
conv_pw_2 (Conv2D)              (None, 80, 128, 128) 8192        conv_dw_2_relu[0][0]             
__________________________________________________________________________________________________
conv_pw_2_bn (BatchNormalizatio (None, 80, 128, 128) 512         conv_pw_2[0][0]                  
__________________________________________________________________________________________________
conv_pw_2_relu (ReLU)           (None, 80, 128, 128) 0           conv_pw_2_bn[0][0]               
__________________________________________________________________________________________________
score_1x1 (Conv2D)              (None, 80, 128, 13)  1677        conv_pw_2_relu[0][0]             
__________________________________________________________________________________________________
score_feed1 (Conv2D)            (None, 160, 256, 13) 845         conv_pw_1_relu[0][0]             
__________________________________________________________________________________________________
upscore2 (Conv2DTranspose)      (None, 160, 256, 13) 2717        score_1x1[0][0]                  
__________________________________________________________________________________________________
score_feed1_bn (BatchNormalizat (None, 160, 256, 13) 52          score_feed1[0][0]                
__________________________________________________________________________________________________
upscore2_bn (BatchNormalization (None, 160, 256, 13) 52          upscore2[0][0]                   
__________________________________________________________________________________________________
add_1 (Add)                     (None, 160, 256, 13) 0           score_feed1_bn[0][0]             
                                                                 upscore2_bn[0][0]                
==================================================================================================
Total params: 18,591
Trainable params: 17,899
Non-trainable params: 692
__________________________________________________________________________________________________
(env) :~$ convert-to-uff -o sample_problem_model.uff --input-file sample_problem_model.pb -O add_1/add

env/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Loading mobilenet_model.pb
WARNING:tensorflow:From env/lib/python3.5/site-packages/uff/converters/tensorflow/conversion_helpers.py:185: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
UFF Version 0.5.5
=== Automatically deduced input nodes ===
[name: "input_1"
op: "Placeholder"
attr {
  key: "dtype"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "shape"
  value {
    shape {
      dim {
        size: -1
      }
      dim {
        size: 320
      }
      dim {
        size: 512
      }
      dim {
        size: 3
      }
    }
  }
}
]
=========================================

Using output node add_1/add
Converting to UFF graph
No. nodes: 88

UFF Output written to sample_problem_model.uff

UFF Version 0.5.5 (on Ubuntu 16.04)
TensorRT-5.0.4.3 (C++ API on Windows)

Parser code snippet:

int maxBatchSize = 1;
auto parser = createUffParser();

// Register the TensorFlow input and output nodes
parser->registerInput("input_1", Dims3(3, 320, 512), UffInputOrder::kNHWC);
parser->registerOutput("add_1/add");

IBuilder* builder = createInferBuilder(gLogger);
INetworkDefinition* network = builder->createNetwork();

std::cout << "starting parsing" << std::endl;

if (!parser->parse(uffFile, *network, nvinfer1::DataType::kFLOAT))
    RETURN_AND_LOG(nullptr, ERROR, "Fail to parse");

Please let me know if more information is required.
sample_problem_model.zip (130 KB)

Hi all,

I managed to solve this issue. The root cause seems to be that the Conv2D layer support in the TensorRT UFF parser only handles padding=same. My model used padding=valid in a Conv2D layer, which worked fine in Keras with the TensorFlow backend; however, after converting the model's .pb file to UFF, the parser computed that layer's output dimensions incorrectly. Because the two inputs to the Add layer then had mismatched dimensions, building the network from the UFF file failed with the parsing error above.
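For reference, the one-pixel mismatch follows directly from the standard TF/Keras output-size formulas. This is a minimal sketch (the helper `conv_out` is hypothetical, and the layer parameters are taken from the model summary above):

```python
import math

def conv_out(n, k, s, padding):
    """Output length of a 1-D convolution under TF/Keras padding rules."""
    if padding == "same":
        return math.ceil(n / s)            # pad so output = ceil(n / stride)
    if padding == "valid":
        return (n - k) // s + 1            # no padding, drop the remainder
    raise ValueError(padding)

# Keras MobileNet: conv1_pad (ZeroPadding2D) grows the width 512 -> 513,
# then conv1 (3x3, stride 2, padding='valid') yields 256, matching the
# model summary:
print(conv_out(513, 3, 2, "valid"))   # 256

# If a parser mishandles the explicit padding and applies 'valid' to the
# unpadded 512-wide input, the result is one pixel short, which is the
# kind of off-by-one mismatch seen in the Add layer error:
print(conv_out(512, 3, 2, "valid"))   # 255
```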

I would like to ask NVIDIA: is there any plan to add support for padding=valid in TensorRT?