Fine-tuned ssd_mobilenet_v1 model can't be converted to TensorRT

I successfully converted the original ssd_mobilenet_v1_coco_2018_01_28 model to TensorRT by following this link https://github.com/AastaNV/TRT_object_detection. However, when I fine-tuned the original model on my own dataset, exported it to a .pb file and
tried to convert it to TensorRT, it failed.

Here is the error message:

-------
Traceback (most recent call last):
  File "main.py", line 31, in <module>
    dynamic_graph = model.add_plugin(gs.DynamicGraph(model.path))
  File "/home/blackwalnut/TensorRT/JestonNano/TRT_object_detection/config/model_ssd_mobilenet_v1_aicar.py", line 87, in add_plugin
    graph.find_nodes_by_name("Input")[0].input.remove("image_tensor:0")
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/internal/containers.py", line 295, in remove
    self._values.remove(elem)
ValueError: list.remove(x): x not in list

Could you tell me what’s wrong with that?

Hi,

Did you change the layer names of ssd_mobilenet_v1_coco_2018_01_28?

Based on the error message, it looks like the placeholder layer name has changed.
If so, please update the name here:
https://github.com/AastaNV/TRT_object_detection/blob/master/config/model_ssd_mobilenet_v1_coco_2018_01_28.py#L79
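The ValueError comes from calling `remove()` on the node's input list with a name that is not in it. One way to make the config tolerate a renamed placeholder is to try several candidate names and remove whichever one is present. A minimal sketch; `remove_first_present_input` is a hypothetical helper, and a `SimpleNamespace` stands in for the graphsurgeon node:

```python
from types import SimpleNamespace


def remove_first_present_input(node, candidates):
    """Remove the first name from `candidates` that appears in node.input.

    Returns the removed name, or None when no candidate is present,
    instead of raising ValueError the way list.remove() does.
    """
    for name in candidates:
        if name in node.input:
            node.input.remove(name)
            return name
    return None


# Hypothetical stand-in for the "Input" node after collapse_namespaces(),
# whose input list holds "Cast" rather than "image_tensor:0".
input_node = SimpleNamespace(name="Input", input=["Cast"])
removed = remove_first_present_input(input_node, ["image_tensor:0", "Cast"])
print(removed, input_node.input)  # Cast []
```

In the config this could replace the hard-coded call, e.g. `remove_first_present_input(graph.find_nodes_by_name("Input")[0], ["image_tensor:0", "Cast"])`. It only avoids the crash; it does not make an unsupported op convertible.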

Thanks.

Hi NVIDIA,
I checked the node names and found that Google’s model export tool renamed the placeholder layer to ‘Cast’.

Then I changed the code to:

graph.find_nodes_by_name("Input")[0].input.remove("Cast")

I ran the converter again and the ‘tmp.uff’ file was generated successfully, but there is another error:

[TensorRT] ERROR: UFFParser: Parser error: BoxPredictor_0/Reshape: Reshape: -1 dimension specified more than 1 time
[TensorRT] ERROR: Network must have at least one output
Traceback (most recent call last):
  File "main.py", line 44, in <module>
    buf = engine.serialize()
AttributeError: 'NoneType' object has no attribute 'serialize'

Could you tell me what’s wrong with that? Thank you.
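The parser error names the constraint directly: a reshape can infer at most one dimension from the total element count, so a target shape containing two -1 entries is ambiguous and rejected. A pure-Python illustration of that rule (a sketch, not TensorRT's code; `resolve_reshape` is a hypothetical helper):

```python
import math


def resolve_reshape(num_elements, target_shape):
    """Resolve a reshape target in which at most one entry may be -1."""
    wildcard = [i for i, d in enumerate(target_shape) if d == -1]
    if len(wildcard) > 1:
        # Two unknowns cannot both be inferred from a single element count.
        raise ValueError("-1 dimension specified more than 1 time")
    known = math.prod(d for d in target_shape if d != -1)
    resolved = list(target_shape)
    if wildcard:
        if known == 0 or num_elements % known != 0:
            raise ValueError("cannot infer the -1 dimension")
        resolved[wildcard[0]] = num_elements // known
    elif known != num_elements:
        raise ValueError("target shape does not match the element count")
    return resolved


print(resolve_reshape(24, [-1, 4]))  # [6, 4]
```

`math.prod` needs Python 3.8 or newer.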

Hi,

Would you mind sharing your .pb file with us?
Thanks.

Sure, here are the model files:

https://drive.google.com/file/d/18Oz53f8CfE6f6kko8DFW37ZmwSZdIRkT/view?usp=sharing

Thank you.

Hi,

Thanks for your model.
We will check it and get back to you later.

Thanks

Hi,

Sorry for the late reply.

After checking, we found an extra “Cast” layer, which TensorRT does not support yet.
Is it possible to retrain your network without the Cast layer?
Otherwise, you will need to implement it as a plugin layer.

Thanks.
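To see which nodes would need a plugin, it helps to list nodes by op type before converting. A minimal sketch over plain `(name, op)` pairs; `nodes_with_ops` and the sample node list are hypothetical:

```python
def nodes_with_ops(nodes, ops):
    """Group the names of (name, op) pairs whose op is in `ops`, keyed by op."""
    found = {}
    for name, op in nodes:
        if op in ops:
            found.setdefault(op, []).append(name)
    return found


# Hypothetical excerpt of a re-exported graph's (name, op) pairs.
nodes = [
    ("image_tensor", "Placeholder"),
    ("Cast", "Cast"),
    ("Preprocessor/sub", "Sub"),
]
print(nodes_with_ops(nodes, {"Cast"}))  # {'Cast': ['Cast']}
```

For a real frozen graph, the pairs can be built from the loaded TensorFlow `GraphDef`, e.g. `[(n.name, n.op) for n in graph_def.node]`; with graphsurgeon, `graph.find_nodes_by_op("Cast")` returns the matching nodes directly.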

Hi,

Thank you. If I want to implement it as a plugin layer, what should I do? I noticed that “namespace_plugin_map” may contain the plugin mapping, but I don’t know how to implement another plugin. Could you please tell me how to do it? Thank you again.

Hi,

I tried adding “Cast:Input” to the “namespace_plugin_map” variable in model_ssd_mobilenet_v1.py and changing register_input from “Input” to “Cast” in main.py, but got another error:

[TensorRT] ERROR: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_depthwise/depthwise: at least three non-batch dimensions are required for input
[TensorRT] ERROR: UFFParser: Parser error: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_depthwise/BatchNorm/FusedBatchNorm: The input to the Scale Layer is required to have a minimum of 3 dimensions.
[TensorRT] ERROR: Network must have at least one output
Traceback (most recent call last):
  File "main.py", line 44, in <module>
    buf = engine.serialize()
AttributeError: 'NoneType' object has no attribute 'serialize'

Hi,

The ssd_mobilenet model was trained with Google’s Object Detection API. I tried to find the code where Google changed the model layers but couldn’t. Could you please help me find a way to retrain the object detection model?

Hi,

First, ignore the changes I made earlier.

I downgraded Google’s Object Detection API by following this link https://devtalk.nvidia.com/default/topic/1043557/tensorrt/error-uffparser-parser-error-boxpredictor_0-reshape-reshape-1-dimension-specified-more-than-1-/post/5317840/#5317840 and re-exported the model file.

Then I followed this link to change the “Postprocessor”: https://devtalk.nvidia.com/default/topic/1044680/tensorrt/run-ssd_mobilenetv2-tensorflow-object-detection-api-on-tensorrt/post/5300899/#5300899

Thank God, it works!

Below is my config.py file:

import graphsurgeon as gs

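path = 'model/ssd_mobilenet_v1_aicar/frozen_inference_graph.pb'  # frozen graph exported by the Object Detection API
TRTbin = 'ssd_mobilenet_v1_aicar.bin'  # serialized TensorRT engine written by main.py
output_name = ['Postprocessor']  # output node for the UFF conversion (the NMS_TRT plugin)
dims = [3,300,300]  # CHW input shape registered with the UFF parser
layout = 7  # number of values per detection in the NMS output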

def add_plugin(graph):
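    # Drop Assert nodes together with the nodes that only feed them
    all_assert_nodes = graph.find_nodes_by_op("Assert")
    graph.remove(all_assert_nodes, remove_exclusive_dependencies=True)

    # Bypass Identity nodes by forwarding their inputs to their consumers
    all_identity_nodes = graph.find_nodes_by_op("Identity")
    graph.forward_inputs(all_identity_nodes)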

    Input = gs.create_plugin_node(
        name="Input",
        op="Placeholder",
        shape=[1, 3, 300, 300]
    )

    PriorBox = gs.create_plugin_node(
        name="MultipleGridAnchorGenerator",
        op="GridAnchor_TRT",
        minSize=0.2,
        maxSize=0.95,
        aspectRatios=[1.0, 2.0, 0.5, 3.0, 0.33],
        variance=[0.1,0.1,0.2,0.2],
        featureMapShapes=[19, 10, 5, 3, 2, 1],
        numLayers=6
    )

    Postprocessor = gs.create_plugin_node(
        name="Postprocessor",
        op="NMS_TRT",
        shareLocation=1,
        varianceEncodedInTarget=0,
        backgroundLabelId=0,
        confidenceThreshold=1e-8,
        nmsThreshold=0.6,
        topK=100,
        keepTopK=100,
        numClasses=7, # I have 6 labels + 1 background label
        inputOrder=[0, 2, 1],
        confSigmoid=1,
        isNormalized=1
    )

    concat_priorbox = gs.create_node(
        "concat_priorbox",
        op="ConcatV2",
        axis=2
    )

    concat_box_loc = gs.create_plugin_node(
        "concat_box_loc",
        op="FlattenConcat_TRT",
    )

    concat_box_conf = gs.create_plugin_node(
        "concat_box_conf",
        op="FlattenConcat_TRT",
    )

    # Map TensorFlow name scopes to the plugin nodes that replace them
    namespace_plugin_map = {
        "MultipleGridAnchorGenerator": PriorBox,
        "Postprocessor": Postprocessor,
        "Preprocessor": Input,
        "ToFloat": Input,
        "image_tensor": Input,
        "MultipleGridAnchorGenerator/Concatenate": concat_priorbox,
        "concat": concat_box_loc,
        "concat_1": concat_box_conf
    }
    
    graph.collapse_namespaces(namespace_plugin_map)
    graph.remove(graph.graph_outputs, remove_exclusive_dependencies=False)

    return graph
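The `namespace_plugin_map` in this config tells `graph.collapse_namespaces()` which TensorFlow name scopes to replace with which plugin node. A simplified pure-Python model of that lookup can help reason about which nodes a key such as “Cast” or “concat” would capture. It assumes a node belongs to a namespace when its name equals the key or starts with `key + "/"`, and that the longest matching key wins; this is not graphsurgeon's implementation, and `plugin_for` is hypothetical:

```python
def plugin_for(node_name, namespace_plugin_map):
    """Return the map key whose namespace contains node_name, or None.

    Simplified model (an assumption, not graphsurgeon's implementation):
    a node is inside a namespace when its name equals the key or starts
    with key + "/", and the longest matching key takes precedence.
    """
    matches = [
        key for key in namespace_plugin_map
        if node_name == key or node_name.startswith(key + "/")
    ]
    return max(matches, key=len) if matches else None


# Same keys as the config above, with the plugin variable names as strings.
namespace_plugin_map = {
    "MultipleGridAnchorGenerator": "PriorBox",
    "Postprocessor": "Postprocessor",
    "Preprocessor": "Input",
    "ToFloat": "Input",
    "image_tensor": "Input",
    "MultipleGridAnchorGenerator/Concatenate": "concat_priorbox",
    "concat": "concat_box_loc",
    "concat_1": "concat_box_conf",
}

# Hypothetical node names, for illustration only.
for name in ["Preprocessor/sub", "MultipleGridAnchorGenerator/Concatenate/concat",
             "concat_1", "FeatureExtractor/MobilenetV1/Conv2d_0/Conv2D"]:
    key = plugin_for(name, namespace_plugin_map)
    print(name, "->", namespace_plugin_map.get(key, "kept as-is"))
```

Note that under this model “concat” does not capture “concat_1”, because the match requires either the exact name or a “/” after the key.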

Hi AngelZheng,

I am facing the same issue with MobileNet V1 and V2.

I read this whole thread but could not understand your solution to this issue.

Can you please elaborate more?

Thanks

Hi miteshp,

Google’s new Object Detection API has changed its model structure, so we need to go back to the previous version and re-export the model. Then follow my last comment. Note that you should change the model path and the Postprocessor’s numClasses to match your own model.
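As the comment in the config says, numClasses counts the background class too: 6 labels plus 1 background gives numClasses=7. A small sketch that derives it from a label map, assuming a label_map.pbtxt with one `id:` field per item; `nms_num_classes` and the sample class names are hypothetical:

```python
import re


def nms_num_classes(label_map_text):
    """numClasses for NMS_TRT: distinct label-map ids plus one background class."""
    ids = {int(i) for i in re.findall(r"\bid\s*:\s*(\d+)", label_map_text)}
    return len(ids) + 1


# Hypothetical label_map.pbtxt with six classes (names are placeholders).
label_map = "\n".join(
    f"item {{\n  id: {i}\n  name: 'class_{i}'\n}}" for i in range(1, 7)
)
print(nms_num_classes(label_map))  # 7
```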

Hope it will help you.

Thank you.

Hi @AngelZheng

I have tried both links, but I am getting an error about an unsupported _Cast operation.

I think the problem is the TensorFlow version.

  1. Which TensorFlow version did you use to train SSD MobileNet V2 on your custom dataset?

  2. What is the commit hash of the TensorFlow models repo you used to generate the inference graph?

Can you please check?

Hi @miteshp.patel

  1. I trained SSD-MobileNet V1 using TensorFlow 1.13. Note that the model is SSD-MobileNet V1, not V2.

  2. ae0a9409212d0072938fa60c9f85740bb89ced7e

Maybe you can switch to SSD-MobileNet V1. I hope you succeed!

Hi @AngelZheng

I have tried TensorFlow 1.13 with commit ae0a9409212d0072938fa60c9f85740bb89ced7e of the models research repo.

Even then, I get the same _Cast operation error with SSD MobileNet V1 and V2.

Can you please check your protoc and protobuf versions with these commands? Mine are:

  1. protoc --version -> libprotoc 3.0.0

  2. pip show protobuf -> Version: 3.8.0
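The same Python-package check can be scripted, which makes it easier to paste exact versions into a thread. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+; on the Jetson's Python 3.6 the `importlib_metadata` backport or `pip show` serves the same purpose). `installed_version` is a hypothetical helper, and `protoc` is a separate compiler binary, so `protoc --version` still has to be run in a shell:

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


for dist in ("tensorflow", "tensorflow-gpu", "protobuf"):
    print(dist, installed_version(dist) or "not installed")
```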

Hi,

I think the problem may be in your config.py. Can you share it?

My protoc version is libprotoc 3.7.0, and I don’t have protobuf.

Hi @AngelZheng ,

Thanks for reply.

Here is my training config:

model {
  ssd {
    num_classes: 6
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    feature_extractor {
      type: "ssd_mobilenet_v1"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 3.99999989895e-05
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.0299999993294
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.999700009823
          center: true
          scale: true
          epsilon: 0.0010000000475
          train: true
        }
      }
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 3.99999989895e-05
            }
          }
          initializer {
            truncated_normal_initializer {
              mean: 0.0
              stddev: 0.0299999993294
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.999700009823
            center: true
            scale: true
            epsilon: 0.0010000000475
            train: true
          }
        }
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.800000011921
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.20000000298
        max_scale: 0.949999988079
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.333299994469
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 0.300000011921
        iou_threshold: 0.600000023842
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.990000009537
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
  }
}
train_config {
  batch_size: 24
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
  optimizer {
    rms_prop_optimizer {
      learning_rate {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.00400000018999
          decay_steps: 800720
          decay_factor: 0.949999988079
        }
      }
      momentum_optimizer_value: 0.899999976158
      decay: 0.899999976158
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "/content/car_data/ssd_mobilenet_v1_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  num_steps: 200000
}
train_input_reader {
  label_map_path: "/content/car_data/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/content/car_data/train.record"
  }
}
eval_config {
  num_examples: 8000
  max_evals: 10
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "/content/car_data/label_map.pbtxt"
  shuffle: false
  num_readers: 1
  tf_record_input_reader {
    input_path: "/content/car_data/test.record"
  }
}
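Most GridAnchor_TRT and NMS_TRT arguments in the graphsurgeon configs mirror values in this training config (num_classes, min_scale/max_scale, aspect_ratios, iou_threshold, max_detections, score_converter). A sketch of that correspondence, under stated assumptions: the first feature map has output stride 16 and each later map halves the size with SAME padding (which reproduces featureMapShapes=[19, 10, 5, 3, 2, 1] for a 300x300 input), and pairing topK with max_detections_per_class is an assumption (both are 100 here). `feature_map_shapes` and `plugin_params` are hypothetical helpers:

```python
import math


def feature_map_shapes(input_size=300, first_stride=16, num_layers=6):
    """Side length of each SSD feature map, assuming stride-2 SAME-padded downsampling."""
    size = math.ceil(input_size / first_stride)
    shapes = [size]
    for _ in range(num_layers - 1):
        size = math.ceil(size / 2)
        shapes.append(size)
    return shapes


def plugin_params(pipeline):
    """Map training-config values onto GridAnchor_TRT / NMS_TRT keyword arguments."""
    return {
        "GridAnchor_TRT": {
            "minSize": pipeline["min_scale"],
            "maxSize": pipeline["max_scale"],
            "aspectRatios": pipeline["aspect_ratios"],
            "numLayers": pipeline["num_layers"],
            "featureMapShapes": feature_map_shapes(
                pipeline["height"], num_layers=pipeline["num_layers"]),
        },
        "NMS_TRT": {
            "numClasses": pipeline["num_classes"] + 1,  # + background class
            "nmsThreshold": pipeline["iou_threshold"],
            "topK": pipeline["max_detections_per_class"],  # assumed pairing
            "keepTopK": pipeline["max_total_detections"],
            "confSigmoid": int(pipeline["score_converter"] == "SIGMOID"),
        },
    }


# Values copied (rounded) from the training config above.
pipeline = {
    "num_classes": 6, "height": 300, "num_layers": 6,
    "min_scale": 0.2, "max_scale": 0.95,
    "aspect_ratios": [1.0, 2.0, 0.5, 3.0, 0.3333],
    "iou_threshold": 0.6, "max_detections_per_class": 100,
    "max_total_detections": 100, "score_converter": "SIGMOID",
}
params = plugin_params(pipeline)
print(params["GridAnchor_TRT"]["featureMapShapes"])  # [19, 10, 5, 3, 2, 1]
print(params["NMS_TRT"]["numClasses"])  # 7
```

confidenceThreshold is deliberately left out: the configs use 1e-8 rather than the training score_threshold of 0.3.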

Here is my config.py:

import graphsurgeon as gs

path = 'model/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb'
TRTbin = 'TRT_ssd_mobilenet_v1_coco_2018_01_28.bin'
output_name = ['Postprocessor']
dims = [3,300,300]
layout = 7

def add_plugin(graph):
    all_assert_nodes = graph.find_nodes_by_op("Assert")
    graph.remove(all_assert_nodes, remove_exclusive_dependencies=True)

    all_identity_nodes = graph.find_nodes_by_op("Identity")
    graph.forward_inputs(all_identity_nodes)

    Input = gs.create_plugin_node(
        name="Input",
        op="Placeholder",
        shape=[1, 3, 300, 300]
    )

    PriorBox = gs.create_plugin_node(
        name="MultipleGridAnchorGenerator",
        op="GridAnchor_TRT",
        minSize=0.2,
        maxSize=0.95,
        aspectRatios=[1.0, 2.0, 0.5, 3.0, 0.33],
        variance=[0.1,0.1,0.2,0.2],
        featureMapShapes=[19, 10, 5, 3, 2, 1],
        numLayers=6
    )

    Postprocessor = gs.create_plugin_node(
        name="Postprocessor",
        op="NMS_TRT",
        shareLocation=1,
        varianceEncodedInTarget=0,
        backgroundLabelId=0,
        confidenceThreshold=1e-8,
        nmsThreshold=0.6,
        topK=100,
        keepTopK=100,
        numClasses=7,
        inputOrder=[0, 2, 1],
        confSigmoid=1,
        isNormalized=1
    )

    concat_priorbox = gs.create_node(
        "concat_priorbox",
        op="ConcatV2",
        axis=2
    )

    concat_box_loc = gs.create_plugin_node(
        "concat_box_loc",
        op="FlattenConcat_TRT",
    )

    concat_box_conf = gs.create_plugin_node(
        "concat_box_conf",
        op="FlattenConcat_TRT",
    )

    namespace_plugin_map = {
        "MultipleGridAnchorGenerator": PriorBox,
        "Postprocessor": Postprocessor,
        "Preprocessor": Input,
        "ToFloat": Input,
        "image_tensor": Input,
        "MultipleGridAnchorGenerator/Concatenate": concat_priorbox,
        "concat": concat_box_loc,
        "concat_1": concat_box_conf
    }

    graph.collapse_namespaces(namespace_plugin_map)
    graph.remove(graph.graph_outputs, remove_exclusive_dependencies=False)
    graph.find_nodes_by_op("NMS_TRT")[0].input.remove("Input")
    graph.find_nodes_by_name("Input")[0].input.remove("image_tensor:0")

    return graph

I am not changing any parameters except numClasses.

Hi miteshp.patel,

Your config is not the same as mine. You can try my config.py:

import graphsurgeon as gs
import tensorflow as tf

path = 'model/ssd_mobilenet_v1_aicar/frozen_inference_graph.pb'
TRTbin = 'ssd_mobilenet_v1_aicar.bin'
output_name = ['Postprocessor']
dims = [3,300,300]
layout = 7

def add_plugin(graph):
    all_assert_nodes = graph.find_nodes_by_op("Assert")
    graph.remove(all_assert_nodes, remove_exclusive_dependencies=True)

    all_identity_nodes = graph.find_nodes_by_op("Identity")
    graph.forward_inputs(all_identity_nodes)

    Input = gs.create_plugin_node(
        name="Input",
        op="Placeholder",
        shape=[1, 3, 300, 300]
    )

    PriorBox = gs.create_plugin_node(
        name="MultipleGridAnchorGenerator",
        op="GridAnchor_TRT",
        minSize=0.2,
        maxSize=0.95,
        aspectRatios=[1.0, 2.0, 0.5, 3.0, 0.33],
        variance=[0.1,0.1,0.2,0.2],
        featureMapShapes=[19, 10, 5, 3, 2, 1],
        numLayers=6
    )

    Postprocessor = gs.create_plugin_node(
        name="Postprocessor",
        op="NMS_TRT",
        shareLocation=1,
        varianceEncodedInTarget=0,
        backgroundLabelId=0,
        confidenceThreshold=1e-8,
        nmsThreshold=0.6,
        topK=100,
        keepTopK=100,
        numClasses=7,
        inputOrder=[0, 2, 1],
        confSigmoid=1,
        isNormalized=1
    )

    concat_priorbox = gs.create_node(
        "concat_priorbox",
        op="ConcatV2",
        dtype=tf.float32,
        axis=2
    )

    concat_box_loc = gs.create_plugin_node(
        "concat_box_loc",
        op="FlattenConcat_TRT",
        dtype=tf.float32,
        axis=1,
        ignoreBatch=0
    )

    concat_box_conf = gs.create_plugin_node(
        "concat_box_conf",
        op="FlattenConcat_TRT",
        dtype=tf.float32,
        axis=1,
        ignoreBatch=0
    )

    namespace_plugin_map = {
        "MultipleGridAnchorGenerator": PriorBox,
        "Postprocessor": Postprocessor,
        "Preprocessor": Input,
        "ToFloat": Input,
        "image_tensor": Input,
        "MultipleGridAnchorGenerator/Concatenate": concat_priorbox,
        "concat": concat_box_loc,
        "concat_1": concat_box_conf
    }

    graph.collapse_namespaces(namespace_plugin_map)
    graph.remove(graph.graph_outputs, remove_exclusive_dependencies=False)

    # Unlike the config above, these two input removals stay disabled:
    #graph.find_nodes_by_op("NMS_TRT")[0].input.remove("Input")
    #graph.find_nodes_by_name("Input")[0].input.remove("image_tensor:0")

    return graph

Don’t forget to change the path.
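`layout = 7` in these configs is the number of values per detection in the NMS_TRT output. A sketch of reading that flat output, assuming the usual record order image_id, label, confidence, xmin, ymin, xmax, ymax with normalized coordinates (isNormalized=1); `parse_detections` and the sample numbers are hypothetical:

```python
LAYOUT = 7  # values per detection, matching `layout` in config.py


def parse_detections(output, conf_threshold=0.5):
    """Split a flat NMS output into detections above conf_threshold.

    Assumes each record is: image_id, label, confidence, xmin, ymin, xmax, ymax.
    """
    detections = []
    for start in range(0, len(output) - LAYOUT + 1, LAYOUT):
        _image_id, label, conf, xmin, ymin, xmax, ymax = output[start:start + LAYOUT]
        if conf < conf_threshold:
            continue
        detections.append({"label": int(label), "confidence": conf,
                           "box": (xmin, ymin, xmax, ymax)})
    return detections


# Hypothetical output holding two detections; only the first passes the threshold.
output = [0, 1, 0.9, 0.1, 0.2, 0.4, 0.5,
          0, 3, 0.2, 0.5, 0.5, 0.9, 0.9]
print(parse_detections(output))
```

Multiplying the normalized box by the image width and height gives pixel coordinates.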

Hi,

The _Cast layer is generated by the newer Object Detection API, so maybe you should go back to commit ae0a9409212d0072938fa60c9f85740bb89ced7e.