Fine tune ssd_mobilenet_v1 model can't convert

AngelZheng · June 10, 2019, 4:47am

I successfully converted the original ssd_mobilenet_v1_coco_2018_01_28 model to TensorRT by following https://github.com/AastaNV/TRT_object_detection. However, after fine-tuning the original model on my own dataset and exporting it to a .pb file, the conversion to TensorRT fails.

Here is the error message:

-------
Traceback (most recent call last):
  File "main.py", line 31, in <module>
    dynamic_graph = model.add_plugin(gs.DynamicGraph(model.path))
  File "/home/blackwalnut/TensorRT/JestonNano/TRT_object_detection/config/model_ssd_mobilenet_v1_aicar.py", line 87, in add_plugin
    graph.find_nodes_by_name("Input")[0].input.remove("image_tensor:0")
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/internal/containers.py", line 295, in remove
    self._values.remove(elem)
ValueError: list.remove(x): x not in list

Could you tell me what’s wrong with that?

AastaLLL · June 11, 2019, 3:04am

Hi,

Did you change the layer names of ssd_mobilenet_v1_coco_2018_01_28?

Based on the error message, it looks like the placeholder layer name has changed.
If so, please update it to the new name here:
TRT_object_detection/model_ssd_mobilenet_v1_coco_2018_01_28.py at master · AastaNV/TRT_object_detection · GitHub

Thanks.
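
For readers hitting the same ValueError: it is raised because remove() is called with a name that is no longer in the node's input list. Below is a minimal sketch of a guarded removal, assuming the input list behaves like a Python list (protobuf repeated fields support `in` and `remove`). The helper name `remove_input_if_present` and the sample input names are hypothetical and not part of graphsurgeon or the repository.

```python
def remove_input_if_present(inputs, name):
    """Remove `name` from an input list if it is there.

    Returns True when the name was removed and False when it was absent,
    instead of raising ValueError like a bare list.remove() call.
    """
    if name in inputs:
        inputs.remove(name)
        return True
    return False


# Stand-in for a node's input list after the export tool renamed the
# placeholder (hypothetical values, for illustration only).
node_inputs = ["Cast", "Preprocessor/sub"]

removed = remove_input_if_present(node_inputs, "image_tensor:0")
print(removed, node_inputs)
```

In a config like the one in this thread, the same helper could presumably be applied to `graph.find_nodes_by_name("Input")[0].input`, so that a renamed placeholder is reported rather than crashing the script.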

AngelZheng · June 11, 2019, 3:28am

Hi NVIDIA,
I checked the node names and found that Google's model export tool changed the placeholder layer name to ‘Cast’.

Then I changed the code to:

graph.find_nodes_by_name("Input")[0].input.remove("Cast")

I ran the converter again and the ‘tmp.uff’ file was generated successfully, but there is another error message:

[TensorRT] ERROR: UFFParser: Parser error: BoxPredictor_0/Reshape: Reshape: -1 dimension specified more than 1 time
[TensorRT] ERROR: Network must have at least one output
Traceback (most recent call last):
  File "main.py", line 44, in <module>
    buf = engine.serialize()
AttributeError: 'NoneType' object has no attribute 'serialize'

Could you tell me what’s wrong with that? Thank you.

AastaLLL · June 12, 2019, 1:38am

Hi,

Would you mind sharing your .pb file with us?
Thanks.

AngelZheng · June 12, 2019, 8:49am

Sure, here are the model files.

https://drive.google.com/file/d/18Oz53f8CfE6f6kko8DFW37ZmwSZdIRkT/view?usp=sharing

Thank you.

AastaLLL · June 18, 2019, 7:26am

Hi,

Thanks for your model.
We will check this and get back to you later.

Thanks

AastaLLL · June 24, 2019, 8:20am

Hi,

Sorry for the late reply.

After checking, we found an extra layer called “Cast”, which TensorRT does not support yet.
Is it possible to retrain your network without the Cast layer?
Otherwise, you will need to implement it as a plugin layer.

Thanks.
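
To check whether an exported graph contains such a layer, one can scan its nodes by op type. The sketch below is a minimal illustration, assuming each node exposes `name` and `op` attributes, as the nodes of a TensorFlow GraphDef do. The function name `find_nodes_with_op` and the sample nodes are hypothetical.

```python
from types import SimpleNamespace


def find_nodes_with_op(nodes, op_names):
    """Return the names of all nodes whose op type is in `op_names`."""
    wanted = set(op_names)
    return [node.name for node in nodes if node.op in wanted]


# Stand-in nodes (hypothetical); with a real model these would come from
# the `node` field of the GraphDef loaded from frozen_inference_graph.pb.
fake_nodes = [
    SimpleNamespace(name="image_tensor", op="Placeholder"),
    SimpleNamespace(name="Cast", op="Cast"),
    SimpleNamespace(name="Preprocessor/mul", op="Mul"),
]

print(find_nodes_with_op(fake_nodes, ["Cast"]))
```

An empty result for "Cast" would suggest the export does not contain the unsupported layer.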

AngelZheng · June 25, 2019, 5:57am

Hi,

Thank you. If I want to implement it as a plugin layer, what should I do? I noticed that “namespace_plugin_map” seems to hold the plugin mapping, but I don’t know how to implement another plugin. Could you please tell me how to do it? Thank you again.

AngelZheng · June 25, 2019, 7:33am

Hi,

I tried adding “Cast:Input” to the “namespace_plugin_map” variable in model_ssd_mobilenet_v1.py and changing register_input from “Input” to “Cast” in main.py, but another error appeared:

[TensorRT] ERROR: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_depthwise/depthwise: at least three non-batch dimensions are required for input
[TensorRT] ERROR: UFFParser: Parser error: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_depthwise/BatchNorm/FusedBatchNorm: The input to the Scale Layer is required to have a minimum of 3 dimensions.
[TensorRT] ERROR: Network must have at least one output
Traceback (most recent call last):
  File "main.py", line 44, in <module>
    buf = engine.serialize()
AttributeError: 'NoneType' object has no attribute 'serialize'

AngelZheng · June 25, 2019, 11:55am

Hi,

The ssd_mobilenet model was trained with Google’s Object Detection API. I tried to find the code where Google changed the model layers, but failed. Could you please help me find a way to retrain the object detection model?

AngelZheng · June 27, 2019, 5:59am

Hi,

First, ignore the changes I made earlier.

I downgraded Google’s Object Detection API by following this link https://devtalk.nvidia.com/default/topic/1043557/tensorrt/error-uffparser-parser-error-boxpredictor_0-reshape-reshape-1-dimension-specified-more-than-1-/post/5317840/#5317840 and re-exported the model file.

Then I followed this link to change the “Postprocessor”: https://devtalk.nvidia.com/default/topic/1044680/tensorrt/run-ssd_mobilenetv2-tensorflow-object-detection-api-on-tensorrt/post/5300899/#5300899

Thank goodness, it works!

Below is the config.py file:

import graphsurgeon as gs

path = 'model/ssd_mobilenet_v1_aicar/frozen_inference_graph.pb'
TRTbin = 'ssd_mobilenet_v1_aicar.bin'
output_name = ['Postprocessor']
dims = [3,300,300]
layout = 7

def add_plugin(graph):
    all_assert_nodes = graph.find_nodes_by_op("Assert")
    graph.remove(all_assert_nodes, remove_exclusive_dependencies=True)

    all_identity_nodes = graph.find_nodes_by_op("Identity")
    graph.forward_inputs(all_identity_nodes)

    Input = gs.create_plugin_node(
        name="Input",
        op="Placeholder",
        shape=[1, 3, 300, 300]
    )

    PriorBox = gs.create_plugin_node(
        name="MultipleGridAnchorGenerator",
        op="GridAnchor_TRT",
        minSize=0.2,
        maxSize=0.95,
        aspectRatios=[1.0, 2.0, 0.5, 3.0, 0.33],
        variance=[0.1,0.1,0.2,0.2],
        featureMapShapes=[19, 10, 5, 3, 2, 1],
        numLayers=6
    )

    Postprocessor = gs.create_plugin_node(
        name="Postprocessor",
        op="NMS_TRT",
        shareLocation=1,
        varianceEncodedInTarget=0,
        backgroundLabelId=0,
        confidenceThreshold=1e-8,
        nmsThreshold=0.6,
        topK=100,
        keepTopK=100,
        numClasses=7, # I have 6 labels + 1 background label
        inputOrder=[0, 2, 1],
        confSigmoid=1,
        isNormalized=1
    )

    concat_priorbox = gs.create_node(
        "concat_priorbox",
        op="ConcatV2",
        axis=2
    )

    concat_box_loc = gs.create_plugin_node(
        "concat_box_loc",
        op="FlattenConcat_TRT",
    )

    concat_box_conf = gs.create_plugin_node(
        "concat_box_conf",
        op="FlattenConcat_TRT",
    )

    namespace_plugin_map = {
        "MultipleGridAnchorGenerator": PriorBox,
        "Postprocessor": Postprocessor,
        "Preprocessor": Input,
        "ToFloat": Input,
        "image_tensor": Input,
        "MultipleGridAnchorGenerator/Concatenate": concat_priorbox,
        "concat": concat_box_loc,
        "concat_1": concat_box_conf
    }
    
    graph.collapse_namespaces(namespace_plugin_map)
    graph.remove(graph.graph_outputs, remove_exclusive_dependencies=False)

    return graph

miteshp.patel · July 12, 2019, 11:04am

Hi AngelZheng,

I am facing same issue in Mobilenet V1 and V2.

I read this whole thread but could not understand your solution to this issue.

Can you please elaborate more?

Thanks

AngelZheng · July 19, 2019, 5:53pm

Hi miteshp,

Google’s newer Object Detection API changed its model structure, so we need to go back to the previous version and export the model with it. Then follow my last comment. Note that you should change the model path and the Postprocessor’s numClasses to match your own model.

I hope this helps.
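
In the config.py above, numClasses is the number of labels plus one background class (6 labels + 1 background = 7). A minimal sketch of deriving that value from a label map is shown below, assuming the label map uses the usual `item { ... }` text layout of the Object Detection API. The function name `nms_num_classes` and the sample label names are hypothetical.

```python
import re


def nms_num_classes(label_map_text):
    """Count `item {` entries in a label map and add one background class."""
    num_labels = len(re.findall(r"\bitem\s*\{", label_map_text))
    return num_labels + 1


# Hypothetical two-label map, for illustration only.
sample = """
item {
  id: 1
  name: 'label_a'
}
item {
  id: 2
  name: 'label_b'
}
"""

print(nms_num_classes(sample))
```

With a real model, the text would be read from the label_map.pbtxt used for training, and the result passed as numClasses to the NMS_TRT plugin node.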

Thank you.

miteshp.patel · July 25, 2019, 11:09am

Hi @AngelZheng

I have tried both links, and I am still getting an error about an unsupported _Cast operation.

I think the problem is with the TensorFlow version.

Which TensorFlow version did you use to train SSD MobileNet V2 on your custom dataset?
What is the commit hash of the TensorFlow models repo that you used to generate the inference graph?

Could you please check?

AngelZheng · July 26, 2019, 6:04am

Hi @miteshp.patel

I trained SSD-MobileNet V1 using TensorFlow 1.13. Note that the model is SSD-MobileNet V1, not V2. The commit hash of the models repo is:
ae0a9409212d0072938fa60c9f85740bb89ced7e

Maybe you can switch to SSD-MobileNet V1. I hope you succeed!

miteshp.patel · August 1, 2019, 8:21am

Hi @AngelZheng

I have tried TensorFlow 1.13 with commit ae0a9409212d0072938fa60c9f85740bb89ced7e of the models research repo.

Even then, I get the same _Cast operation error with both SSD MobileNet V1 and V2.

Could you please check your protoc and protobuf versions with the commands below? Mine are:

protoc --version → libprotoc 3.0.0
pip show protobuf → Version: 3.8.0

AngelZheng · August 2, 2019, 3:46am

Hi,

I think the problem may be in config.py. Can you share it?

My protoc version is libprotoc 3.7.0, and I don’t have the protobuf pip package.

miteshp.patel · August 2, 2019, 5:00am

Hi @AngelZheng ,

Thanks for the reply.

Here is my training config:

model {
  ssd {
    num_classes: 6
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    feature_extractor {
      type: "ssd_mobilenet_v1"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 3.99999989895e-05
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.0299999993294
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.999700009823
          center: true
          scale: true
          epsilon: 0.0010000000475
          train: true
        }
      }
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 3.99999989895e-05
            }
          }
          initializer {
            truncated_normal_initializer {
              mean: 0.0
              stddev: 0.0299999993294
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.999700009823
            center: true
            scale: true
            epsilon: 0.0010000000475
            train: true
          }
        }
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.800000011921
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.20000000298
        max_scale: 0.949999988079
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.333299994469
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 0.300000011921
        iou_threshold: 0.600000023842
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.990000009537
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
  }
}
train_config {
  batch_size: 24
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
  optimizer {
    rms_prop_optimizer {
      learning_rate {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.00400000018999
          decay_steps: 800720
          decay_factor: 0.949999988079
        }
      }
      momentum_optimizer_value: 0.899999976158
      decay: 0.899999976158
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "/content/car_data/ssd_mobilenet_v1_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  num_steps: 200000
}
train_input_reader {
  label_map_path: "/content/car_data/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/content/car_data/train.record"
  }
}
eval_config {
  num_examples: 8000
  max_evals: 10
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "/content/car_data/label_map.pbtxt"
  shuffle: false
  num_readers: 1
  tf_record_input_reader {
    input_path: "/content/car_data/test.record"
  }
}

Here is my conversion config:

import graphsurgeon as gs

path = 'model/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb'
TRTbin = 'TRT_ssd_mobilenet_v1_coco_2018_01_28.bin'
output_name = ['Postprocessor']
dims = [3,300,300]
layout = 7

def add_plugin(graph):
    all_assert_nodes = graph.find_nodes_by_op("Assert")
    graph.remove(all_assert_nodes, remove_exclusive_dependencies=True)

    all_identity_nodes = graph.find_nodes_by_op("Identity")
    graph.forward_inputs(all_identity_nodes)

    Input = gs.create_plugin_node(
        name="Input",
        op="Placeholder",
        shape=[1, 3, 300, 300]
    )

    PriorBox = gs.create_plugin_node(
        name="MultipleGridAnchorGenerator",
        op="GridAnchor_TRT",
        minSize=0.2,
        maxSize=0.95,
        aspectRatios=[1.0, 2.0, 0.5, 3.0, 0.33],
        variance=[0.1,0.1,0.2,0.2],
        featureMapShapes=[19, 10, 5, 3, 2, 1],
        numLayers=6
    )

    Postprocessor = gs.create_plugin_node(
        name="Postprocessor",
        op="NMS_TRT",
        shareLocation=1,
        varianceEncodedInTarget=0,
        backgroundLabelId=0,
        confidenceThreshold=1e-8,
        nmsThreshold=0.6,
        topK=100,
        keepTopK=100,
        numClasses=7,
        inputOrder=[0, 2, 1],
        confSigmoid=1,
        isNormalized=1
    )

    concat_priorbox = gs.create_node(
        "concat_priorbox",
        op="ConcatV2",
        axis=2
    )

    concat_box_loc = gs.create_plugin_node(
        "concat_box_loc",
        op="FlattenConcat_TRT",
    )

    concat_box_conf = gs.create_plugin_node(
        "concat_box_conf",
        op="FlattenConcat_TRT",
    )

    namespace_plugin_map = {
        "MultipleGridAnchorGenerator": PriorBox,
        "Postprocessor": Postprocessor,
        "Preprocessor": Input,
        "ToFloat": Input,
        "image_tensor": Input,
        "MultipleGridAnchorGenerator/Concatenate": concat_priorbox,
        "concat": concat_box_loc,
        "concat_1": concat_box_conf
    }

    graph.collapse_namespaces(namespace_plugin_map)
    graph.remove(graph.graph_outputs, remove_exclusive_dependencies=False)
    graph.find_nodes_by_op("NMS_TRT")[0].input.remove("Input")
    graph.find_nodes_by_name("Input")[0].input.remove("image_tensor:0")

    return graph

I did not change any parameters except numClasses.

AngelZheng · August 5, 2019, 6:24am

Hi miteshp.patel,

They are not the same. You can try my config.py:

import graphsurgeon as gs
import tensorflow as tf

path = 'model/ssd_mobilenet_v1_aicar/frozen_inference_graph.pb'
TRTbin = 'ssd_mobilenet_v1_aicar.bin'
output_name = ['Postprocessor']
dims = [3,300,300]
layout = 7

def add_plugin(graph):
    all_assert_nodes = graph.find_nodes_by_op("Assert")
    graph.remove(all_assert_nodes, remove_exclusive_dependencies=True)

    all_identity_nodes = graph.find_nodes_by_op("Identity")
    graph.forward_inputs(all_identity_nodes)

    Input = gs.create_plugin_node(
        name="Input",
        op="Placeholder",
        shape=[1, 3, 300, 300]
    )

    PriorBox = gs.create_plugin_node(
        name="MultipleGridAnchorGenerator",
        op="GridAnchor_TRT",
        minSize=0.2,
        maxSize=0.95,
        aspectRatios=[1.0, 2.0, 0.5, 3.0, 0.33],
        variance=[0.1,0.1,0.2,0.2],
        featureMapShapes=[19, 10, 5, 3, 2, 1],
        numLayers=6
    )

    Postprocessor = gs.create_plugin_node(
        name="Postprocessor",
        op="NMS_TRT",
        shareLocation=1,
        varianceEncodedInTarget=0,
        backgroundLabelId=0,
        confidenceThreshold=1e-8,
        nmsThreshold=0.6,
        topK=100,
        keepTopK=100,
        numClasses=7,
        inputOrder=[0, 2, 1],
        confSigmoid=1,
        isNormalized=1
    )

    concat_priorbox = gs.create_node(
        "concat_priorbox",
        op="ConcatV2",
        dtype=tf.float32,
        axis=2
    )

    concat_box_loc = gs.create_plugin_node(
        "concat_box_loc",
        op="FlattenConcat_TRT",
        dtype=tf.float32,
        axis=1,
        ignoreBatch=0
    )

    concat_box_conf = gs.create_plugin_node(
        "concat_box_conf",
        op="FlattenConcat_TRT",
        dtype=tf.float32,
        axis=1,
        ignoreBatch=0
    )

    namespace_plugin_map = {
        "MultipleGridAnchorGenerator": PriorBox,
        "Postprocessor": Postprocessor,
        "Preprocessor": Input,
        "ToFloat": Input,
        "image_tensor": Input,
        "MultipleGridAnchorGenerator/Concatenate": concat_priorbox,
        "concat": concat_box_loc,
        "concat_1": concat_box_conf
    }

    #print('!!!===!!!')
    #print(graph.graph_inputs)
    #print('!!!===!!!')
    
    graph.collapse_namespaces(namespace_plugin_map)
    graph.remove(graph.graph_outputs, remove_exclusive_dependencies=False)
    
    print(graph.find_nodes_by_op("NMS_TRT"))

    #graph.find_nodes_by_op("NMS_TRT")[0].input.remove("Input")
    #graph.find_nodes_by_name("Input")[0].input.remove("image_tensor:0")

    return graph

Don’t forget to change the path.

AngelZheng · August 20, 2019, 3:55am

Hi,

The _Cast layer is generated by the newer Object Detection API, so you may need to revert to commit ae0a9409212d0072938fa60c9f85740bb89ced7e.
