Mobilenet_V2 sampleUffSSD not Working -- Help Please!

I am relatively new to TensorRT.
I have been trying to run my trained TensorFlow Mobilenet_V2 model in TensorRT using the TensorRT ‘sampleUffSSD’ example.

setup versions:
tensorflow-gpu 1.12.0
Cuda compilation tools, release 10.0, V10.0.326
tensorrt 5.1.2.2

Results

  1. Testing with Inception worked fine, as described in the NVIDIA guide

  2. Testing with the untrained Mobilenet_V2 produces these errors

    sampleUffSSD$ …/…/bin/sample_uff_ssd
    &&&& RUNNING TensorRT.sample_uff_ssd # …/…/bin/sample_uff_ssd
    [I] …/…/data/ssd/sample_ssd_relu6.uff
    [I] Begin parsing model…
    [E] [TRT] UffParser: Unsupported number of graph 0
    [E] Failure while parsing UFF file
    sample_uff_ssd: sampleUffSSD.cpp:542: int main(int, char**): Assertion `tmpEngine != nullptr' failed.
    Aborted (core dumped)

  3. Testing with my custom-trained Mobilenet_V2 produces these errors

    tensorrt/samples/sampleUffSSD$ …/…/bin/sample_uff_ssd
    &&&& RUNNING TensorRT.sample_uff_ssd # …/…/bin/sample_uff_ssd
    [I] …/…/data/ssd/sample_ssd_relu6.uff
    [I] Begin parsing model…
    [E] [TRT] UffParser: Validator error:
    FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128_depthwise/BatchNorm/FusedBatchNormV3:
    Unsupported operation _FusedBatchNormV3
    [E] Failure while parsing UFF file
    sample_uff_ssd: sampleUffSSD.cpp:542: int main(int, char**): Assertion `tmpEngine != nullptr' failed.
    Aborted (core dumped)

I noticed that FusedBatchNormV3 was introduced into my custom-trained (transfer learning) Mobilenet_V2 frozen model; it was not present in the original Mobilenet_V2 frozen model that I downloaded from the internet. FusedBatchNormV3 seems to be at least part of the reason for the issues.
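
One way to confirm which operations the retraining introduced is to compare the op types of the two frozen graphs. The sketch below is plain Python and works on (node name, op type) pairs; with TensorFlow 1.x these could presumably be collected from a loaded GraphDef as `[(n.name, n.op) for n in graph_def.node]`. The helper name `new_op_types` and the shortened node lists are made up for illustration; only the FusedBatchNormV3 node name is taken from the log above.

```python
from collections import defaultdict


def new_op_types(original_nodes, trained_nodes):
    """Return {op_type: [node names]} for ops that occur only in the trained graph."""
    original_ops = {op for _, op in original_nodes}
    introduced = defaultdict(list)
    for name, op in trained_nodes:
        if op not in original_ops:
            introduced[op].append(name)
    return dict(introduced)


# Hypothetical, heavily shortened node lists: (node name, op type).
original = [
    ("image_tensor", "Placeholder"),
    ("FeatureExtractor/MobilenetV2/Conv/Conv2D", "Conv2D"),
    ("FeatureExtractor/MobilenetV2/Conv/BatchNorm/FusedBatchNorm", "FusedBatchNorm"),
]
trained = [
    ("image_tensor", "Placeholder"),
    ("FeatureExtractor/MobilenetV2/Conv/Conv2D", "Conv2D"),
    ("FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128_depthwise"
     "/BatchNorm/FusedBatchNormV3", "FusedBatchNormV3"),
]

# Print each newly introduced op type, how often it occurs, and one example node.
for op, names in new_op_types(original, trained).items():
    print(op, len(names), names[0])
```

Any op type that shows up here and is rejected by the UFF parser is a candidate for the kind of failure shown in case 3.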

I also think the config.py settings may help, if I knew how to use them.

Please help with this; it has held us back for over a week now!

I would greatly appreciate a straightforward way of converting a TensorFlow Mobilenet_V2 model to a working TensorRT model.

The sampleUffSSD sample is written for the ssd_inception_v2_coco_2017_11_17 model, not MobileNetV2.

There is a converter for MobileNetV2 here https://github.com/AastaNV/TRT_object_detection but beware of the “version” of the model.

I’ve also been struggling all week just to use my own trained MobileNetV2 model, but still have not managed to…
Between the custom plugin required, the different model versions, and the output names, I have had a hard time following all the hints.

Thanks for the reply and the link. The link appears to use Python. I am using C++, and most of my application so far is written in C++, which is why I was interested in a solution based on the sampleUffSSD example.

Someone shared a way to do this, but without the bounding boxes that indicate what was detected.
https://devtalk.nvidia.com/default/topic/1049802/jetson-nano/object-detection-with-mobilenet-ssd-slower-than-mentioned-speed/post/5327974/#5327974

When I uncomment the bounding box part of the code, I get a segmentation fault.

It’s quite frustrating to have to scavenge for information on an application (TensorRT) whose main purpose is to optimise models.
I would have expected ready-made code for optimising the major models like Mobilenet_V2, including TensorFlow models trained with transfer learning.

I’m discovering and learning as I read everybody’s attempts to get MobileNetV2_SSD working with TensorRT…

I’m now stuck (and I’m not the only one) at the point where TensorRT complains about an unsupported Cast operation.
It looks like it’s everywhere in the graph, so I’m wondering how anybody managed to convert a MobileNetV2_SSD to TensorRT without a custom implementation…

Or is it an operation added by a newer version of TensorFlow?
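
To see how widespread an unsupported op really is, its nodes can be counted per top-level namespace. This is a rough sketch in plain Python, assuming the (node name, op type) pairs have already been pulled out of the frozen graph; the function name `count_op_by_scope` and the sample node names are hypothetical, and real names depend on the exported graph.

```python
from collections import Counter


def count_op_by_scope(nodes, op_type):
    """Count nodes of `op_type`, grouped by the first component of the node name."""
    counts = Counter()
    for name, op in nodes:
        if op == op_type:
            counts[name.split("/", 1)[0]] += 1
    return counts


# Hypothetical node list: (node name, op type).
nodes = [
    ("Preprocessor/Cast", "Cast"),
    ("Postprocessor/Cast_1", "Cast"),
    ("Postprocessor/Cast_2", "Cast"),
    ("FeatureExtractor/MobilenetV2/Conv/Conv2D", "Conv2D"),
]

# Print namespaces with the most Cast nodes first.
for scope, count in count_op_by_scope(nodes, "Cast").most_common():
    print(scope, count)
```

Hits inside a namespace that the conversion config collapses into a plugin node would be expected to disappear together with that namespace; hits elsewhere would need separate handling.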

Cheers

True, it likely has to do with newer TensorFlow training or graph-creation processes adding the ‘FusedBatchNorm’ op to the frozen graph file, because it is absent from the untrained Mobilenet_V2.

I also got past the segmentation fault. It had to do with the label list.
Now I am getting ‘&&&& FAILED TensorRT.sample_uff_ssd # ./sample_uff_ssd_rect’

This has plagued me for almost 2 weeks now.

Need help.

I’ve exported my trained network with TensorFlow 1.12, and indeed there is no more “Cast” issue.

I still can’t run it, though:

&&&& RUNNING TensorRT.sample_uff_ssd # ./sample_uff_ssd_debug
[I] ../data/ssd/sample_ssd_relu6.uff
[I] Begin parsing model...
[libprotobuf FATAL /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/externals/protobuf/aarch64/10.0/include/google/protobuf/repeated_field.h:1408] CHECK failed: (index) < (current_size_):
terminate called after throwing an instance of 'google_private::protobuf::FatalException'
  what():  CHECK failed: (index) < (current_size_):
Aborted (core dumped)

Gotta dive into that now…what a journey !

That error looks familiar. It seems to show up for trained models.

Did you modify the config.py file before converting to uff file?

Yes I did, to update the number of classes (2 as I’ve got only one class)

import graphsurgeon as gs
import tensorflow as tf

Input = gs.create_node("Input",
    op="Placeholder",
    dtype=tf.float32,
    shape=[1, 3, 300, 300])
PriorBox = gs.create_plugin_node(name="GridAnchor", op="GridAnchor_TRT",
    numLayers=6,
    minSize=0.2,
    maxSize=0.95,
    aspectRatios=[1.0, 2.0, 0.5, 3.0, 0.33],
    variance=[0.1,0.1,0.2,0.2],
    featureMapShapes=[19, 10, 5, 3, 2, 1])
NMS = gs.create_plugin_node(name="NMS", op="NMS_TRT",
    shareLocation=1,
    varianceEncodedInTarget=0,
    backgroundLabelId=0,
    confidenceThreshold=1e-8,
    nmsThreshold=0.6,
    topK=100,
    keepTopK=100,
    numClasses=2,
    inputOrder=[0, 2, 1],
    confSigmoid=1,
    isNormalized=1)
concat_priorbox = gs.create_node(name="concat_priorbox", op="ConcatV2", dtype=tf.float32, axis=2)
concat_box_loc = gs.create_plugin_node("concat_box_loc", op="FlattenConcat_TRT", dtype=tf.float32, axis=1, ignoreBatch=0)
concat_box_conf = gs.create_plugin_node("concat_box_conf", op="FlattenConcat_TRT", dtype=tf.float32, axis=1, ignoreBatch=0)

namespace_plugin_map = {
    "MultipleGridAnchorGenerator": PriorBox,
    "Postprocessor": NMS,
    "Preprocessor": Input,
    "ToFloat": Input,
    "image_tensor": Input,
    "MultipleGridAnchorGenerator/Concatenate": concat_priorbox,
    "MultipleGridAnchorGenerator/Identity": concat_priorbox,
    "concat": concat_box_loc,
    "concat_1": concat_box_conf
}

def preprocess(dynamic_graph):
    # Debug output: list the Assert nodes found in the graph.
    all_assert_nodes = dynamic_graph.find_nodes_by_op("Assert")
    print("these are the assert nodes..")
    print(all_assert_nodes)

    # Bypass Identity nodes by forwarding their inputs to their consumers.
    all_identity_nodes = dynamic_graph.find_nodes_by_op("Identity")
    dynamic_graph.forward_inputs(all_identity_nodes)
    print("these are the identity nodes..")
    print(all_identity_nodes)

    # Debug output: list the Cast nodes (they are only printed, not modified).
    all_cast_nodes = dynamic_graph.find_nodes_by_op("Cast")
    #graph.forward_inputs(all_identity_nodes)
    print("these are the Cast nodes..")
    print(all_cast_nodes)

    # Now create a new graph by collapsing namespaces
    dynamic_graph.collapse_namespaces(namespace_plugin_map)
    # Remove the outputs, so we just have a single output node (NMS).
    dynamic_graph.remove(dynamic_graph.graph_outputs, remove_exclusive_dependencies=False)
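
Before converting, it may help to sanity-check the plugin parameters, since mistakes there tend to surface only later as parser or runtime failures. The sketch below is plain Python and does not touch graphsurgeon. The function name and every check in it are assumptions about what the GridAnchor_TRT and NMS_TRT parameters are expected to look like (one feature map shape per layer, four variance values, inputOrder as a permutation of 0, 1, 2, numClasses including background); they are not rules quoted from the TensorRT documentation.

```python
def check_ssd_plugin_params(grid_anchor, nms):
    """Return a list of suspected problems; an empty list means none were found."""
    problems = []
    # Assumed: one feature map shape per anchor layer.
    if len(grid_anchor["featureMapShapes"]) != grid_anchor["numLayers"]:
        problems.append("featureMapShapes should have numLayers entries")
    # Assumed: variance holds four box-encoding values.
    if len(grid_anchor["variance"]) != 4:
        problems.append("variance should have 4 entries")
    # Assumed: inputOrder is a permutation of the three NMS inputs.
    if sorted(nms["inputOrder"]) != [0, 1, 2]:
        problems.append("inputOrder should be a permutation of [0, 1, 2]")
    # Assumed: numClasses includes background, so one object class gives 2.
    if nms["numClasses"] < 2:
        problems.append("numClasses should be at least 2")
    return problems


# Values copied from the config above.
grid_anchor = {
    "numLayers": 6,
    "variance": [0.1, 0.1, 0.2, 0.2],
    "featureMapShapes": [19, 10, 5, 3, 2, 1],
}
nms = {"numClasses": 2, "inputOrder": [0, 2, 1]}

print(check_ssd_plugin_params(grid_anchor, nms))
```

Passing these checks says nothing about whether the graph itself converts; it only rules out a few easy-to-miss typos in the config.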

Hi,

I am having the same issue with the _Cast operation.

Can you please tell me that…

  1. Which TensorFlow-GPU version are you using? Do you train on Google Colab?
  2. Which commit hash of the Object Detection API (models/research) did you use to export the inference graph?
  3. Where did you download the SSD Mobilenet V2 model from? Which config file do you use, and what changes did you make to it? (I downloaded mine from http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz here.)
  4. What changes did you make to the TensorRT config file from this repo? https://github.com/AastaNV/TRT_object_detection

Can you please help?

I have been stuck for many days.

Help is highly appreciated.

I got the custom SSD_Mobilenet_v2_coco model working with detectnet (TensorRT) on a Jetson TX2 using the Jetson-inference script.

It was messy, however. The working Mobilenet_V2 UFF file was generated after tweaking code from the TRT_object_detection library.

It worked fine with TRT_object_detection, which generated the TensorRT engine with good inference results.

Another step I had to take was to generate the Mobilenet_V2 frozen_inference_graph.pb with an older version of the models/research object detection library. I used a commit from early 2018.
Later ones appear to add nodes that conflict with the UFF converter or TensorRT.

Hope that helps.

Result: the frame rate was over 50 fps most of the time, rising up to 85 fps.

This is strange because I was under the impression that my camera can only do up to 30 fps, so maybe TensorRT was processing some frames 2 or 3 times.

Or was it deriving its value from inferences per second?
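
If the reported figure is derived from the time spent per inference rather than from the frames the camera actually delivers, it can exceed the camera's frame rate. Below is a small worked example of that arithmetic under that assumption. The latency values are back-calculated from the 50 and 85 fps figures, not measured, and the function names are illustrative.

```python
def fps_from_latency_ms(latency_ms):
    """Inferences per second implied by a per-inference latency in milliseconds."""
    return 1000.0 / latency_ms


def unique_frames_per_second(latency_ms, camera_fps):
    """New camera frames processed per second; capped by the camera itself."""
    return min(fps_from_latency_ms(latency_ms), camera_fps)


# Assumed latencies in milliseconds, chosen to match the figures in the post.
for latency_ms in (20.0, 11.8):
    print(
        latency_ms,
        round(fps_from_latency_ms(latency_ms), 1),
        unique_frames_per_second(latency_ms, camera_fps=30.0),
    )
```

Under this reading, a 30 fps camera still only supplies 30 new frames per second, and the higher number describes how fast the network could run, not how many distinct frames it saw.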

In the file below, change inputOrder to [0, 2, 1] and set numClasses to match your number of classes:
https://github.com/AastaNV/TRT_object_detection/blob/master/config/model_ssd_mobilenet_v2_coco_2018_03_29.py
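
Going by the earlier post where one trained class meant numClasses=2, the value appears to count the background class as well. Here is a tiny sketch of deriving it from a label list under that assumption; the helper name `nms_num_classes` and the example labels are illustrative only.

```python
def nms_num_classes(object_labels, background_label="background"):
    """Number of classes for the NMS plugin, assuming background counts as one."""
    # Drop an explicit background entry so it is not counted twice.
    labels = [label for label in object_labels if label != background_label]
    return len(labels) + 1


print(nms_num_classes(["person"]))
print(nms_num_classes(["background", "person", "car"]))
```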

I also used the Object Detection API from early 2018 to export the pb file. This is the commit hash from the models git repository: ccb463174f17bab2626d9c292e2d82038f07e8a5

I’ve used the ccb463174f17bab2626d9c292e2d82038f07e8a5 commit to export my frozen graph, with TensorFlow 1.13.1. Now, after converting the pb to uff (with numClasses and inputOrder already updated), I get a new error:

[E] [TRT] Parameter check failed at: ../builder/Layers.h::setAxis::333, condition: axis >= 0
[E] [TRT] Concatenate/concat: all concat input tensors must have the same dimensions except on the concatenation axis
[E] [TRT] UffParser: Parser error: FeatureExtractor/MobilenetV2/Conv/BatchNorm/batchnorm_1/mul_1: The input to the Scale Layer is required to have a minimum of 3 dimensions.
[E] Failure while parsing UFF file

I’m really starting to lose hope…

I have the same problem

[TensorRT] ERROR: Parameter check failed at: ../builder/Layers.h::setAxis::333, condition: axis >= 0
[TensorRT] ERROR: Concatenate/concat: all concat input tensors must have the same dimensions except on the concatenation axis
[TensorRT] ERROR: UffParser: Parser error: BoxPredictor_0/ClassPredictor/BiasAdd: The input to the Scale Layer is required to have a minimum of 3 dimensions.