How to retrain ssd_inception_v2_coco_2017_11_17 from the TensorRT samples

Hello,
I have the TensorFlow Object Detection API on my PC, which I have used to retrain SSD MobileNet and other networks. After I was able to run video inference for ssd_inception_v2_coco_2017_11_17 in C++, I decided to retrain it on my custom objects, as I had done before. After training, I converted the checkpoint into a frozen inference graph and copied it to my Jetson TX2 to convert it into a UFF file. I am using convert_to_uff.py and config.py to generate the UFF file for the frozen graph.

When I create the engine, I get the following errors while parsing the UFF file:

ERROR: Parameter check failed at: …/builder/Layers.h::setAxis::315, condition: axis>=0
ERROR: Concatenate/concat: all concat input tensors must have the same dimensions except on the concatenation axis
ERROR: UFFParser: Parser error: BoxPredictor_0/ClassPredictor/BiasAdd: The input to the Scale Layer is required to have a minimum of 3 dimensions.
ERROR: sample_uff_mnist: Fail to parse

I am using the ssd_inception_v2_coco.config file from the Object Detection API, and my training batch size is 32. Apart from that, I did not change anything in ssd_inception_v2_coco.config or in config.py on my Jetson.

Hi,

Would you mind trying whether this config works with your model?
https://github.com/AastaNV/TRT_object_detection/blob/master/config/model_ssd_inception_v2_coco_2017_11_17.py

Please let us know the result.
Thanks.

Hello AastaLLL,
I had to change the config file because it didn't include the preprocess function that convert_to_uff.py needs.
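For reference, the AastaNV configs put their graph edits in a function named add_plugin(graph) (it appears in the diff hunk headers later in this thread), whereas the -p option of convert_to_uff.py expects the file to define preprocess. A minimal sketch of bridging the two, assuming add_plugin edits the DynamicGraph in place, is to append a thin wrapper to the end of the downloaded config:

```python
# Appended to the end of an AastaNV config file.
# Assumption: that file defines add_plugin(graph) and the function modifies
# the graphsurgeon DynamicGraph in place (collapse_namespaces, remove, ...).
def preprocess(dynamic_graph):
    # convert_to_uff.py discards the return value, so only in-place edits count.
    add_plugin(dynamic_graph)
```

With this in place, the same file can be passed to convert_to_uff.py with -p without rewriting the rest of it.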

The current files are:
convert_to_uff.py:

#!/usr/bin/python
"""
convert_to_uff.py

Main script for doing uff conversions from
different frameworks.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import sys
import argparse
import uff
import os

def _replace_ext(path, ext):
    return os.path.splitext(path)[0] + ext

def process_cmdline_args():
    """
    Helper function for processing commandline arguments
    """
    parser = argparse.ArgumentParser(description="""Converts TensorFlow models to Unified Framework Format (UFF).""")

    parser.add_argument(
        "input_file",
        help="""path to input model (protobuf file of frozen GraphDef)""")

    parser.add_argument(
        '-l', '--list-nodes', action='store_true',
        help="""show list of nodes contained in input file""")

    parser.add_argument(
        '-t', '--text', action='store_true',
        help="""write a text version of the output in addition to the
        binary""")

    parser.add_argument(
        '--write_preprocessed', action='store_true',
        help="""write the preprocessed protobuf in addition to the
        binary""")

    parser.add_argument(
        '-q', '--quiet', action='store_true',
        help="""disable log messages""")

    parser.add_argument(
        '-d', '--debug', action='store_true',
        help="""Enables debug mode to provide helpful debugging output""")

    parser.add_argument(
        "-o", "--output",
        help="""name of output uff file""")

    parser.add_argument(
        "-O", "--output-node", default=[], action='append',
        help="""name of output nodes of the model""")

    parser.add_argument(
        '-I', '--input-node', default=[], action='append',
        help="""name of a node to replace with an input to the model.
        Must be specified as: "name,new_name,dtype,dim1,dim2,..."
        """)

    parser.add_argument(
        "-p", "--preprocessor",
        help="""the preprocessing file to run before handling the graph. This file must define a `preprocess` function that accepts a GraphSurgeon DynamicGraph as its input. All transformations should happen in place on the graph, as return values are discarded""")

    args, _ = parser.parse_known_args()
    args.output = _replace_ext((args.output if args.output else args.input_file), ".uff")
    return args, _

def main():
    args, _ = process_cmdline_args()
    if not args.quiet:
        print("Loading", args.input_file)
    uff.from_tensorflow_frozen_model(
        args.input_file,
        output_nodes=args.output_node,
        preprocessor=args.preprocessor,
        input_node=args.input_node,
        quiet=args.quiet,
        text=args.text,
        list_nodes=args.list_nodes,
        output_filename=args.output,
        write_preprocessed=args.write_preprocessed,
        debug_mode=args.debug
    )

if __name__ == '__main__':
    main()

The modified config file:

import graphsurgeon as gs
import tensorflow as tf

Input = gs.create_plugin_node(
    name="Input",
    op="Placeholder",
    dtype=tf.float32,
    shape=[1, 3, 300, 300]
)

PriorBox = gs.create_plugin_node(
    name="GridAnchor",
    op="GridAnchor_TRT",
    minSize=0.2,
    maxSize=0.95,
    aspectRatios=[1.0, 2.0, 0.5, 3.0, 0.33],
    variance=[0.1, 0.1, 0.2, 0.2],
    featureMapShapes=[19, 10, 5, 3, 2, 1],
    numLayers=6
)

NMS = gs.create_plugin_node(
    name="NMS",
    op="NMS_TRT",
    shareLocation=1,
    varianceEncodedInTarget=0,
    backgroundLabelId=0,
    confidenceThreshold=1e-8,
    nmsThreshold=0.6,
    topK=100,
    keepTopK=100,
    numClasses=91,
    inputOrder=[0, 2, 1],
    confSigmoid=1,
    isNormalized=1,
    scoreConverter="SIGMOID"
)

concat_priorbox = gs.create_plugin_node(
    "concat_priorbox",
    op="ConcatV2",
    dtype=tf.float32,
    axis=2
)

concat_box_loc = gs.create_plugin_node(
    "concat_box_loc",
    op="FlattenConcat_TRT",
    dtype=tf.float32
)

concat_box_conf = gs.create_plugin_node(
    "concat_box_conf",
    op="FlattenConcat_TRT",
    dtype=tf.float32
)

namespace_plugin_map = {
    "MultipleGridAnchorGenerator": PriorBox,
    "Postprocessor": NMS,
    "Preprocessor": Input,
    "ToFloat": Input,
    "image_tensor": Input,
    "MultipleGridAnchorGenerator/Concatenate": concat_priorbox,
    "concat": concat_box_loc,
    "concat_1": concat_box_conf
}

def preprocess(dynamic_graph):
    all_assert_nodes = dynamic_graph.find_nodes_by_op("Assert")
    dynamic_graph.remove(all_assert_nodes, remove_exclusive_dependencies=True)
    all_identity_nodes = dynamic_graph.find_nodes_by_op("Identity")
    dynamic_graph.forward_inputs(all_identity_nodes)
    print(" Operation done ")
    dynamic_graph.collapse_namespaces(namespace_plugin_map)
    dynamic_graph.remove(dynamic_graph.graph_outputs, remove_exclusive_dependencies=False)

Then I run the following command to get the UFF file:
python3 convert_to_uff.py --input-file frozen_inference_graph.pb -O NMS -p config.py

After that, I use the UFF file to create the engine, which now fails with a different error:
nvidia@nvidia:~/Desktop/test/build$ ./testssd
Using pipeline:
nvarguscamerasrc ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, format=(string)NV12, framerate=(fraction)60/1 ! nvvidconv flip-method=0 ! video/x-raw, width=(int)1280, height=(int)720, format=(string)BGRx ! videoconvert ! video/x-raw, format=(string)BGR ! appsink
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected…
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 2592 x 1944 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 16.000000; Exposure Range min 34000, max 550385000;

GST_ARGUS: 2592 x 1458 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 16.000000; Exposure Range min 34000, max 550385000;

GST_ARGUS: 1280 x 720 FR = 120.000005 fps Duration = 8333333 ; Analog Gain range min 1.000000, max 16.000000; Exposure Range min 22000, max 358733000;

GST_ARGUS: Running with following settings:
Camera index = 0
Camera mode = 2
Output Stream W = 1280 H = 720
seconds to Run = 0
Frame Rate = 120.000005
GST_ARGUS: PowerService: requested_clock_Hz=24192000
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
Hit ESC to exit
Hit ESC to exit

No saved model found , making a fresh engine
…/…/…/Desktop/test/inception.uff
Begin parsing model…
testssd: /home/nvidia/Desktop/test/testssd.cpp:254: FlattenConcat::FlattenConcat(int, bool): Assertion `mConcatAxisID == 1 || mConcatAxisID == 2 || mConcatAxisID == 3' failed.
Aborted (core dumped)
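The assertion is raised inside the FlattenConcat plugin compiled into testssd.cpp. Assuming that plugin follows the TensorRT sampleUffSSD version, whose plugin creator reads axis and ignoreBatch fields from the UFF node before calling FlattenConcat(int, bool), the two FlattenConcat_TRT nodes in the config above never receive an axis, so the value checked by the assertion may be left unset. A hedged sketch of those two nodes with the fields passed explicitly (the values 1 and 0 are what I believe the sampleUffSSD config.py uses; they are not verified against this model):

```python
import graphsurgeon as gs
import tensorflow as tf

# Assumption: the FlattenConcat creator in testssd.cpp looks up plugin fields
# named "axis" and "ignoreBatch", as the TensorRT sampleUffSSD plugin does.
# If "axis" is missing, the constructor check (axis must be 1, 2 or 3) can fail.
concat_box_loc = gs.create_plugin_node(
    "concat_box_loc",
    op="FlattenConcat_TRT",
    dtype=tf.float32,
    axis=1,         # concatenation axis handed to FlattenConcat(int, bool)
    ignoreBatch=0,  # second constructor argument
)

concat_box_conf = gs.create_plugin_node(
    "concat_box_conf",
    op="FlattenConcat_TRT",
    dtype=tf.float32,
    axis=1,
    ignoreBatch=0,
)
```

If the creator in testssd.cpp uses different field names, the names here have to match those instead.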

Also, as you can see in the config file, I have added dtype=tf.float32 to the Input and concat nodes, because if I don't add them, I get the following error:

nvidia@nvidia:~/Desktop/ssd_inception_output$ python3 convert_to_uff.py --input-file frozen_inference_graph.pb -O NMS -p config_ori.py
Loading frozen_inference_graph.pb
WARNING:tensorflow:From /usr/lib/python3.6/dist-packages/uff/converters/tensorflow/conversion_helpers.py:185: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
Operation done
UFF Version 0.5.5
=== Automatically deduced input nodes ===
[name: "Input"
op: "Placeholder"
attr {
  key: "shape"
  value {
    shape {
      dim {
        size: 1
      }
      dim {
        size: 3
      }
      dim {
        size: 300
      }
      dim {
        size: 300
      }
    }
  }
}
]

Using output node NMS
Converting to UFF graph
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
Traceback (most recent call last):
  File "/usr/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py", line 102, in convert_tf2numpy_dtype
    return dtype.as_numpy_dtype
AttributeError: 'int' object has no attribute 'as_numpy_dtype'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "convert_to_uff.py", line 93, in <module>
    main()
  File "convert_to_uff.py", line 89, in main
    debug_mode=args.debug
  File "/usr/lib/python3.6/dist-packages/uff/converters/tensorflow/conversion_helpers.py", line 187, in from_tensorflow_frozen_model
    return from_tensorflow(graphdef, output_nodes, preprocessor, **kwargs)
  File "/usr/lib/python3.6/dist-packages/uff/converters/tensorflow/conversion_helpers.py", line 157, in from_tensorflow
    debug_mode=debug_mode)
  File "/usr/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py", line 94, in convert_tf2uff_graph
    uff_graph, input_replacements, debug_mode=debug_mode)
  File "/usr/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py", line 79, in convert_tf2uff_node
    op, name, tf_node, inputs, uff_graph, tf_nodes=tf_nodes, debug_mode=debug_mode)
  File "/usr/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py", line 47, in convert_layer
    return cls.registry_[op](name, tf_node, inputs, uff_graph, **kwargs)
  File "/usr/lib/python3.6/dist-packages/uff/converters/tensorflow/converter_functions.py", line 19, in convert_placeholder
    dtype = tf2uff.convert_tf2numpy_dtype(tf_node.attr['dtype'].type)
  File "/usr/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py", line 113, in convert_tf2numpy_dtype
    return np.dtype(dt[dtype])
TypeError: data type "invalid" not understood

I think my retrained model is exactly the same as the pre-trained model, except that when I convert the pre-trained model, the converter reports "No. nodes: 563", while for my retrained model it reports "No. nodes: 781".
I don't know whether the failure to parse the UFF file is related to the number of nodes, because the error is about the concatenation axis.
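The node counts printed by the converter only show that the two graphs differ, not how. A quick way to see which op types the retrained export adds or drops is to compare op-type histograms of the two frozen graphs. This is a sketch, assuming both .pb files are available on one machine with TensorFlow installed; the file paths in the usage line are placeholders:

```python
import collections


def op_histogram(nodes):
    """Count how many nodes use each op type (Conv2D, Identity, Cast, ...)."""
    return collections.Counter(node.op for node in nodes)


def diff_op_types(reference_nodes, candidate_nodes):
    """Return {op: candidate_count - reference_count} for every op whose count differs."""
    ref = op_histogram(reference_nodes)
    cand = op_histogram(candidate_nodes)
    return {op: cand[op] - ref[op]
            for op in sorted(set(ref) | set(cand))
            if cand[op] != ref[op]}


def load_graph_nodes(pb_path):
    """Read the nodes of a frozen GraphDef (.pb) file."""
    # Imported lazily so the helpers above can be used without TensorFlow.
    from tensorflow.core.framework import graph_pb2

    graph_def = graph_pb2.GraphDef()
    with open(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return list(graph_def.node)
```

For example: print(diff_op_types(load_graph_nodes("pretrained/frozen_inference_graph.pb"), load_graph_nodes("retrained/frozen_inference_graph.pb"))). An op type that shows up only in the retrained graph is a natural first suspect when the parser fails on one export but not the other.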

Another side issue: whenever I run my program with the GStreamer pipeline, it runs only once. To run it again I have to restart my Jetson, because every run after the first one fails with the following error:

nvidia@nvidia:~/Desktop/test/build$ ./testssd
Using pipeline:
nvarguscamerasrc ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, format=(string)NV12, framerate=(fraction)60/1 ! nvvidconv flip-method=0 ! video/x-raw, width=(int)1280, height=(int)720, format=(string)BGRx ! videoconvert ! video/x-raw, format=(string)BGR ! appsink
Error generated. /dvs/git/dirty/git-master_linux/multimedia/nvgstreamer/gst-nvarguscamera/gstnvarguscamerasrc.cpp, execute:532 Failed to create CaptureSession
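The first run shown earlier ended with an assertion failure and a core dump, so the Argus capture session was probably never closed, which would explain why only a reboot frees the camera. A minimal Python/OpenCV sketch (the program in this thread is C++, where the counterpart is cv::VideoCapture::release()), assuming OpenCV was built with GStreamer support, that rebuilds the pipeline string above and always releases the capture on a normal exit or a Python exception:

```python
import contextlib


def gstreamer_pipeline(capture_width=1280, capture_height=720,
                       display_width=1280, display_height=720,
                       framerate=60, flip_method=0):
    """Build the nvarguscamerasrc pipeline string printed by the program above."""
    return (
        "nvarguscamerasrc ! "
        "video/x-raw(memory:NVMM), width=(int)%d, height=(int)%d, "
        "format=(string)NV12, framerate=(fraction)%d/1 ! "
        "nvvidconv flip-method=%d ! "
        "video/x-raw, width=(int)%d, height=(int)%d, format=(string)BGRx ! "
        "videoconvert ! video/x-raw, format=(string)BGR ! appsink"
        % (capture_width, capture_height, framerate, flip_method,
           display_width, display_height)
    )


@contextlib.contextmanager
def open_camera(pipeline):
    """Open the pipeline with OpenCV and release it however the block exits."""
    import cv2  # imported lazily so gstreamer_pipeline() works without OpenCV

    cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
    try:
        if not cap.isOpened():
            raise RuntimeError("could not open camera pipeline")
        yield cap
    finally:
        # Releasing closes the capture; skipping it (or crashing before it)
        # may leave the Argus session busy for the next run.
        cap.release()
```

Usage would be along the lines of: with open_camera(gstreamer_pipeline()) as cap: ok, frame = cap.read(). A release in finally cannot run when the process is killed by an abort, so the plugin assertion still has to be fixed. For that case, restarting the Argus daemon (sudo systemctl restart nvargus-daemon, on JetPack releases that ship it as a service) is often suggested instead of rebooting; treat that as an assumption to check on your release.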

Hi,

I guess that some layer names may have been updated in newer TensorFlow versions.
Would you mind sharing your model with us so we can check it directly?

For GStreamer: are you using a CSI camera? The command is an Argus pipeline.
Thanks.

Hey,
I am attaching all the files from my project; I hope they help. In the meantime I am also looking into the layers, and if I find anything, I will update the post.

Also, I am using the onboard CSI camera of my Jetson TX2, and I am also running this code on my Jetson Nano, which also uses a CSI camera.

Model : https://we.tl/t-uRurY10xtY

Hi,

We found that some layer names/formats changed in TensorFlow between 2017 and 2018.
Would you mind giving this config a try:
https://github.com/AastaNV/TRT_object_detection/blob/master/config/model_ssd_mobilenet_v1_coco_2018_01_28.py

Thanks.

As you suggested, I changed the file accordingly:

import graphsurgeon as gs
import tensorflow as tf

Input = gs.create_plugin_node(
    name="Input",
    op="Placeholder",
    dtype=tf.float32,
    shape=[1, 3, 300, 300]
)

PriorBox = gs.create_plugin_node(
    name="MultipleGridAnchorGenerator",
    op="GridAnchor_TRT",
    minSize=0.2,
    maxSize=0.95,
    aspectRatios=[1.0, 2.0, 0.5, 3.0, 0.33],
    variance=[0.1, 0.1, 0.2, 0.2],
    featureMapShapes=[19, 10, 5, 3, 2, 1],
    numLayers=6
)

NMS = gs.create_plugin_node(
    name="Postprocessor",
    op="NMS_TRT",
    shareLocation=1,
    varianceEncodedInTarget=0,
    backgroundLabelId=0,
    confidenceThreshold=1e-8,
    nmsThreshold=0.6,
    topK=100,
    keepTopK=100,
    numClasses=91,
    inputOrder=[0, 2, 1],
    confSigmoid=1,
    isNormalized=1
)

concat_priorbox = gs.create_plugin_node(
    "concat_priorbox",
    op="ConcatV2",
    dtype=tf.float32,
    axis=2
)

concat_box_loc = gs.create_plugin_node(
    "concat_box_loc",
    op="FlattenConcat_TRT",
    dtype=tf.float32
)

concat_box_conf = gs.create_plugin_node(
    "concat_box_conf",
    op="FlattenConcat_TRT",
    dtype=tf.float32
)

namespace_plugin_map = {
    "MultipleGridAnchorGenerator": PriorBox,
    "Postprocessor": Postprocessor,
    "Preprocessor": Input,
    "ToFloat": Input,
    "image_tensor": Input,
    "MultipleGridAnchorGenerator/Concatenate": concat_priorbox,
    "concat": concat_box_loc,
    "concat_1": concat_box_conf
}

def preprocess(dynamic_graph):
    all_assert_nodes = dynamic_graph.find_nodes_by_op("Assert")
    dynamic_graph.remove(all_assert_nodes, remove_exclusive_dependencies=True)
    all_identity_nodes = dynamic_graph.find_nodes_by_op("Identity")
    dynamic_graph.forward_inputs(all_identity_nodes)
    print(" Operation done ")
    dynamic_graph.collapse_namespaces(namespace_plugin_map)
    dynamic_graph.remove(dynamic_graph.graph_outputs, remove_exclusive_dependencies=False)
    dynamic_graph.find_nodes_by_op("NMS_TRT")[0].input.remove("Input")
    dynamic_graph.find_nodes_by_name("Input")[0].input.remove("image_tensor:0")

The output is:

python3 convert_to_uff.py tensorflow --input-file frozen_inference_graph.pb -O NMS -p config.py
Loading frozen_inference_graph.pb
WARNING:tensorflow:From /usr/lib/python3.5/dist-packages/uff/converters/tensorflow/conversion_helpers.py:185: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
Traceback (most recent call last):
  File "convert_to_uff.py", line 111, in <module>
    main()
  File "convert_to_uff.py", line 106, in main
    output_filename=args.output
  File "/usr/lib/python3.5/dist-packages/uff/converters/tensorflow/conversion_helpers.py", line 187, in from_tensorflow_frozen_model
    return from_tensorflow(graphdef, output_nodes, preprocessor, **kwargs)
  File "/usr/lib/python3.5/dist-packages/uff/converters/tensorflow/conversion_helpers.py", line 84, in from_tensorflow
    pre = importlib.import_module(os.path.splitext(os.path.basename(preprocessor))[0])
  File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 986, in _gcd_import
  File "", line 969, in _find_and_load
  File "", line 958, in _find_and_load_unlocked
  File "", line 673, in _load_unlocked
  File "", line 665, in exec_module
  File "", line 222, in _call_with_frames_removed
  File "/home/amdc/Desktop/ssd_in_tank/config.py", line 59, in <module>
    "Postprocessor": Postprocessor,
NameError: name 'Postprocessor' is not defined

I think this config file is wrong: there is no image_tensor:0 in the graph, and apart from that, it fails at Postprocessor.
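Two things in this config line up with these symptoms. First, namespace_plugin_map refers to a name Postprocessor that is never assigned; the NMS plugin node is bound to the Python variable NMS (only its graph name is "Postprocessor"), so the entry would need to read "Postprocessor": NMS. Second, node.input behaves like a list, and remove() raises an error when the item is missing, so if there really is no image_tensor:0 input, the last line of preprocess would fail next. A guarded sketch of those last two lines (the helper names are hypothetical, not part of graphsurgeon):

```python
def remove_input_if_present(node, input_name):
    """Remove input_name from node.input if present; return True if it was removed."""
    # node.input behaves like a list; remove() raises ValueError for a missing item.
    if input_name in node.input:
        node.input.remove(input_name)
        return True
    return False


def detach_leftover_inputs(dynamic_graph):
    """Guarded version of the last two lines of preprocess().

    Returns a list describing which edges were actually removed.
    """
    removed = []
    nms_nodes = dynamic_graph.find_nodes_by_op("NMS_TRT")
    if nms_nodes and remove_input_if_present(nms_nodes[0], "Input"):
        removed.append("NMS_TRT <- Input")
    input_nodes = dynamic_graph.find_nodes_by_name("Input")
    if input_nodes and remove_input_if_present(input_nodes[0], "image_tensor:0"):
        removed.append("Input <- image_tensor:0")
    return removed
```

In preprocess, the two remove lines would be replaced by print(detach_leftover_inputs(dynamic_graph)), which also shows which of the two edges existed in this graph.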

Also, these lines return empty lists:

all_assert_nodes = dynamic_graph.find_nodes_by_op("assert")
dynamic_graph.remove(all_assert_nodes, remove_exclusive_dependencies=True)
all_identity_nodes = dynamic_graph.find_nodes_by_op("Identity")
dynamic_graph.forward_inputs(all_identity_nodes)

So the result is the same whether or not these lines are in config.py!
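Empty results can have two causes. GraphDef op types are case-sensitive strings, so find_nodes_by_op("assert") in the snippet above will not match TensorFlow's Assert op (the config itself spells it "Assert"); and the exported graph may simply contain no Assert nodes. A small sketch that runs the same two steps but reports how many nodes each one touched, so the conversion log shows whether they did anything (the function name is hypothetical):

```python
def strip_asserts_and_identities(dynamic_graph, verbose=True):
    """Run the Assert/Identity clean-up from preprocess() and report what it touched."""
    # GraphDef op types are case-sensitive: TensorFlow's op is "Assert", not "assert".
    assert_nodes = dynamic_graph.find_nodes_by_op("Assert")
    dynamic_graph.remove(assert_nodes, remove_exclusive_dependencies=True)

    identity_nodes = dynamic_graph.find_nodes_by_op("Identity")
    dynamic_graph.forward_inputs(identity_nodes)

    if verbose:
        print("removed %d Assert node(s), forwarded %d Identity node(s)"
              % (len(assert_nodes), len(identity_nodes)))
    return len(assert_nodes), len(identity_nodes)
```

Calling it at the top of preprocess in place of the four original lines keeps the behavior the same and only adds the counts.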

Hi,

I cannot open the model link:

Model : https://we.tl/t-uRurY10xtY

Would you mind checking it?

Thanks.

Hello AastaLLL, sorry for the late reply. I am attaching the model again, please have a look at it! I have just resumed the project and I still have the same problem.

best regards,
Sandeep Kumar Jangir

Hi,

We are able to access your model.
We will update you with more information later.

Thanks.

Hello AastaLLL,
Thank you very much. I will wait for your response.

Regards

Hi,

Sorry for keeping you waiting.

I have tested your model, and it can be converted into a TensorRT engine correctly.
Just use the following config file and update the class number for your use case:
https://github.com/AastaNV/TRT_object_detection/blob/master/config/model_ssd_mobilenet_v2_coco_2018_03_29.py

diff --git a/config/model_ssd_mobilenet_v2_coco_2018_03_29.py b/config/model_ssd_mobilenet_v2_coco_2018_03_29.py
index 3c9f3b8..d1becbe 100644
--- a/config/model_ssd_mobilenet_v2_coco_2018_03_29.py
+++ b/config/model_ssd_mobilenet_v2_coco_2018_03_29.py
@@ -40,7 +40,7 @@ def add_plugin(graph):
         nmsThreshold=0.6,
         topK=100,
         keepTopK=100,
-        numClasses=91,
+        numClasses=N, <- update this
         inputOrder=[1, 0, 2],
         confSigmoid=1,
         isNormalized=1

This is because the architectures of these models are similar, but the API changed between 2017 and 2018.
Since you have retrained the model, you will need to preprocess it the 2018 way.

Thanks.


Hello AastaLLL,
Thank you very much for the reply. I was expecting a similar answer, since the API keeps changing. I had already made those changes, and I am stuck at creating the plugin. I opened a thread about it: https://devtalk.nvidia.com/default/topic/1058922/jetson-tx2/creating-plugnins-for-tensorrt-operation-_cast-/post/5372915/#5372915. I am still waiting for a response and hope someone can help me :)

Best regards,
Sandeep Kumar Jangir

Hi,

May I know the number of classes in your retrained model?
With that information I can check this for you.

Thanks.

Hi AastaLLL,

I would like to ask you for some help. I have been struggling to create a TensorRT engine from a custom-trained ssd_mobilenet_v2_coco_2018_03_29 model. I was able to create the engine with the pretrained model, but when I tried the custom one, I got the following error:

libprotobuf FATAL /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/externals/protobuf/aarch64/10.0/include/google/protobuf/repeated_field.h:1408] CHECK failed: (index) < (current_size_): 
Traceback (most recent call last):
  File "main.py", line 40, in <module>
    parser.parse('tmp.uff', network)
RuntimeError: CHECK failed: (index) < (current_size_):

I am trying to create the engine by modifying this file: https://github.com/AastaNV/TRT_object_detection/blob/master/config/model_ssd_mobilenet_v2_coco_2018_03_29.py. However, even after modifying the "numClasses=N" parameter, I have not succeeded. My modifications are listed below.

path = 'model/customModel/frozen_inference_graph.pb' # model based on ssd_mobilenet_v2_coco_2018_03_29
TRTbin = 'TRT_customModel.bin'
numClasses=37,

Additionally, I tried to create the model with a previous version of the Object Detection API, as mentioned here by baramuse: https://devtalk.nvidia.com/default/topic/1058639/mobilenet_v2-sampleuffssd-not-working-help-please-/#reply. I used git hash ccb463174f17bab2626d9c292e2d82038f07e8a5, but without success.

Any help would be appreciated. Best regards.

Hello AastaLLL, I am retraining on only one class for now, but once inference with TensorRT works, I am going to extend it.

Right now I am going through different blogs and forums to find out how to write a plugin myself, but they are not very clear. Hopefully you can help me!

Thank you!

Hi,

Sorry for the late reply; I hope this still helps.
You will need to update the parameters of the NMS node:

diff --git a/config/model_ssd_mobilenet_v2_coco_2018_03_29.py b/config/model_ssd_mobilenet_v2_coco_2018_03_29.py
index 3c9f3b8..6fb9e2a 100644
--- a/config/model_ssd_mobilenet_v2_coco_2018_03_29.py
+++ b/config/model_ssd_mobilenet_v2_coco_2018_03_29.py
@@ -1,6 +1,6 @@
 import graphsurgeon as gs
 
-path = 'model/ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb'
+path = 'model/frozen_inference_graph.pb'
 TRTbin = 'TRT_ssd_mobilenet_v2_coco_2018_03_29.bin'
 output_name = ['NMS']
 dims = [3,300,300]
@@ -40,8 +40,8 @@ def add_plugin(graph):
         nmsThreshold=0.6,
         topK=100,
         keepTopK=100,
-        numClasses=91,
-        inputOrder=[1, 0, 2],
+        numClasses=2,
+        inputOrder=[0, 2, 1],
         confSigmoid=1,
         isNormalized=1
     )

Your model can run successfully with the config file shared above.
Thanks.
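In the reply above, one trained class becomes numClasses=2, and the COCO configs use 91 for COCO's 90 label ids: the NMS plugin counts the background label (backgroundLabelId=0) as a class. Since TF Object Detection API label maps number classes from 1, a minimal sketch that derives the value from a label map, assuming a standard label_map.pbtxt with id: fields (the function name is hypothetical), is:

```python
import re


def num_classes_from_label_map(pbtxt_text):
    """numClasses for NMS_TRT: highest label id + 1, because id 0 is the background."""
    ids = [int(value) for value in re.findall(r"\bid\s*:\s*(\d+)", pbtxt_text)]
    if not ids:
        raise ValueError("no 'id:' entries found in the label map")
    return max(ids) + 1
```

Usage: with open("label_map.pbtxt") as f: print(num_classes_from_label_map(f.read())). Under this convention, a model with 37 object classes would need numClasses=38.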

Hello AastaLLL,
The output I get is:

No saved model found , making a fresh engine
…/…/…/Desktop/ssd_trt/incep_update.uff
Begin parsing model…
testssd: /home/nvidia/Desktop/ssd_trt/testssd.cpp:260: FlattenConcat::FlattenConcat(int, bool): Assertion `mConcatAxisID == 1 || mConcatAxisID == 2 || mConcatAxisID == 3' failed.
Aborted (core dumped)

As far as I can tell, the only changes in the config file are the number of classes and the reordering of inputOrder. The main problem is the Cast operation: how do I implement a plugin for the Cast operation?
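Note that the abort shown above is the FlattenConcat assertion, not an error about Cast. As for Cast: as far as I know, graphs exported by newer Object Detection API versions express the ToFloat preprocessing step as a Cast node, and that node sits on the input path that the Input placeholder replaces anyway. A workaround seen in some community configs, instead of writing a Cast plugin, is to collapse that namespace into Input as well. This sketch assumes the node is named Cast at the top level of the graph (check the node names in the frozen graph first); it is not a verified fix for this model:

```python
# Fragment for config.py: Input, PriorBox, NMS, concat_priorbox, concat_box_loc
# and concat_box_conf are the plugin nodes defined earlier in the same file.
namespace_plugin_map = {
    "MultipleGridAnchorGenerator": PriorBox,
    "Postprocessor": NMS,
    "Preprocessor": Input,
    "ToFloat": Input,
    "Cast": Input,  # assumption: newer exports name the uint8 -> float step "Cast"
    "image_tensor": Input,
    "MultipleGridAnchorGenerator/Concatenate": concat_priorbox,
    "concat": concat_box_loc,
    "concat_1": concat_box_conf,
}
```

If Cast nodes also appear under other namespaces, only the ones on the preprocessing path should be folded into Input this way.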