How to retrain ssd_inception_v2_coco_2017_11_17 from the TensorRT samples

Hi,

I can generate a TensorRT engine with the .pb file you shared without error.
Would you mind trying this GitHub repository first, together with the patch shared in #19?
https://github.com/AastaNV/TRT_object_detection

Cast is an operation that converts a tensor's data type.
In general, you don’t need to write a plugin for it.
It should be supported by TensorRT automatically.

Thanks.

Hello AastaLLL,

I have been following this thread carefully as I too have fine-tuned a model using the ssd_mobilenet_v2_coco_2018_03_29 model as a base.

Here’s my scenario:

TensorRT: 6.0.1
Python: 3.6.8
Tensorflow-GPU: 1.14.0
CUDA: 10.0.130
cuDNN: 7

  1. Fine-tuned on 1 class (200K epochs) using TensorFlow’s object-detection API (latest git version).

  2. Exported my final training checkpoint using TensorFlow’s “export_inference_graph.py” (passing input_shape=1,-1,-1,3 as an argument)

  3. Built “libflattenconcat.so” in /usr/src/tensorrt/samples/python/uff_ssd/build and pointed the ctypes.CDLL() in your main.py to this file:

ctypes.CDLL("/usr/src/tensorrt/samples/python/uff_ssd/build/libflattenconcat.so")

  4. Made the changes you have been recommending in #19

  5. Changed “node_manipulation.py” as recommended
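
A small guard around the ctypes.CDLL() call from step 3 can make a wrong library path fail loudly before TensorRT starts parsing. This is only a minimal sketch, not code from the repository: the helper name load_plugin_library is hypothetical, and the path is the one quoted in step 3.

```python
import ctypes
import os


def load_plugin_library(path):
    """Load a shared library with ctypes, or return None if the file is missing.

    Hypothetical helper for illustration; not part of TRT_object_detection.
    """
    if not os.path.isfile(path):
        print("Plugin library not found: " + path)
        return None
    # ctypes.CDLL raises OSError if the file exists but cannot be loaded.
    return ctypes.CDLL(path)


if __name__ == "__main__":
    lib = load_plugin_library(
        "/usr/src/tensorrt/samples/python/uff_ssd/build/libflattenconcat.so"
    )
    print("loaded" if lib is not None else "not loaded")
```

This only confirms that the file exists and can be loaded; it does not tell you whether the plugin creator inside it registers successfully.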

When I run main.py from your GitHub I get the following error:

[TensorRT] ERROR: Could not register plugin creator:  FlattenConcat_TRT in namespace: 
WARNING:tensorflow:From /usr/lib/python3.6/dist-packages/graphsurgeon/StaticGraph.py:125: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

WARNING: To create TensorRT plugin nodes, please use the `create_plugin_node` function instead.
UFF Version 0.6.5
=== Automatically deduced input nodes ===
[name: "Input"
op: "Placeholder"
input: "Cast"
attr {
  key: "dtype"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "shape"
  value {
    shape {
      dim {
        size: 1
      }
      dim {
        size: 3
      }
      dim {
        size: 300
      }
      dim {
        size: 300
      }
    }
  }
}
]
=========================================

Using output node NMS
Converting to UFF graph
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
WARNING:tensorflow:From /usr/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:179: The name tf.AttrValue is deprecated. Please use tf.compat.v1.AttrValue instead.

Warning: No conversion function registered for layer: FlattenConcat_TRT yet.
Converting concat_box_conf as custom op: FlattenConcat_TRT
Warning: No conversion function registered for layer: Cast yet.
Converting Cast as custom op: Cast
Warning: No conversion function registered for layer: FlattenConcat_TRT yet.
Converting concat_box_loc as custom op: FlattenConcat_TRT
Warning: No conversion function registered for layer: GridAnchor_TRT yet.
Converting GridAnchor as custom op: GridAnchor_TRT
DEBUG [/usr/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['NMS'] as outputs
No. nodes: 554
UFF Output written to tmp.uff
[TensorRT] ERROR: UffParser: Validator error: Cast: Unsupported operation _Cast
[TensorRT] ERROR: Network must have at least one output
Traceback (most recent call last):
  File "main.py", line 45, in <module>
    buf = engine.serialize()
AttributeError: 'NoneType' object has no attribute 'serialize'

So it looks like there are two problems:

  1. FlattenConcat_TRT cannot be registered as a plugin creator, and
  2. the TensorRT UFF parser doesn’t recognize Cast
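
The final AttributeError in the log is a follow-on failure: the engine came back as None after the parser errors, and main.py then called serialize() on it. A minimal sketch of a guard, assuming only that the engine object exposes serialize() as the traceback shows; save_engine and its arguments are hypothetical names, not part of main.py:

```python
def save_engine(engine, path):
    """Serialize an engine to `path`, failing early if the build failed.

    `engine` is whatever the builder returned; it is None when parsing or
    building failed. Hypothetical helper for illustration only.
    """
    if engine is None:
        raise RuntimeError(
            "Engine build failed; check the [TensorRT] ERROR lines printed "
            "by the parser before this point."
        )
    buf = engine.serialize()
    with open(path, "wb") as f:
        f.write(buf)
    return path
```

With this in place the script stops at the real cause (the parser errors) instead of at the serialize() call.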

Any help on this will be super useful!

Thanks!

The model works with the Python code from the GitHub repository you recommended, but I am using a C++ program based on sampleUffSSD with the plugin below:

class FlattenConcat : public IPluginV2
{
public:
    FlattenConcat(int concatAxis, bool ignoreBatch)
        : mIgnoreBatch(ignoreBatch)
        , mConcatAxisID(concatAxis)
    {
        assert(mConcatAxisID == 1 || mConcatAxisID == 2 || mConcatAxisID == 3);
    }
    //clone constructor
    FlattenConcat(int concatAxis, bool ignoreBatch, int numInputs, int outputConcatAxis, int* inputConcatAxis)
        : mIgnoreBatch(ignoreBatch)
        , mConcatAxisID(concatAxis)
        , mOutputConcatAxis(outputConcatAxis)
        , mNumInputs(numInputs)
    {
        CHECK(cudaMallocHost((void**) &mInputConcatAxis, mNumInputs * sizeof(int)));
        for (int i = 0; i < mNumInputs; ++i)
            mInputConcatAxis[i] = inputConcatAxis[i];
    }

    FlattenConcat(const void* data, size_t length)
    {
        const char *d = reinterpret_cast<const char*>(data), *a = d;
        mIgnoreBatch = read<bool>(d);
        mConcatAxisID = read<int>(d);
        assert(mConcatAxisID == 1 || mConcatAxisID == 2 || mConcatAxisID == 3);
        mOutputConcatAxis = read<int>(d);
        mNumInputs = read<int>(d);
        CHECK(cudaMallocHost((void**) &mInputConcatAxis, mNumInputs * sizeof(int)));
        // mCopySize is a size_t*, so the buffer must be sized with sizeof(size_t).
        CHECK(cudaMallocHost((void**) &mCopySize, mNumInputs * sizeof(size_t)));

        std::for_each(mInputConcatAxis, mInputConcatAxis + mNumInputs, [&](int& inp) { inp = read<int>(d); });

        mCHW = read<nvinfer1::DimsCHW>(d);

        std::for_each(mCopySize, mCopySize + mNumInputs, [&](size_t& inp) { inp = read<size_t>(d); });

        assert(d == a + length);
    }
    ~FlattenConcat()
    {
        if (mInputConcatAxis)
            CHECK(cudaFreeHost(mInputConcatAxis));
        if (mCopySize)
            CHECK(cudaFreeHost(mCopySize));
    }
    int getNbOutputs() const override { return 1; }

    Dims getOutputDimensions(int index, const Dims* inputs, int nbInputDims) override
    {
        assert(nbInputDims >= 1);
        assert(index == 0);
        mNumInputs = nbInputDims;
        CHECK(cudaMallocHost((void**) &mInputConcatAxis, mNumInputs * sizeof(int)));
        mOutputConcatAxis = 0;
#ifdef SSD_INT8_DEBUG
        std::cout << " Concat nbInputs " << nbInputDims << "\n";
        std::cout << " Concat axis " << mConcatAxisID << "\n";
        for (int i = 0; i < nbInputDims; ++i)
            for (int j = 0; j < 3; ++j)
                std::cout << " Concat InputDims[" << i << "]"
                          << "d[" << j << "] is " << inputs[i].d[j] << "\n";
#endif
        for (int i = 0; i < nbInputDims; ++i)
        {
            int flattenInput = 0;
            assert(inputs[i].nbDims == 3);
            if (mConcatAxisID != 1)
                assert(inputs[i].d[0] == inputs[0].d[0]);
            if (mConcatAxisID != 2)
                assert(inputs[i].d[1] == inputs[0].d[1]);
            if (mConcatAxisID != 3)
                assert(inputs[i].d[2] == inputs[0].d[2]);
            flattenInput = inputs[i].d[0] * inputs[i].d[1] * inputs[i].d[2];
            mInputConcatAxis[i] = flattenInput;
            mOutputConcatAxis += mInputConcatAxis[i];
        }

        return DimsCHW(mConcatAxisID == 1 ? mOutputConcatAxis : 1,
                       mConcatAxisID == 2 ? mOutputConcatAxis : 1,
                       mConcatAxisID == 3 ? mOutputConcatAxis : 1);
    }

    int initialize() override
    {
        CHECK(cublasCreate(&mCublas));
        return 0;
    }

    void terminate() override
    {
        CHECK(cublasDestroy(mCublas));
    }

    size_t getWorkspaceSize(int) const override { return 0; }

    int enqueue(int batchSize, const void* const* inputs, void** outputs, void*, cudaStream_t stream) override
    {
        int numConcats = 1;
        assert(mConcatAxisID != 0);
        numConcats = std::accumulate(mCHW.d, mCHW.d + mConcatAxisID - 1, 1, std::multiplies<int>());

        if (!mIgnoreBatch)
            numConcats *= batchSize;

        float* output = reinterpret_cast<float*>(outputs[0]);
        int offset = 0;
        for (int i = 0; i < mNumInputs; ++i)
        {
            const float* input = reinterpret_cast<const float*>(inputs[i]);
            float* inputTemp;
            CHECK(cudaMalloc(&inputTemp, mCopySize[i] * batchSize));

            CHECK(cudaMemcpyAsync(inputTemp, input, mCopySize[i] * batchSize, cudaMemcpyDeviceToDevice, stream));

            for (int n = 0; n < numConcats; ++n)
            {
                CHECK(cublasScopy(mCublas, mInputConcatAxis[i],
                                  inputTemp + n * mInputConcatAxis[i], 1,
                                  output + (n * mOutputConcatAxis + offset), 1));
            }
            CHECK(cudaFree(inputTemp));
            offset += mInputConcatAxis[i];
        }

        return 0;
    }

    size_t getSerializationSize() const override
    {
        return sizeof(bool) + sizeof(int) * (3 + mNumInputs) + sizeof(nvinfer1::Dims) + (sizeof(size_t) * mNumInputs);
    }

    void serialize(void* buffer) const override
    {
        char *d = reinterpret_cast<char*>(buffer), *a = d;
        write(d, mIgnoreBatch);
        write(d, mConcatAxisID);
        write(d, mOutputConcatAxis);
        write(d, mNumInputs);
        for (int i = 0; i < mNumInputs; ++i)
        {
            write(d, mInputConcatAxis[i]);
        }
        write(d, mCHW);
        for (int i = 0; i < mNumInputs; ++i)
        {
            write(d, mCopySize[i]);
        }
        assert(d == a + getSerializationSize());
    }

    void configureWithFormat(const Dims* inputs, int nbInputs, const Dims* outputDims, int nbOutputs, nvinfer1::DataType type, nvinfer1::PluginFormat format, int maxBatchSize) override
    {
        assert(nbOutputs == 1);
        mCHW = inputs[0];
        assert(inputs[0].nbDims == 3);
        // mCopySize is a size_t*, so the buffer must be sized with sizeof(size_t).
        CHECK(cudaMallocHost((void**) &mCopySize, nbInputs * sizeof(size_t)));
        for (int i = 0; i < nbInputs; ++i)
        {
            mCopySize[i] = inputs[i].d[0] * inputs[i].d[1] * inputs[i].d[2] * sizeof(float);
        }
    }

    bool supportsFormat(nvinfer1::DataType type, PluginFormat format) const override
    {
        return (type == nvinfer1::DataType::kFLOAT && format == PluginFormat::kNCHW);
    }
    const char* getPluginType() const override { return "FlattenConcat_TRT"; }

    const char* getPluginVersion() const override { return "1"; }

    void destroy() override { delete this; }

    IPluginV2* clone() const override
    {
        return new FlattenConcat(mConcatAxisID, mIgnoreBatch, mNumInputs, mOutputConcatAxis, mInputConcatAxis);
    }

    void setPluginNamespace(const char* libNamespace) override { mNamespace = libNamespace; }

    const char* getPluginNamespace() const override { return mNamespace.c_str(); }

private:
    template <typename T>
    void write(char*& buffer, const T& val) const
    {
        *reinterpret_cast<T*>(buffer) = val;
        buffer += sizeof(T);
    }

    template <typename T>
    T read(const char*& buffer)
    {
        T val = *reinterpret_cast<const T*>(buffer);
        buffer += sizeof(T);
        return val;
    }

    size_t* mCopySize = nullptr;
    bool mIgnoreBatch{false};
    int mConcatAxisID{0}, mOutputConcatAxis{0}, mNumInputs{0};
    int* mInputConcatAxis = nullptr;
    nvinfer1::Dims mCHW;
    cublasHandle_t mCublas;
    std::string mNamespace;
};

namespace
{
const char* FLATTENCONCAT_PLUGIN_VERSION{"1"};
const char* FLATTENCONCAT_PLUGIN_NAME{"FlattenConcat_TRT"};
} // namespace

class FlattenConcatPluginCreator : public IPluginCreator
{
public:
    FlattenConcatPluginCreator()
    {
        mPluginAttributes.emplace_back(PluginField("axis", nullptr, PluginFieldType::kINT32, 1));
        mPluginAttributes.emplace_back(PluginField("ignoreBatch", nullptr, PluginFieldType::kINT32, 1));

        mFC.nbFields = mPluginAttributes.size();
        mFC.fields = mPluginAttributes.data();
    }

    ~FlattenConcatPluginCreator() {}

    const char* getPluginName() const override { return FLATTENCONCAT_PLUGIN_NAME; }

    const char* getPluginVersion() const override { return FLATTENCONCAT_PLUGIN_VERSION; }

    const PluginFieldCollection* getFieldNames() override { return &mFC; }

    IPluginV2* createPlugin(const char* name, const PluginFieldCollection* fc) override
    {
        const PluginField* fields = fc->fields;
        for (int i = 0; i < fc->nbFields; ++i)
        {
            const char* attrName = fields[i].name;
            if (!strcmp(attrName, "axis"))
            {
                assert(fields[i].type == PluginFieldType::kINT32);
                mConcatAxisID = *(static_cast<const int*>(fields[i].data));
            }
            if (!strcmp(attrName, "ignoreBatch"))
            {
                assert(fields[i].type == PluginFieldType::kINT32);
                mIgnoreBatch = *(static_cast<const bool*>(fields[i].data));
            }
        }

        return new FlattenConcat(mConcatAxisID, mIgnoreBatch);
    }

    IPluginV2* deserializePlugin(const char* name, const void* serialData, size_t serialLength) override
    {

        // This object will be deleted when the network is destroyed, which will
        // call FlattenConcat::destroy()
        return new FlattenConcat(serialData, serialLength);
    }

    void setPluginNamespace(const char* libNamespace) override { mNamespace = libNamespace; }

    const char* getPluginNamespace() const override { return mNamespace.c_str(); }

private:
    static PluginFieldCollection mFC;
    bool mIgnoreBatch{false};
    int mConcatAxisID;
    static std::vector<PluginField> mPluginAttributes;
    std::string mNamespace = "";
};

PluginFieldCollection FlattenConcatPluginCreator::mFC{};
std::vector<PluginField> FlattenConcatPluginCreator::mPluginAttributes;

REGISTER_TENSORRT_PLUGIN(FlattenConcatPluginCreator);

and the error is (the failing assertion is line 8 of the plugin listing above):
Begin parsing model…
testssd: /home/nvidia/Desktop/ssd_trt/testssd.cpp:260: FlattenConcat::FlattenConcat(int, bool): Assertion `mConcatAxisID == 1 || mConcatAxisID == 2 || mConcatAxisID == 3' failed.
Aborted (core dumped)

Also, I have trained a custom model based on ssd_inception_v2_coco_2018_01_28, and the config file does not work even with the Python GitHub repository you shared.

  1. if the input order is [0,2,1], then the error I get is:
    python3: nmsPlugin.cpp:136: virtual void nvinfer1::plugin::DetectionOutput::configureWithFormat(const nvinfer1::Dims*, int, const nvinfer1::Dims*, int, nvinfer1::DataType, nvinfer1::PluginFormat, int): Assertion `numPriors * param.numClasses == inputDims[param.inputOrder[1]].d[0]' failed.
    Aborted (core dumped)

  2. if the input order is [0,2,1]:
    python3: nmsPlugin.cpp:135: virtual void nvinfer1::plugin::DetectionOutput::configureWithFormat(const nvinfer1::Dims*, int, const nvinfer1::Dims*, int, nvinfer1::DataType, nvinfer1::PluginFormat, int): Assertion `numPriors * numLocClasses * 4 == inputDims[param.inputOrder[0]].d[0]' failed.
    Aborted (core dumped)

Hello AastaLLL,
Last week, I updated my Object Detection API, retrained the SSD on the same object with the same model (ssd_inception_v2_coco_2017_11_17), and used Python for inference. Now I am getting the same error as pushkar.chatterji in the comments above. I tried all the config files from the config folder and modified them, but the result is the same.

I was able to get rid of the Cast operation error by adding the last two lines below to the config file:

....   graph.collapse_namespaces(namespace_plugin_map)
    
    graph.remove(graph.graph_outputs, remove_exclusive_dependencies=False)
    graph.find_nodes_by_op("NMS_TRT")[0].input.remove("Input")
    graph.find_nodes_by_name("Input")[0].input.remove("Cast")

return graph

or by replacing ToFloat with Cast in namespace_plugin_map:

namespace_plugin_map = {
        "MultipleGridAnchorGenerator": PriorBox,
        "Postprocessor": NMS,
        "Preprocessor": Input,
        "Cast": Input,
        "image_tensor": Input,
        "Concatenate": concat_priorbox,
        "concat": concat_box_loc,
        "concat_1": concat_box_conf
    }

But after adding these lines, I get the following error:


[TensorRT] INFO: UFFParser: parsing BoxPredictor_0/BoxEncodingPredictor/weights
[TensorRT] INFO: UFFParser: parsing BoxPredictor_0/BoxEncodingPredictor/Conv2D
[TensorRT] INFO: UFFParser: parsing BoxPredictor_0/BoxEncodingPredictor/biases
[TensorRT] INFO: UFFParser: parsing BoxPredictor_0/BoxEncodingPredictor/BiasAdd
[TensorRT] INFO: UFFParser: parsing BoxPredictor_0/Shape
[TensorRT] INFO: UFFParser: parsing BoxPredictor_0/strided_slice/stack
[TensorRT] INFO: UFFParser: parsing BoxPredictor_0/strided_slice/stack_1
[TensorRT] INFO: UFFParser: parsing BoxPredictor_0/strided_slice/stack_2
[TensorRT] INFO: UFFParser: parsing BoxPredictor_0/strided_slice
[TensorRT] INFO: UFFParser: parsing BoxPredictor_0/Reshape/shape/1
[TensorRT] INFO: UFFParser: parsing BoxPredictor_0/Reshape/shape/2
[TensorRT] INFO: UFFParser: parsing BoxPredictor_0/Reshape/shape/3
[TensorRT] INFO: UFFParser: parsing BoxPredictor_0/Reshape/shape
[TensorRT] INFO: UFFParser: parsing BoxPredictor_0/Reshape
[TensorRT] ERROR: UFFParser: Parser error: BoxPredictor_0/Reshape: Reshape: -1 dimension specified more than 1 time
[TensorRT] ERROR: Network must have at least one output
Traceback (most recent call last):
  File "main.py", line 44, in <module>
    buf = engine.serialize()
AttributeError: 'NoneType' object has no attribute 'serialize'

Could you help me understand what is happening?
I did make the changes you suggested in comment 19, and they work with the model I shared with you (trained with the old Object Detection API and an old TensorFlow version). Now, after updating the Object Detection API and moving to TensorFlow 1.14.0, the trained model throws the above error.
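
For context on the parser message: a reshape can infer at most one dimension, so a target shape that contains -1 more than once is ambiguous. The sketch below illustrates that rule in plain Python. It is not TensorRT's implementation, and resolve_reshape is a hypothetical name used only here.

```python
def resolve_reshape(num_elements, shape):
    """Resolve a reshape target that may contain a single -1 placeholder.

    Raises ValueError when -1 appears more than once, mirroring the rule
    behind the UFF parser message (illustration only).
    """
    unknown = [i for i, d in enumerate(shape) if d == -1]
    if len(unknown) > 1:
        raise ValueError("-1 dimension specified more than 1 time")
    known = 1
    for d in shape:
        if d != -1:
            known *= d
    resolved = list(shape)
    if unknown:
        if known == 0 or num_elements % known != 0:
            raise ValueError("cannot infer the -1 dimension")
        resolved[unknown[0]] = num_elements // known
    elif known != num_elements:
        raise ValueError("element count does not match target shape")
    return resolved


# One unknown dimension can be inferred: 19 * 19 * 12 = 4332 elements.
print(resolve_reshape(4332, [1, -1, 1, 4]))  # [1, 1083, 1, 4]
```

A shape such as [-1, -1, 1, 4] raises ValueError here. Whether the exported graph really ends up with two unknown dimensions at BoxPredictor_0/Reshape is an assumption; the log only reports the symptom.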

Hi, sandeepkumarjangir07

I suppose you need to change the output layer name to MarkOutput_0 for ssd_inception_v2_coco_2017_11_17.
Could you confirm this?

Ex.

import uff
import tensorflow as tf
import tensorrt as trt
from tensorrt.parsers import uffparser
import config

frozen_file = 'ssd_inception_v2_coco_2017_11_17/frozen_inference_graph.pb'
quiet = False
list_nodes = False
text = False

uff_model = uff.from_tensorflow_frozen_model(frozen_file, output_nodes=['NMS'], preprocessor='config.py', quiet=quiet, list_nodes=list_nodes, text=text)

G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)

parser = uffparser.create_uff_parser()
parser.register_input("Placeholder", (3,300,300), 0)
parser.register_output("MarkOutput_0")

engine = trt.utils.uff_to_trt_engine(G_LOGGER, uff_model, parser, 1, 1 << 20)

Thanks.

I think it is already updated. I believe the problem comes from exporting the frozen graph with TensorFlow 1.14 and the latest Object Detection API, as discussed in https://devtalk.nvidia.com/default/topic/1043557/tensorrt/error-uffparser-parser-error-boxpredictor_0-reshape-reshape-1-dimension-specified-more-than-1-/post/5307465/#5307465

I think I was able to solve the problem. The steps I followed :

  1. Trained a custom SSD model using the latest TensorFlow Object Detection API and TensorFlow 1.14.0.

  2. On another PC, I had a copy of the TensorFlow Object Detection API from early 2018, and I used it to export my model from step 1 to a frozen graph. IMPORTANT: before exporting to a frozen graph, remove the line below from the pipeline.config file:

override_base_feature_extractor_hyperparams: true

After removing the line, export the model to a frozen graph.

  3. I used the config file shared by AastaLLL to generate the UFF file, then used it to create the engine and perform inference.
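
The edit to pipeline.config in step 2 can also be scripted. A minimal sketch, assuming the setting sits on its own line as shown above; strip_override_line is a hypothetical helper, and the sample text is a made-up fragment rather than a real config:

```python
def strip_override_line(config_text):
    """Return pipeline.config text without the override line from step 2."""
    key = "override_base_feature_extractor_hyperparams"
    kept = [
        line for line in config_text.splitlines()
        if not line.strip().startswith(key)
    ]
    return "\n".join(kept) + "\n"


# Made-up fragment for illustration; a real pipeline.config is much longer.
sample = (
    "feature_extractor {\n"
    "  type: \"ssd_inception_v2\"\n"
    "  override_base_feature_extractor_hyperparams: true\n"
    "}\n"
)
print(strip_override_line(sample))
```

Keep a backup of the original pipeline.config, since the training setup from step 1 still uses that line.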

Results:
Network trained on 1 custom object.

The fps is computed as batch_size divided by the time between an image being given as input to the network and the network outputting its prediction. It does not include the time needed to display the image with the prediction.

With frames shown: batch_size divided by the time between an image being given as input to the network and the frame being displayed with the prediction.
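
The fps definition above can be written down as a short sketch. The function names are hypothetical, and run_inference stands in for whatever call actually executes the engine; this is not the timing code used for the numbers in this post.

```python
import time


def fps(batch_size, t_input, t_output):
    """Frames per second: batch size divided by the elapsed time in seconds."""
    return batch_size / (t_output - t_input)


def timed_fps(run_inference, batch):
    """Time one call of `run_inference(batch)` and return the resulting fps.

    `run_inference` is a hypothetical stand-in for the engine execution call.
    To get the "with frames shown" figure, include the display call inside it.
    """
    t_input = time.perf_counter()
    run_inference(batch)
    t_output = time.perf_counter()
    return fps(len(batch), t_input, t_output)


# Example: a batch of 2 images processed in 0.5 s gives 4 fps.
print(fps(2, 10.0, 10.5))  # 4.0
```

A single timed call is noisy, so averaging over many frames gives a steadier figure.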

Jetson TX2 (JetPack 4.2)
Batch size 1: ~54 fps; with frames shown: ~42 fps
Batch size 2: ~111 fps; with frames shown: ~83 fps

Jetson Nano (JetPack 4.2)
Batch size 1: ~28 fps; with frames shown: ~21 fps
Batch size 2: ~56 fps; with frames shown: ~43 fps

Thank you !

Glad to know the issue is resolved. Thanks for sharing!

I have the same problem as sandeepkumarjangir07 when I convert my custom ssd-mobilenet-v1-fpn model.

Environment:
TensorFlow: 1.14.0
TensorRT: 6.0.1.5
Google Object Detection API: latest
CUDA: 10.0

Error output:

DEBUG [/usr/local/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['NMS'] as outputs
No. nodes: 998
UFF Output written to /media/york/F/GitHub/tensorflow/train_model/ssd_mobilenet_v1_fpn_shared_trafficlight/export_train_bdd100k_baidu_truck_zl004_class4_wh640640_depth1.0_level35_num1_focal_trainval299287_step320000_640640_320000/frozen_inference_graph.uff
UFF Text Output written to /media/york/F/GitHub/tensorflow/train_model/ssd_mobilenet_v1_fpn_shared_trafficlight/export_train_bdd100k_baidu_truck_zl004_class4_wh640640_depth1.0_level35_num1_focal_trainval299287_step320000_640640_320000/frozen_inference_graph.pbtxt
[TensorRT] ERROR: Could not register plugin creator: FlattenConcat_TRT in namespace:
TensorRT inference engine settings:

  • Inference precision - DataType.FLOAT
  • Max batch size - 1

[TensorRT] ERROR: UffParser: Validator error: MultiscaleGridAnchorGenerator/ToNormalizedCoordinates_2/Cast_1: Unsupported operation _Cast
[TensorRT] ERROR: Network must have at least one output
Building TensorRT engine. This may take few minutes.
Traceback (most recent call last):
  File "/media/york/F/GitHub/tensorflow/models/research/uff_ssd-TensorRT-6.0.1.5/detect_objects_trafficlight.py", line 250, in <module>
    main()
  File "/media/york/F/GitHub/tensorflow/models/research/uff_ssd-TensorRT-6.0.1.5/detect_objects_trafficlight.py", line 224, in main
    batch_size=args.max_batch_size)
  File "/media/york/F/GitHub/tensorflow/models/research/uff_ssd-TensorRT-6.0.1.5/utils/inference.py", line 117, in __init__
    engine_utils.save_engine(self.trt_engine, trt_engine_path)
  File "/media/york/F/GitHub/tensorflow/models/research/uff_ssd-TensorRT-6.0.1.5/utils/engine.py", line 132, in save_engine
    buf = engine.serialize()
AttributeError: 'NoneType' object has no attribute 'serialize'

Process finished with exit code 1

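Could someone help me? There are two problems:
1. FlattenConcat_TRT cannot be registered
2. Cast is reported as an unsupported operation

Why do I have to use an old Google Object Detection API to export the .pb file? My custom model is an ssd-mobilenet-v1-fpn for traffic light recognition.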

In step 2, after I remove this line from the pipeline.config file

override_base_feature_extractor_hyperparams: true

I cannot export the model to a frozen graph.

@sandeepkumarjangir07 if I remove that line from the config, TensorFlow fails to freeze the graph properly. Can you elaborate more?

Maybe you are using the latest version of the TensorFlow Object Detection API. It also gave me an error when I used a version from mid-2018 onward. As I mentioned previously, I used a TensorFlow Object Detection API version from early 2018 to convert the checkpoints into a frozen graph. I will try to upload the version I used to this thread.

Hello @sandeepkumarjangir07,
Could you please provide the versions that you used? I would like to use the correct TensorFlow version and TensorFlow Object Detection API version to train an ssd inception v2 model and run inference using sampleUffSSD in TensorRT on a Jetson Nano.