Writing a layer for NonMaxSuppression in the ONNX parser

I am working on writing a layer in the ONNX parser for the NonMaxSuppression op. For this, I am adding a DEFINE_BUILTIN_OP_IMPORTER in builtin_op_importers.cpp in the onnx-tensorrt backend.
TensorRT has the BatchedNMS plugin for this op. However, the input and output params described for the TensorRT plugin (TensorRT/plugin/batchedNMSPlugin at master · NVIDIA/TensorRT · GitHub)
and for the ONNX op (onnx/Operators.md at main · onnx/onnx · GitHub) do not match.
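
For reference, roughly what the two interfaces look like (summarized from the respective docs; I may be glossing over optional details):

  1. ONNX NonMaxSuppression (opset 10+): inputs boxes [num_batches, spatial_dimension, 4] and scores [num_batches, num_classes, spatial_dimension], plus optional max_output_boxes_per_class, iou_threshold and score_threshold; a single int64 output selected_indices of shape [num_selected_indices, 3] holding [batch_index, class_index, box_index] triples.
  2. BatchedNMS_TRT: inputs boxes [batch, numBoxes, 1 or numClasses, 4] and scores [batch, numBoxes, numClasses]; four outputs num_detections, nmsed_boxes, nmsed_scores and nmsed_classes, all sized by the keepTopK plugin parameter.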


Hi @roshanchaudhari,
Can you please clarify what your question is?

Hi @shayNV, thanks for responding. As I mentioned above, there is no existing plugin/layer that exactly matches the NonMaxSuppression op. So it seems the only option is to modify the BatchedNMS_TRT plugin to return the indices of the boxes so that it matches the output described in onnx/Operators.md at main · onnx/onnx · GitHub? Or is there another way?

So I assumed there is no other way to do this and tried writing a layer in the onnx-tensorrt backend. In builtin_op_importers.cpp I wrote an importer:

DEFINE_BUILTIN_OP_IMPORTER(NonMaxSuppression)
{
    // NonMaxSuppression is not supported for opsets below 10.
    ASSERT(ctx->getOpsetVersion() >= 10, ErrorCode::kUNSUPPORTED_NODE);

    nvinfer1::ITensor* boxes_tensor = &convertToTensor(inputs.at(0), ctx);
    nvinfer1::ITensor* scores_tensor = &convertToTensor(inputs.at(1), ctx);
    const int numInputs = inputs.size();
    LOG_ERROR("no of inputs are " << numInputs);
    LOG_ERROR("node outsize and op type are " << node.output().size() << " type " << node.op_type());

    const auto scores_dims = scores_tensor->getDimensions();
    const auto boxes_dims = boxes_tensor->getDimensions();
    LOG_ERROR("boxes dims " << boxes_dims.nbDims << " dim3 has size " << boxes_dims.d[2]);

    const std::string pluginName = "BatchedNMS_TRT";
    const std::string pluginVersion = "1";
    std::vector<nvinfer1::PluginField> f;

    bool share_location = true;
    const bool is_normalized = true;
    const bool clip_boxes = true;
    int backgroundLabelId = 0;

    // Initialize plugin fields.
    f.emplace_back("shareLocation", &share_location, nvinfer1::PluginFieldType::kINT8, 1);
    f.emplace_back("isNormalized", &is_normalized, nvinfer1::PluginFieldType::kINT8, 1);
    f.emplace_back("clipBoxes", &clip_boxes, nvinfer1::PluginFieldType::kINT8, 1);
    f.emplace_back("backgroundLabelId", &backgroundLabelId, nvinfer1::PluginFieldType::kINT32, 1);

    // Create the plugin from the registry.
    nvinfer1::IPluginV2* plugin = importPluginFromRegistry(ctx, pluginName, pluginVersion, node.name(), f);

    ASSERT(plugin != nullptr && "NonMaxSuppression plugin was not found in the plugin registry!",
        ErrorCode::kUNSUPPORTED_NODE);

    std::vector<nvinfer1::ITensor*> nms_inputs = {boxes_tensor, scores_tensor};
    RETURN_FIRST_OUTPUT(ctx->network()->addPluginV2(nms_inputs.data(), nms_inputs.size(), *plugin));
}

However, when I try to run the above code, it crashes in:

nvinfer1::plugin::BatchedNMSPlugin::getOutputDimensions(), where it fails on ASSERT(inputs[0].nbDims == 3). However, in my DEFINE_BUILTIN_OP_IMPORTER(NonMaxSuppression) function above, it prints inputs[0].nbDims = 3. Why does the assertion fail in getOutputDimensions()?

I tried to trace it, but the call comes from the libnvinfer runtime library:

#0  nvinfer1::plugin::BatchedNMSPlugin::getOutputDimensions (this=0x5555654b44c0, index=2, inputs=0x5555654b4800, nbInputDims=2)
at trt_src/TensorRT/TensorRT/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp:70
#1  0x00007fffe9e735fd in nvinfer1::PluginV2Layer::getOutputForm(int, std::vector<nvinfer1::TensorForm, std::allocator<nvinfer1::TensorForm> > const&) const ()
   from /usr/lib/x86_64-linux-gnu/libnvinfer.so.7
#2  0x00007fffe9f277ef in nvinfer1::Network::updateTensor(nvinfer1::NetworkTensor const*) const () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.7
#3  0x00007fffe9f27d0a in nvinfer1::NetworkTensor::getDimensions() const () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.7
#4  0x00005555555a3bd2 in onnx2trt::TensorOrWeights::shape (this=0x5555654b22c0) at /onnx-tensorrt/TensorOrWeights.hpp:96
#5  onnx2trt::parseGraph (ctx=ctx@entry=0x55555628e110, graph=..., deserializingINetwork=<optimized out>, currentNode=currentNode@entry=0x55555628e3c0)
at ModelImporter.cpp:187
#6  0x00005555555a6a7f in onnx2trt::ModelImporter::importModel (this=0x55555628e0d0, model=..., weight_count=<optimized out>, weight_descriptors=<optimized out>)
at ModelImporter.cpp:521

Upon further debugging I found that the dimensions are not correct inside getOutputDimensions(). If I create the plugin with wildcard dimensions and then check the values in getOutputDimensions(), they are correct. However, my input tensor does not have any wildcard dimensions.

Hi @roshanchaudhari,
Can you clarify your comment: did you solve the issue you were facing, or do you have another question?

Thanks

Yeah, I did find a workaround. Let's close this one.

However, I still have a few questions about TensorRT:

  1. If I am writing a new layer plugin and deriving it from IPluginV2DynamicExt, is it necessary to define the extent of each output dimension?
    For example, if I am writing a plugin for some op and cannot express an output dimension even in terms of the input dims, is it okay to use the constant -1 for that dimension?
  2. How is the output type decided when writing a plugin versus using an existing layer?

Dear @roshanchaudhari
If I cannot express an output dimension even in terms of the input dims, is it okay to use the constant -1 for that dimension?

Does that mean you want to add -1 to the plugin’s input buffer dims?

How is the output type decided when writing a plugin versus using an existing layer?

You mean how to write getOutputDataType() for your plugin? What do you mean by “using an existing layer”? Can you elaborate?
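
For reference, in case it helps: getOutputDataType() is typically derived from the input types. A minimal sketch for a hypothetical MyPlugin whose single output keeps the data type of its first input (assuming an IPluginV2DynamicExt plugin on TRT 7):

nvinfer1::DataType MyPlugin::getOutputDataType(
    int index, const nvinfer1::DataType* inputTypes, int nbInputs) const
{
    // Only one output, so 'index' is always 0 here; report the first input's type.
    return inputTypes[0];
}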

#1. Please see the snippet below.

DimsExprs MyPlugin::getOutputDimensions(
    int outputIndex, const nvinfer1::DimsExprs* inputs, int nbInputs, nvinfer1::IExprBuilder& exprBuilder)
{
    nvinfer1::DimsExprs output;
    output.nbDims = 2;
    output.d[0] = exprBuilder.constant(inputs[0].nbDims);

    // Is this allowed, if I cannot define the extent of this output dim?
    output.d[1] = exprBuilder.constant(-1);
    return output;
}
#2. Solved.

Dear @roshanchaudhari,
Is the issue solved?

No. I am still waiting for the answer.

Dear @roshanchaudhari,
It is not allowed. The output dimensions must be computable from the input dimensions and parameters within the layer. Dimensions that depend on tensor data at runtime are not allowed.
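
For illustration, a rough sketch of what is allowed (hypothetical MyPlugin): every output extent is built only from input extents and constants through the IExprBuilder:

nvinfer1::DimsExprs MyPlugin::getOutputDimensions(
    int outputIndex, const nvinfer1::DimsExprs* inputs, int nbInputs, nvinfer1::IExprBuilder& exprBuilder)
{
    nvinfer1::DimsExprs output;
    output.nbDims = 2;
    // First extent copied directly from the first input.
    output.d[0] = inputs[0].d[0];
    // Second extent computed from an input extent, e.g. twice the input's last dim.
    output.d[1] = exprBuilder.operation(nvinfer1::DimensionOperation::kPROD,
        *inputs[0].d[inputs[0].nbDims - 1], *exprBuilder.constant(2));
    return output;
}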

May I know why you are looking for this use case?

As a use case: suppose we want to write a plugin for the NonZero op, which returns the indices of the non-zero elements of a tensor. In this case we cannot define the output dimensions.

Dear @roshanchaudhari,
TRT has a two-phase execution model. The first phase runs on the CPU and computes the values of shape tensors. The second phase is streamed on the GPU and computes the execution tensors. Information can only flow from phase 1 to phase 2.

One workaround is to store the buffer size in buffer[0], or to have two outputs: one indicating the valid count and the other storing the non-zero indices (a rough sketch follows below). The problem is that none of TRT's dimensioning infrastructure can read that count from the GPU, so you would have to end the engine after this operation, or have every subsequent layer be a plugin layer that also keeps its dimensions on the GPU. Generally this won't be the case for a DNN. Please file a bug for this use case; our engineering team will prioritize it.
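
To illustrate that two-output workaround (only a sketch, for a hypothetical NonZeroPlugin derived from IPluginV2DynamicExt): output 0 is padded to its worst-case size so that its extent is computable from the input dims, and output 1 carries the actual count, which downstream code would have to read from the GPU itself.

// Sketch of the workaround for a hypothetical NonZeroPlugin.
// Output 0: indices, padded to the worst case (rank x total element count).
// Output 1: a single int holding how many entries of output 0 are valid.
// TRT's shape machinery never sees the true count; consumers must read output 1 on the GPU.
int NonZeroPlugin::getNbOutputs() const
{
    return 2;
}

nvinfer1::DimsExprs NonZeroPlugin::getOutputDimensions(
    int outputIndex, const nvinfer1::DimsExprs* inputs, int nbInputs, nvinfer1::IExprBuilder& exprBuilder)
{
    nvinfer1::DimsExprs output;
    if (outputIndex == 0)
    {
        // Worst case: every element is non-zero -> rank x (d0 * d1 * ... * dn).
        const nvinfer1::IDimensionExpr* total = exprBuilder.constant(1);
        for (int i = 0; i < inputs[0].nbDims; ++i)
        {
            total = exprBuilder.operation(nvinfer1::DimensionOperation::kPROD, *total, *inputs[0].d[i]);
        }
        output.nbDims = 2;
        output.d[0] = exprBuilder.constant(inputs[0].nbDims);
        output.d[1] = total;
    }
    else
    {
        // Valid-count output: a single element.
        output.nbDims = 1;
        output.d[0] = exprBuilder.constant(1);
    }
    return output;
}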

Hi everyone, I’ve been following this thread because I have the same issue: I basically need to register a NonMaxSuppression operation in onnx-tensorrt. So, as advised by @roshanchaudhari, I wrote the following in builtin_op_importers.cpp:

DEFINE_BUILTIN_OP_IMPORTER(NonMaxSuppression)
{
        // NonMaxSuppression is not supported for opsets below 10.
        ASSERT(ctx->getOpsetVersion() >= 10, ErrorCode::kUNSUPPORTED_NODE);

        nvinfer1::ITensor* boxes_tensor = &convertToTensor(inputs.at(0), ctx);
        nvinfer1::ITensor* scores_tensor = &convertToTensor(inputs.at(1), ctx);
        const int numInputs = inputs.size();
        LOG_ERROR("no of inputs are "<<numInputs);
        LOG_ERROR("node outsize and op type are "<<node.output().size()<< " type " << node.op_type());

        const auto scores_dims = scores_tensor->getDimensions();
        const auto boxes_dims = boxes_tensor->getDimensions();
        LOG_ERROR("boxes dims "<< boxes_dims.nbDims << " dim3 has size "<<boxes_dims.d[2]);
        const std::string pluginName = "BatchedNMS_TRT";
        const std::string pluginVersion = "1";
        std::vector<nvinfer1::PluginField> f;

        bool share_location = true;
        const bool is_normalized = true;
        const bool clip_boxes = true;
        int backgroundLabelId = 0;
        // Initialize.
        f.emplace_back("shareLocation", &share_location, nvinfer1::PluginFieldType::kINT8, 1);
        f.emplace_back("isNormalized", &is_normalized, nvinfer1::PluginFieldType::kINT8, 1);
        f.emplace_back("clipBoxes", &clip_boxes, nvinfer1::PluginFieldType::kINT8, 1);
        f.emplace_back("backgroundLabelId", &backgroundLabelId, nvinfer1::PluginFieldType::kINT32, 1);
        // Create plugin from registry
        // nvinfer1::IPluginV2* plugin = importPluginFromRegistry(ctx, pluginName, pluginVersion, node.name(), f);
        nvinfer1::IPluginV2* plugin = createPlugin(node.name(), importPluginCreator(pluginName, pluginVersion), f);

        ASSERT(plugin != nullptr && "NonMaxSuppression plugin was not found in the plugin registry!",
                   ErrorCode::kUNSUPPORTED_NODE);

        std::vector<nvinfer1::ITensor*> nms_inputs ={boxes_tensor, scores_tensor};
        RETURN_FIRST_OUTPUT(ctx->network()->addPluginV2(nms_inputs.data(), nms_inputs.size(), *plugin));
}

However, I got the following error:

[2020-10-27 16:19:27   ERROR] /opt/onnx-tensorrt/builtin_op_importers.cpp:122: no of inputs are 5
[2020-10-27 16:19:27   ERROR] /opt/onnx-tensorrt/builtin_op_importers.cpp:123: node outsize and op type are 1 type NonMaxSuppression
[2020-10-27 16:19:27   ERROR] /opt/onnx-tensorrt/builtin_op_importers.cpp:127: boxes dims 3 dim3 has size 4
#assertion/home/jenkins/workspace/OSS/L0_MergeRequest/oss/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp,77
Aborted (core dumped)

Does anyone have solid advice for solving this issue?

Did you ever solve this issue? I am trying to deploy with TensorRT, and it is looking for a Non Max Suppression plugin for a TensorFlow 2 SSD model retrained with their API.

I found a working solution.
The documentation fails to mention that BatchedNMSPlugin is modeled directly after TensorFlow’s CombinedNonMaxSuppression, as compared to the NonMaxSuppression specified in ONNX.

So I modified my TF model to use CombinedNMS, then wrote a script using ONNX GraphSurgeon that converts nodes from CombinedNonMaxSuppression to BatchedNMSDynamic_TRT, based on the mapping in the tf2tensorrt code.

@klinten

Awesome to see someone has found a working solution. There are a lot of posts on GitHub and these forums from people trying to figure out how to do this.

If I understand correctly, the batchedNonMaxSuppression layer of almost every TensorFlow 2 model can be handled with what you have shared?

Would you share what you did in GraphSurgeon, with details on your steps? I am still learning how to deploy my first model and have not been able to figure this out, despite working on it for hours upon hours every day.

I can send you a link to my models if you don’t mind taking a look.

Well, if you can do your NMS with CombinedNMS, then you can use TensorRT with BatchedNMS, and I think a lot of people should be able to do that. Though I really think Nvidia should just go ahead and implement the NMS specified in ONNX once and for all.

Unfortunately I’m not allowed to share code that I write for work, so you’ll have to reinvent the wheel yourself here. But one thing I forgot to mention is that you need to make sure tf2onnx doesn’t convert your CombinedNMS; for that, pass the flag --custom-ops CombinedNonMaxSuppression. Then load the network in onnx_graphsurgeon and iterate over all CombinedNonMaxSuppression nodes, replacing each one with a BatchedNMS_TRT node according to the recipe I linked to from the TensorFlow code.
