Writing a layer for NonMaxSuppression in the ONNX parser

I am working on writing a layer in the ONNX parser for the NonMaxSuppression op. For this, I am adding a DEFINE_BUILTIN_OP_IMPORTER in builtin_op_importers.cpp in the onnx-tensorrt backend.
TensorRT has the BatchedNMS plugin for this op. However, the input and output params described for the TensorRT plugin (TensorRT/plugin/batchedNMSPlugin at master · NVIDIA/TensorRT · GitHub)
and for the ONNX op (onnx/Operators.md at main · onnx/onnx · GitHub) do not match.
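
For reference, roughly what the two interfaces look like (summarized from the respective docs; I may be glossing over optional details):

  1. ONNX NonMaxSuppression (opset 10+): inputs boxes [num_batches, spatial_dimension, 4] and scores [num_batches, num_classes, spatial_dimension], plus optional max_output_boxes_per_class, iou_threshold and score_threshold; a single int64 output selected_indices of shape [num_selected_indices, 3] holding [batch_index, class_index, box_index] triples.
  2. BatchedNMS_TRT: inputs boxes [batch, numBoxes, 1 or numClasses, 4] and scores [batch, numBoxes, numClasses]; four outputs num_detections, nmsed_boxes, nmsed_scores and nmsed_classes, all sized by the keepTopK plugin parameter.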


Hi @roshanchaudhari,
Can you please clarify what your question is?

Hi @shayNV, thanks for responding. As I mentioned above, there is no existing plugin/layer that exactly matches the NonMaxSuppression op. So it seems the only option is to modify the BatchedNMS_TRT plugin to return the indices of the boxes so that it matches the output described in onnx/Operators.md at main · onnx/onnx · GitHub? Or is there another way?

So I assumed there is no other way to do this and tried writing a layer in the onnx-tensorrt backend. In builtin_op_importers.cpp I wrote an importer:

DEFINE_BUILTIN_OP_IMPORTER(NonMaxSuppression)
{
    // NonMaxSuppression is not supported for opsets below 10.
    ASSERT(ctx->getOpsetVersion() >= 10, ErrorCode::kUNSUPPORTED_NODE);

    nvinfer1::ITensor* boxes_tensor = &convertToTensor(inputs.at(0), ctx);
    nvinfer1::ITensor* scores_tensor = &convertToTensor(inputs.at(1), ctx);
    const int numInputs = inputs.size();
    LOG_ERROR("no of inputs are " << numInputs);
    LOG_ERROR("node outsize and op type are " << node.output().size() << " type " << node.op_type());

    const auto scores_dims = scores_tensor->getDimensions();
    const auto boxes_dims = boxes_tensor->getDimensions();
    LOG_ERROR("boxes dims " << boxes_dims.nbDims << " dim3 has size " << boxes_dims.d[2]);

    const std::string pluginName = "BatchedNMS_TRT";
    const std::string pluginVersion = "1";
    std::vector<nvinfer1::PluginField> f;

    bool share_location = true;
    const bool is_normalized = true;
    const bool clip_boxes = true;
    int backgroundLabelId = 0;

    // Initialize plugin fields.
    f.emplace_back("shareLocation", &share_location, nvinfer1::PluginFieldType::kINT8, 1);
    f.emplace_back("isNormalized", &is_normalized, nvinfer1::PluginFieldType::kINT8, 1);
    f.emplace_back("clipBoxes", &clip_boxes, nvinfer1::PluginFieldType::kINT8, 1);
    f.emplace_back("backgroundLabelId", &backgroundLabelId, nvinfer1::PluginFieldType::kINT32, 1);

    // Create the plugin from the registry.
    nvinfer1::IPluginV2* plugin = importPluginFromRegistry(ctx, pluginName, pluginVersion, node.name(), f);

    ASSERT(plugin != nullptr && "NonMaxSuppression plugin was not found in the plugin registry!",
        ErrorCode::kUNSUPPORTED_NODE);

    std::vector<nvinfer1::ITensor*> nms_inputs = {boxes_tensor, scores_tensor};
    RETURN_FIRST_OUTPUT(ctx->network()->addPluginV2(nms_inputs.data(), nms_inputs.size(), *plugin));
}

However, when I try to run the above code, it crashes in:

nvinfer1::plugin::BatchedNMSPlugin::getOutputDimensions(), where it fails on ASSERT(inputs[0].nbDims == 3). However, in my DEFINE_BUILTIN_OP_IMPORTER(NonMaxSuppression) function above, it prints inputs[0].nbDims = 3. Why does the assertion fail in getOutputDimensions()?

I tried to trace it, but the call comes from the libnvinfer runtime library:

#0  nvinfer1::plugin::BatchedNMSPlugin::getOutputDimensions (this=0x5555654b44c0, index=2, inputs=0x5555654b4800, nbInputDims=2)
at trt_src/TensorRT/TensorRT/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp:70
#1  0x00007fffe9e735fd in nvinfer1::PluginV2Layer::getOutputForm(int, std::vector<nvinfer1::TensorForm, std::allocator<nvinfer1::TensorForm> > const&) const ()
   from /usr/lib/x86_64-linux-gnu/libnvinfer.so.7
#2  0x00007fffe9f277ef in nvinfer1::Network::updateTensor(nvinfer1::NetworkTensor const*) const () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.7
#3  0x00007fffe9f27d0a in nvinfer1::NetworkTensor::getDimensions() const () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.7
#4  0x00005555555a3bd2 in onnx2trt::TensorOrWeights::shape (this=0x5555654b22c0) at /onnx-tensorrt/TensorOrWeights.hpp:96
#5  onnx2trt::parseGraph (ctx=ctx@entry=0x55555628e110, graph=..., deserializingINetwork=<optimized out>, currentNode=currentNode@entry=0x55555628e3c0)
at ModelImporter.cpp:187
#6  0x00005555555a6a7f in onnx2trt::ModelImporter::importModel (this=0x55555628e0d0, model=..., weight_count=<optimized out>, weight_descriptors=<optimized out>)
at ModelImporter.cpp:521

Upon further debugging I found that the dimensions are not correct inside getOutputDimensions(). If I create the plugin with wildcard dimensions and then check the values in getOutputDimensions(), they are correct. However, my input tensor does not have any wildcard dimensions.

Hi @roshanchaudhari,
Can you clarify your comment: did you solve the issue you were facing, or do you have another question?

Thanks

Yeah, I did find a workaround. Let's close this one.

However, I still have a few questions about TensorRT:

  1. If I am writing a new layer plugin and deriving it from IPluginV2DynamicExt, is it necessary to define the extent of each output dimension?
    For example, if I am writing a plugin for some op and cannot express an output dimension even in terms of the input dims, is it okay to use the constant -1 for that dimension?
  2. How is the output type decided when writing a plugin versus using an existing layer?

Dear @roshanchaudhari
If I cannot express an output dimension even in terms of the input dims, is it okay to use the constant -1 for that dimension?

Does that mean you want to add -1 to the plugin’s input buffer dims?

How is the output type decided when writing a plugin versus using an existing layer?

You mean how to write getOutputDataType() for your plugin? What do you mean by “using an existing layer”? Can you elaborate?
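
For reference, in case it helps: getOutputDataType() is typically derived from the input types. A minimal sketch for a hypothetical MyPlugin whose single output keeps the data type of its first input (assuming an IPluginV2DynamicExt plugin on TRT 7):

nvinfer1::DataType MyPlugin::getOutputDataType(
    int index, const nvinfer1::DataType* inputTypes, int nbInputs) const
{
    // Only one output, so 'index' is always 0 here; report the first input's type.
    return inputTypes[0];
}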

#1. Please see the snippet below.

DimsExprs MyPlugin::getOutputDimensions(
    int outputIndex, const nvinfer1::DimsExprs* inputs, int nbInputs, nvinfer1::IExprBuilder& exprBuilder)
{
    nvinfer1::DimsExprs output;
    output.nbDims = 2;
    output.d[0] = exprBuilder.constant(inputs[0].nbDims);

    // Is this allowed, if I cannot define the extent of this output dim?
    output.d[1] = exprBuilder.constant(-1);
    return output;
}
#2. Solved.

Dear @roshanchaudhari,
Is the issue solved?

No. I am still waiting for the answer.

Dear @roshanchaudhari,
It is not allowed. The output dimensions must be computable from the input dimensions and parameters within the layer. Dimensions that depend on tensor data at runtime are not allowed.
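
For illustration, a rough sketch of what is allowed (hypothetical MyPlugin): every output extent is built only from input extents and constants through the IExprBuilder:

nvinfer1::DimsExprs MyPlugin::getOutputDimensions(
    int outputIndex, const nvinfer1::DimsExprs* inputs, int nbInputs, nvinfer1::IExprBuilder& exprBuilder)
{
    nvinfer1::DimsExprs output;
    output.nbDims = 2;
    // First extent copied directly from the first input.
    output.d[0] = inputs[0].d[0];
    // Second extent computed from an input extent, e.g. twice the input's last dim.
    output.d[1] = exprBuilder.operation(nvinfer1::DimensionOperation::kPROD,
        *inputs[0].d[inputs[0].nbDims - 1], *exprBuilder.constant(2));
    return output;
}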

May I know why you are looking for this use case?

As a use case: suppose we want to write a plugin for the NonZero op, which returns the indices of the non-zero elements of a tensor. In this case we cannot define the output dimensions.

Dear @roshanchaudhari,
TRT has a two-phase execution model. The first phase runs on the CPU and computes the values of shape tensors. The second phase is streamed on the GPU and computes the execution tensors. Information can only flow from phase 1 to phase 2.

One workaround is to store the buffer size in buffer[0], or to have two outputs: one indicating the valid count and the other storing the non-zero indices (a rough sketch follows below). The problem is that none of TRT's dimensioning infrastructure can read that count from the GPU, so you would have to end the engine after this operation, or have every subsequent layer be a plugin layer that also keeps its dimensions on the GPU. Generally this won't be the case for a DNN. Please file a bug for this use case; our engineering team will prioritize it.
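
To illustrate that two-output workaround (only a sketch, for a hypothetical NonZeroPlugin derived from IPluginV2DynamicExt): output 0 is padded to its worst-case size so that its extent is computable from the input dims, and output 1 carries the actual count, which downstream code would have to read from the GPU itself.

// Sketch of the workaround for a hypothetical NonZeroPlugin.
// Output 0: indices, padded to the worst case (rank x total element count).
// Output 1: a single int holding how many entries of output 0 are valid.
// TRT's shape machinery never sees the true count; consumers must read output 1 on the GPU.
int NonZeroPlugin::getNbOutputs() const
{
    return 2;
}

nvinfer1::DimsExprs NonZeroPlugin::getOutputDimensions(
    int outputIndex, const nvinfer1::DimsExprs* inputs, int nbInputs, nvinfer1::IExprBuilder& exprBuilder)
{
    nvinfer1::DimsExprs output;
    if (outputIndex == 0)
    {
        // Worst case: every element is non-zero -> rank x (d0 * d1 * ... * dn).
        const nvinfer1::IDimensionExpr* total = exprBuilder.constant(1);
        for (int i = 0; i < inputs[0].nbDims; ++i)
        {
            total = exprBuilder.operation(nvinfer1::DimensionOperation::kPROD, *total, *inputs[0].d[i]);
        }
        output.nbDims = 2;
        output.d[0] = exprBuilder.constant(inputs[0].nbDims);
        output.d[1] = total;
    }
    else
    {
        // Valid-count output: a single element.
        output.nbDims = 1;
        output.d[0] = exprBuilder.constant(1);
    }
    return output;
}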

Hi everyone, I’ve been following this thread because I have the same issue: I basically need to register a NonMaxSuppression operation in onnx-tensorrt. So, as advised by @roshanchaudhari, I wrote the following in builtin_op_importers.cpp:

DEFINE_BUILTIN_OP_IMPORTER(NonMaxSuppression)
{
        // NonMaxSuppression is not supported for opsets below 10.
        ASSERT(ctx->getOpsetVersion() >= 10, ErrorCode::kUNSUPPORTED_NODE);

        nvinfer1::ITensor* boxes_tensor = &convertToTensor(inputs.at(0), ctx);
        nvinfer1::ITensor* scores_tensor = &convertToTensor(inputs.at(1), ctx);
        const int numInputs = inputs.size();
        LOG_ERROR("no of inputs are "<<numInputs);
        LOG_ERROR("node outsize and op type are "<<node.output().size()<< " type " << node.op_type());

        const auto scores_dims = scores_tensor->getDimensions();
        const auto boxes_dims = boxes_tensor->getDimensions();
        LOG_ERROR("boxes dims "<< boxes_dims.nbDims << " dim3 has size "<<boxes_dims.d[2]);
        const std::string pluginName = "BatchedNMS_TRT";
        const std::string pluginVersion = "1";
        std::vector<nvinfer1::PluginField> f;

        bool share_location = true;
        const bool is_normalized = true;
        const bool clip_boxes = true;
        int backgroundLabelId = 0;
        // Initialize.
        f.emplace_back("shareLocation", &share_location, nvinfer1::PluginFieldType::kINT8, 1);
        f.emplace_back("isNormalized", &is_normalized, nvinfer1::PluginFieldType::kINT8, 1);
        f.emplace_back("clipBoxes", &clip_boxes, nvinfer1::PluginFieldType::kINT8, 1);
        f.emplace_back("backgroundLabelId", &backgroundLabelId, nvinfer1::PluginFieldType::kINT32, 1);
        // Create plugin from registry
        // nvinfer1::IPluginV2* plugin = importPluginFromRegistry(ctx, pluginName, pluginVersion, node.name(), f);
        nvinfer1::IPluginV2* plugin = createPlugin(node.name(), importPluginCreator(pluginName, pluginVersion), f);

        ASSERT(plugin != nullptr && "NonMaxSuppression plugin was not found in the plugin registry!",
                   ErrorCode::kUNSUPPORTED_NODE);

        std::vector<nvinfer1::ITensor*> nms_inputs ={boxes_tensor, scores_tensor};
        RETURN_FIRST_OUTPUT(ctx->network()->addPluginV2(nms_inputs.data(), nms_inputs.size(), *plugin));
}

However, I got the following error:

[2020-10-27 16:19:27   ERROR] /opt/onnx-tensorrt/builtin_op_importers.cpp:122: no of inputs are 5
[2020-10-27 16:19:27   ERROR] /opt/onnx-tensorrt/builtin_op_importers.cpp:123: node outsize and op type are 1 type NonMaxSuppression
[2020-10-27 16:19:27   ERROR] /opt/onnx-tensorrt/builtin_op_importers.cpp:127: boxes dims 3 dim3 has size 4
#assertion/home/jenkins/workspace/OSS/L0_MergeRequest/oss/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp,77
Aborted (core dumped)

Does anyone have solid advice for solving this issue?

Did you ever solve this issue? I am trying to deploy with TensorRT, and it is looking for a Non Max Suppression plugin for a TensorFlow 2 SSD model retrained with their API.

I found a working solution.
The documentation fails to mention that BatchedNMSPlugin is modeled directly after TensorFlow’s CombinedNonMaxSuppression, as compared to the NonMaxSuppression specified in ONNX.

So I modified my TF model to use CombinedNMS, then wrote a script using ONNX GraphSurgeon that converts nodes from CombinedNonMaxSuppression to BatchedNMSDynamic_TRT, based on the mapping in the tf2tensorrt code.

@klinten

Awesome to see someone has found a working solution. There are a lot of posts on GitHub and these forums from people trying to figure out how to do this.

If I understand correctly, the batchedNonMaxSuppression layer of almost every TensorFlow 2 model can be handled with what you have shared?

Would you share what you did in GraphSurgeon, with details on your steps? I am still learning how to deploy my first model and have not been able to figure this out, despite working on it for hours upon hours every day.

I can send you a link to my models if you don’t mind taking a look.

Well, if you can do your NMS with CombinedNMS, then you can use TensorRT with BatchedNMS, and I think a lot of people should be able to do that. Though I really think Nvidia should just go ahead and implement the NMS specified in ONNX once and for all.

Unfortunately I’m not allowed to share code that I write for work, so you’ll have to reinvent the wheel yourself here. But one thing I forgot to mention is that you need to make sure tf2onnx doesn’t convert your CombinedNMS; for that, pass the flag --custom-ops CombinedNonMaxSuppression. Then load the network in onnx_graphsurgeon and iterate over all CombinedNonMaxSuppression nodes, replacing each one with a BatchedNMS_TRT node according to the recipe I linked to from the TensorFlow code.
