Adding the BatchedNMSDynamic_TRT plugin to an SSD MobileNet ONNX model

Description

I have used a TFLite file of the SSDLite MobileNet v2 object detection model.

  1. Used tf2onnx to convert the TFLite file to ONNX as shown below (attached files: model.tflite, 17.1 MB; tflite_model.onnx, 17.2 MB; nms_plugin.onnx, 17.1 MB):
    python -m tf2onnx.convert --opset 11 --tflite model.tflite --output model_tflite.onnx

  2. Replace the NMS layer with BatchedNMSDynamic_TRT plugin

import onnx_graphsurgeon as gs
import onnx
import numpy as np

input_model_path = "tflite_model.onnx"
output_model_path = "nms_plugin.onnx"


@gs.Graph.register()
def trt_batched_nms(self, boxes_input, scores_input, nms_output,
                    share_location, num_classes):
    
    # Disconnect the existing consumers of the boxes and scores tensors
    boxes_input.outputs.clear()
    scores_input.outputs.clear()

    attrs = {
        "shareLocation": share_location,
        "numClasses": num_classes,
        "backgroundLabelId": 0,
        "topK": 100,
        "keepTopK": 100,
        "scoreThreshold": 0.3,
        "iouThreshold": 0.6,
        "isNormalized": True,
        "clipBoxes": True,
        "scoreBits": 16

    }

    return self.layer(op="BatchedNMSDynamic_TRT", attrs=attrs,
                      inputs=[boxes_input, scores_input],
                      outputs=nms_output)


# load the graph and set a static input shape
graph = gs.import_onnx(onnx.load(input_model_path))
graph.inputs[0].shape = [1, 300, 300, 3]
print(graph.inputs[0].shape)

tmap = graph.tensors()

outArray = ["TFLite_Detection_PostProcess", "TFLite_Detection_PostProcess:1", "TFLite_Detection_PostProcess:2",
            "TFLite_Detection_PostProcess:3"]

for i in range(len(outArray)):
    nms_out_test = tmap[outArray[i]]
    nms_out_test.inputs.clear()

nms_out = []
for i in range(len(outArray)):
    nms_out.append(tmap[outArray[i]])


# Can also get attributes from the original graph instead of hard-coding
graph.trt_batched_nms(tmap["concat"], tmap["convert_scores"],
                      nms_out, share_location=False,
                      num_classes=90)


# the plugin's first output (num_detections) is int32
graph.outputs[0].dtype = np.int32

# clean the graph 
graph.cleanup().toposort()

# save the onnx model 
onnx.save_model(gs.export_onnx(graph), output_model_path)
print("Saving the ONNX model to {}".format(output_model_path))

  3. Convert the ONNX model to a TensorRT engine using trtexec:
    trtexec --onnx=nms_plugin.onnx --saveEngine=TRT_Engine.trt --explicitBatch --verbose

Output (error):

[08/05/2021-17:31:20] [V] [TRT] Tactic: 1002 time 0.121252
[08/05/2021-17:31:20] [V] [TRT] Tactic: 0 time 0.011772
[08/05/2021-17:31:20] [V] [TRT] Fastest Tactic: 0 Time: 0.011772
[08/05/2021-17:31:20] [V] [TRT] *************** Autotuning format combination: Float(1,4,4,7668), Float(1,91,174447) -> Int32(1,1), Float(1,4,400), Float(1,100), Float(1,100) ***************
[08/05/2021-17:31:20] [V] [TRT] Formats and tactics selection completed in 102.21 seconds.
[08/05/2021-17:31:20] [V] [TRT] After reformat layers: 109 layers
[08/05/2021-17:31:20] [V] [TRT] Block size 16777216
[08/05/2021-17:31:20] [V] [TRT] Block size 8640000
[08/05/2021-17:31:20] [V] [TRT] Block size 3240448
[08/05/2021-17:31:20] [V] [TRT] Block size 1440256
[08/05/2021-17:31:20] [V] [TRT] Block size 394240
[08/05/2021-17:31:20] [V] [TRT] Block size 218624
[08/05/2021-17:31:20] [V] [TRT] Block size 51200
[08/05/2021-17:31:20] [V] [TRT] Block size 19968
[08/05/2021-17:31:20] [V] [TRT] Block size 17408
[08/05/2021-17:31:20] [V] [TRT] Block size 9728
[08/05/2021-17:31:20] [V] [TRT] Block size 9216
[08/05/2021-17:31:20] [V] [TRT] Block size 2560
[08/05/2021-17:31:20] [V] [TRT] Block size 2560
[08/05/2021-17:31:20] [V] [TRT] Block size 1024
[08/05/2021-17:31:20] [V] [TRT] Block size 512
[08/05/2021-17:31:20] [V] [TRT] Total Activation Memory: 30824960
[08/05/2021-17:31:20] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[08/05/2021-17:31:20] [F] [TRT] Assertion failed: in[0].desc.dims.d[2] == numLocClasses
/home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp:321
Aborting...

#####################################
Observations:

The boxes output of the TFLite graph actually contains the 1917 anchor-box predictions of the model, not raw bounding boxes in the layout accepted by the BatchedNMSDynamic_TRT plugin (TensorRT/plugin/batchedNMSPlugin at master · NVIDIA/TensorRT · GitHub).

So how, and with which plugin node, can this functionality be achieved?
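
A sketch of one possible workaround, under assumptions not verified against this exact model: the plugin asserts that dimension 2 of the boxes input equals numLocClasses (numClasses when shareLocation=False, 1 when shareLocation=True). If the boxes tensor "concat" has shape [1, 1917, 4], unsqueezing it to [1, 1917, 1, 4] and building the plugin node with shareLocation=True would satisfy that check (the boxes may still need to be decoded against the anchors before NMS, as noted above). Tensor names and shapes here are taken from the script above and are assumptions.

import onnx
import onnx_graphsurgeon as gs
import numpy as np

graph = gs.import_onnx(onnx.load("tflite_model.onnx"))
tmap = graph.tensors()

boxes = tmap["concat"]            # assumed shape: [1, 1917, 4]
scores = tmap["convert_scores"]   # assumed shape: [1, 1917, 91]

# Unsqueeze the boxes to [1, 1917, 1, 4] so that dim 2 == numLocClasses == 1
# when shareLocation=True (with opset 11, "axes" is an attribute)
boxes_4d = gs.Variable("boxes_unsqueezed", dtype=np.float32)
graph.layer(op="Unsqueeze", inputs=[boxes], outputs=[boxes_4d],
            attrs={"axes": [2]})

# The plugin node can then be created as in the script above, but with
# shareLocation=True and boxes_4d as the boxes input, e.g.:
#   graph.trt_batched_nms(boxes_4d, scores, nms_out,
#                         share_location=True, num_classes=90)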

Environment

TensorRT Version: 7.2.4
CUDA Version: 10
Operating System + Version: Ubuntu 20

Relevant Files

Issue Details: How to add NMS with Tensorflow Model (that was converted to ONNX) · Issue #1379 · NVIDIA/TensorRT · GitHub


Hi @godsondeep18,

Our team is looking into this issue. We recommend that you follow up on the same GitHub issue.

Thank you.


Hi,
Please share the ONNX model and the script, if not shared already, so that we can assist you better.
Meanwhile, you can try a few things:

  1. Validate your model with the snippet below:

check_model.py

import onnx

filename = "your_model.onnx"  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
  2. Try running your model with the trtexec command:
    https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

In case you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

Hi @NVES

Based on your inputs, I am sharing all three files (mobilenetv2.tflite, tflite2onnx_model.onnx, onnxWithNms.onnx); the code is also shared above.

I tried validating the model as suggested and got an error.

The trtexec logs are shared above; the build aborts with the assertion failure in batchedNMSPlugin.cpp.

Google Drive link to the files: nvidia_onnx_issue - Google Drive

Hi @godsondeep18,

Sorry for the delayed response. Are you still facing this issue? It looks like there is a problem with the ONNX model; please make sure the ONNX model is generated correctly.
Also, try the latest TensorRT version and let us know if you still face the issue.
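
As a quick sanity check, a small graphsurgeon sketch like the one below (the file name is assumed from this thread, and shapes may not be recorded in the model) can print the inputs of the inserted plugin node, so you can confirm the boxes tensor has the [batch, num_boxes, numLocClasses, 4] layout the plugin expects:

import onnx
import onnx_graphsurgeon as gs

# load the modified model (file name assumed from the thread above)
graph = gs.import_onnx(onnx.load("nms_plugin.onnx"))

for node in graph.nodes:
    if node.op == "BatchedNMSDynamic_TRT":
        print("attrs:", node.attrs)
        for tensor in node.inputs:
            # shape may print as None if it was not recorded in the model
            print(tensor.name, tensor.shape, tensor.dtype)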

Also, it looks like you are using the TensorFlow Object Detection API; please refer to https://github.com/pskiran1/TensorRT-support-for-Tensorflow-2-Object-Detection-Models . This repo helps in converting most models trained via the TF OD API.

Thank you.