`NMS_TRT` plugin in 5.0.2.6-GA doesn't return any detection boxes

Although I submitted the same issue via the bug reporting system (#2438344), I'm posting here as well in case my interpretation of the NMS_TRT plugin API is incorrect.

[Platform details]
Linux distro and version: Ubuntu 16.04.5 LTS
GPU type: GeForce GTX 1080 Ti
NVIDIA driver version: 384.111
CUDA version: 9.0.176
cuDNN version: 7.3.0.29
Python version [if using python]: 3.5.2
TensorFlow version: 1.11.0
TensorRT version: Debian packages with 5.0.2-1+cuda9.0 in nv-tensorrt-repo-ubuntu1604-cuda9.0-trt5.0.2.6-ga-20181009_1-1_amd64

[Python code to reproduce]

#!/usr/bin/env python3
import graphsurgeon as gs
import numpy as np
import tensorflow as tf
import tensorrt as trt
import uff

try:
    import common
except ImportError:
    print('Need to import /usr/src/tensorrt/samples/python/common.py')
    print('e.g. export PYTHONPATH=$PYTHONPATH:/usr/src/tensorrt/samples/python/')
    raise

# https://devtalk.nvidia.com/default/topic/1038494/tensorrt/logicerror-explicit_context_dependent-failed-invalid-device-context-no-currently-active-context-/post/5284290/#5284290
import pycuda.autoinit

num_anchors = 1
num_classes = 2

loc_data = tf.placeholder(tf.float32, [num_anchors * 4, 1, 1], name='loc_data')
conf_data = tf.placeholder(tf.float32, [num_anchors * num_classes, 1, 1], name='conf_data')
priorbox_data = tf.placeholder(tf.float32, [2, num_anchors * 4, 1], name='priorbox_data')

NMS = gs.create_plugin_node(
    name="NMS",
    op="NMS_TRT",
    shareLocation=1,
    varianceEncodedInTarget=0,
    backgroundLabelId=0,
    confidenceThreshold=1e-8,
    nmsThreshold=0.6,
    topK=100,
    keepTopK=100,
    numClasses=num_classes,
    inputOrder=[0, 1, 2],  # loc_data, conf_data, priorbox_data
    confSigmoid=0,
    isNormalized=0,
    codeType=0,  # CORNER = 0
    )

NMS.input.extend([tensor.op.name for tensor in [loc_data, conf_data, priorbox_data]])

dynamic_graph = gs.DynamicGraph(tf.get_default_graph().as_graph_def())
dynamic_graph.append(NMS)

TRT_LOGGER = trt.Logger(trt.Logger.Severity.INFO)
trt.init_libnvinfer_plugins(TRT_LOGGER, '')

serialized_uff = uff.from_tensorflow(
    dynamic_graph.as_graph_def(),
    output_nodes=['NMS'],
    output_filename='/tmp/trt_NMS_test.uff',
    text=False,
    )

with trt.Builder(TRT_LOGGER) as builder:
    builder.max_workspace_size = common.GiB(1)
    with builder.create_network() as network:
        uff_parser = trt.UffParser()
        uff_parser.register_input('loc_data', [num_anchors * 4, 1, 1])
        uff_parser.register_input('conf_data', [num_anchors * num_classes, 1, 1])
        uff_parser.register_input('priorbox_data', [2, num_anchors * 4, 1])
        uff_parser.register_output('NMS')
        uff_parser.parse('/tmp/trt_NMS_test.uff', network)

        with builder.build_cuda_engine(network) as engine:
            inputs, outputs, bindings, stream = common.allocate_buffers(engine)
            with engine.create_execution_context() as context:
                loc_data_np = np.array([0, 0, 0, 0], dtype=np.float32)
                conf_data_np = np.array([0, 1], dtype=np.float32)
                priorbox_data_np = np.array([10, 10, 50, 50, 0, 0, 0, 0], dtype=np.float32)

                np.copyto(inputs[0].host, loc_data_np.ravel())
                np.copyto(inputs[1].host, conf_data_np.ravel())
                np.copyto(inputs[2].host, priorbox_data_np.ravel())

                results = common.do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)

                detection_output = results[0].reshape((-1, 7))
                num_detections = int(results[1])
                print('Number of detections: {}'.format(num_detections))
                print(detection_output[:num_detections, :])

[Log output]

UFF Version 0.5.5
=== Automatically deduced input nodes ===
[name: "loc_data"
op: "Placeholder"
attr {
  key: "dtype"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "shape"
  value {
    shape {
      dim {
        size: 4
      }
      dim {
        size: 1
      }
      dim {
        size: 1
      }
    }
  }
}
, name: "conf_data"
op: "Placeholder"
attr {
  key: "dtype"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "shape"
  value {
    shape {
      dim {
        size: 2
      }
      dim {
        size: 1
      }
      dim {
        size: 1
      }
    }
  }
}
, name: "priorbox_data"
op: "Placeholder"
attr {
  key: "dtype"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "shape"
  value {
    shape {
      dim {
        size: 2
      }
      dim {
        size: 4
      }
      dim {
        size: 1
      }
    }
  }
}
]
=========================================

Using output node NMS
Converting to UFF graph
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
No. nodes: 5
UFF Output written to /tmp/trt_NMS_test.uff
[TensorRT] INFO: Plugin Creator registration succeeded - GridAnchor_TRT
[TensorRT] INFO: Plugin Creator registration succeeded - NMS_TRT
[TensorRT] INFO: Plugin Creator registration succeeded - Reorg_TRT
[TensorRT] INFO: Plugin Creator registration succeeded - Region_TRT
[TensorRT] INFO: Plugin Creator registration succeeded - Clip_TRT
[TensorRT] INFO: Plugin Creator registration succeeded - LReLU_TRT
[TensorRT] INFO: Plugin Creator registration succeeded - PriorBox_TRT
[TensorRT] INFO: Plugin Creator registration succeeded - Normalize_TRT
[TensorRT] INFO: Plugin Creator registration succeeded - RPROI_TRT
[TensorRT] INFO: UFFParser: parsing loc_data
[TensorRT] INFO: UFFParser: parsing conf_data
[TensorRT] INFO: UFFParser: parsing priorbox_data
[TensorRT] INFO: UFFParser: parsing NMS
[TensorRT] INFO: UFFParser: parsing MarkOutput_0
[TensorRT] INFO: Original: 1 layers
[TensorRT] INFO: After dead-layer removal: 1 layers
[TensorRT] INFO: After scale fusion: 1 layers
[TensorRT] INFO: After vertical fusions: 1 layers
[TensorRT] INFO: After swap: 1 layers
[TensorRT] INFO: After final dead-layer removal: 1 layers
[TensorRT] INFO: After tensor merging: 1 layers
[TensorRT] INFO: After concat removal: 1 layers
[TensorRT] INFO: Graph construction and optimization completed in 0.000115415 seconds.
[TensorRT] INFO: Formats and tactics selection completed in 1.04179 seconds.
[TensorRT] INFO: After reformat layers: 1 layers
[TensorRT] INFO: Block size 1073741824
[TensorRT] INFO: Total Activation Memory: 1073741824
[TensorRT] INFO: Data initialization and engine generation completed in 0.00922991 seconds.
Number of detections: 0
[]

Hello, we are triaging this issue and will keep you updated.

Hello,

2 issues and 1 request.

Request: please update to TRT 5.0 GA
Issue 1: there is a missing call to trt.init_libnvinfer_plugins(TRT_LOGGER, ''), which is required to register all TRT plugins.
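
For reference, a minimal registration snippet looks like the following (the repro script above already includes this call right after creating the logger):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.Severity.INFO)
# Registers all built-in TensorRT plugins (NMS_TRT, PriorBox_TRT, ...) with the
# plugin registry. This must run before the UFF parser encounters the NMS_TRT node.
trt.init_libnvinfer_plugins(TRT_LOGGER, '')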

Issue 2: the problem is with the allocate_buffers call. The NMS plugin returns two outputs: the first is of type FP32 and the second (keep_count) is of type INT32. However, in the current TRT plugin infrastructure we don't have the ability to specify a different output type, so all output types get set to FP32 in the engine.

You can work around this by writing your own allocate_buffers function that allocates the output buffers based on the expected types.
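
For illustration only (this is a rough sketch, not necessarily the attached script), such a function based on the samples' common.py could look like the code below. The function name, the dtype_overrides argument, and the 'NMS_1' binding name for the keep_count output are assumptions; check the actual binding names on your engine.

import numpy as np
import pycuda.autoinit  # creates a CUDA context, as in the repro script
import pycuda.driver as cuda
import tensorrt as trt
import common  # /usr/src/tensorrt/samples/python/common.py

def allocate_buffers_with_overrides(engine, dtype_overrides=None):
    # Same as common.allocate_buffers, but lets the caller force the numpy
    # dtype of selected bindings instead of trusting the type reported by
    # the engine (which is FP32 for every plugin output).
    dtype_overrides = dtype_overrides or {}
    inputs, outputs, bindings = [], [], []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        dtype = dtype_overrides.get(binding, dtype)
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        bindings.append(int(device_mem))
        if engine.binding_is_input(binding):
            inputs.append(common.HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(common.HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream

# Usage sketch: force the keep_count output to int32. 'NMS_1' is a placeholder
# for whatever the second NMS output binding is actually called on your engine.
# inputs, outputs, bindings, stream = allocate_buffers_with_overrides(
#     engine, dtype_overrides={'NMS_1': np.int32})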

Engineering has attached the updated script with this function added. It works now.

test_nms_fixed.zip (2.1 KB)

Hi,

That makes sense and I verified that it works.
Thanks a lot for your support.

May I ask what the definition and format of "loc_data", "conf_data", and "priorbox_data" are in this plugin? I cannot find a detailed explanation in the developer guide or in the plugin source code.