I have already submitted this issue through the bug reporting system (#2438344), but I'm posting it here as well in case my interpretation of the NMS_TRT
plugin API is incorrect.
[Platform details]
Linux distro and version: Ubuntu 16.04.5 LTS
GPU type: GeForce GTX 1080 Ti
nvidia driver version: 384.111
CUDA version: 9.0.176
CUDNN version: 7.3.0.29
Python version: 3.5.2
Tensorflow version: 1.11.0
TensorRT version: 5.0.2-1+cuda9.0, installed from the Debian packages
in nv-tensorrt-repo-ubuntu1604-cuda9.0-trt5.0.2.6-ga-20181009_1-1_amd64
[Python code to reproduce]
#!/usr/bin/env python3
import graphsurgeon as gs
import numpy as np
import tensorflow as tf
import tensorrt as trt
import uff
try:
    import common
except ImportError:
    print('Need to import /usr/src/tensorrt/samples/python/common.py')
    print('e.g. export PYTHONPATH=$PYTHONPATH:/usr/src/tensorrt/samples/python/')
    raise
# https://devtalk.nvidia.com/default/topic/1038494/tensorrt/logicerror-explicit_context_dependent-failed-invalid-device-context-no-currently-active-context-/post/5284290/#5284290
import pycuda.autoinit
num_anchors = 1
num_classes = 2
loc_data = tf.placeholder(tf.float32, [num_anchors * 4, 1, 1], name='loc_data')
conf_data = tf.placeholder(tf.float32, [num_anchors * num_classes, 1, 1], name='conf_data')
priorbox_data = tf.placeholder(tf.float32, [2, num_anchors * 4, 1], name='priorbox_data')
NMS = gs.create_plugin_node(
    name="NMS",
    op="NMS_TRT",
    shareLocation=1,
    varianceEncodedInTarget=0,
    backgroundLabelId=0,
    confidenceThreshold=1e-8,
    nmsThreshold=0.6,
    topK=100,
    keepTopK=100,
    numClasses=num_classes,
    inputOrder=[0, 1, 2],  # loc_data, conf_data, priorbox_data
    confSigmoid=0,
    isNormalized=0,
    codeType=0,  # CORNER = 0
)
NMS.input.extend([tensor.op.name for tensor in [loc_data, conf_data, priorbox_data]])
dynamic_graph = gs.DynamicGraph(tf.get_default_graph().as_graph_def())
dynamic_graph.append(NMS)
TRT_LOGGER = trt.Logger(trt.Logger.Severity.INFO)
trt.init_libnvinfer_plugins(TRT_LOGGER, '')
serialized_uff = uff.from_tensorflow(
    dynamic_graph.as_graph_def(),
    output_nodes=['NMS'],
    output_filename='/tmp/trt_NMS_test.uff',
    text=False,
)
with trt.Builder(TRT_LOGGER) as builder:
    builder.max_workspace_size = common.GiB(1)
    with builder.create_network() as network:
        # Parse the UFF file written above into the TensorRT network.
        uff_parser = trt.UffParser()
        uff_parser.register_input('loc_data', [num_anchors * 4, 1, 1])
        uff_parser.register_input('conf_data', [num_anchors * num_classes, 1, 1])
        uff_parser.register_input('priorbox_data', [2, num_anchors * 4, 1])
        uff_parser.register_output('NMS')
        uff_parser.parse('/tmp/trt_NMS_test.uff', network)
        with builder.build_cuda_engine(network) as engine:
            inputs, outputs, bindings, stream = common.allocate_buffers(engine)
            with engine.create_execution_context() as context:
                # Fill the three host input buffers, in the same order as inputOrder.
                loc_data_np = np.array([0, 0, 0, 0], dtype=np.float32)
                conf_data_np = np.array([0, 1], dtype=np.float32)
                priorbox_data_np = np.array([10, 10, 50, 50, 0, 0, 0, 0], dtype=np.float32)
                np.copyto(inputs[0].host, loc_data_np.ravel())
                np.copyto(inputs[1].host, conf_data_np.ravel())
                np.copyto(inputs[2].host, priorbox_data_np.ravel())
                results = common.do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
                # First output: detections, 7 values per row. Second output: detection count.
                detection_output = results[0].reshape((-1, 7))
                num_detections = int(results[1])
                print('Number of detections: {}'.format(num_detections))
                print(detection_output[:num_detections, :])
[Log output]
UFF Version 0.5.5
=== Automatically deduced input nodes ===
[name: "loc_data"
op: "Placeholder"
attr {
  key: "dtype"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "shape"
  value {
    shape {
      dim {
        size: 4
      }
      dim {
        size: 1
      }
      dim {
        size: 1
      }
    }
  }
}
, name: "conf_data"
op: "Placeholder"
attr {
  key: "dtype"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "shape"
  value {
    shape {
      dim {
        size: 2
      }
      dim {
        size: 1
      }
      dim {
        size: 1
      }
    }
  }
}
, name: "priorbox_data"
op: "Placeholder"
attr {
  key: "dtype"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "shape"
  value {
    shape {
      dim {
        size: 2
      }
      dim {
        size: 4
      }
      dim {
        size: 1
      }
    }
  }
}
]
=========================================
Using output node NMS
Converting to UFF graph
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
No. nodes: 5
UFF Output written to /tmp/trt_NMS_test.uff
[TensorRT] INFO: Plugin Creator registration succeeded - GridAnchor_TRT
[TensorRT] INFO: Plugin Creator registration succeeded - NMS_TRT
[TensorRT] INFO: Plugin Creator registration succeeded - Reorg_TRT
[TensorRT] INFO: Plugin Creator registration succeeded - Region_TRT
[TensorRT] INFO: Plugin Creator registration succeeded - Clip_TRT
[TensorRT] INFO: Plugin Creator registration succeeded - LReLU_TRT
[TensorRT] INFO: Plugin Creator registration succeeded - PriorBox_TRT
[TensorRT] INFO: Plugin Creator registration succeeded - Normalize_TRT
[TensorRT] INFO: Plugin Creator registration succeeded - RPROI_TRT
[TensorRT] INFO: UFFParser: parsing loc_data
[TensorRT] INFO: UFFParser: parsing conf_data
[TensorRT] INFO: UFFParser: parsing priorbox_data
[TensorRT] INFO: UFFParser: parsing NMS
[TensorRT] INFO: UFFParser: parsing MarkOutput_0
[TensorRT] INFO: Original: 1 layers
[TensorRT] INFO: After dead-layer removal: 1 layers
[TensorRT] INFO: After scale fusion: 1 layers
[TensorRT] INFO: After vertical fusions: 1 layers
[TensorRT] INFO: After swap: 1 layers
[TensorRT] INFO: After final dead-layer removal: 1 layers
[TensorRT] INFO: After tensor merging: 1 layers
[TensorRT] INFO: After concat removal: 1 layers
[TensorRT] INFO: Graph construction and optimization completed in 0.000115415 seconds.
[TensorRT] INFO: Formats and tactics selection completed in 1.04179 seconds.
[TensorRT] INFO: After reformat layers: 1 layers
[TensorRT] INFO: Block size 1073741824
[TensorRT] INFO: Total Activation Memory: 1073741824
[TensorRT] INFO: Data initialization and engine generation completed in 0.00922991 seconds.
Number of detections: 0
[]
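[Reference sketch]
For comparison, here is a minimal NumPy sketch of what the same input would produce under Caffe-SSD-style decoding. It is not taken from the plugin source. It assumes that the first channel of priorbox_data holds the prior box corners and the second holds the variances, that CORNER decoding with varianceEncodedInTarget=0 computes prior + variance * loc, and that a box is kept when its non-background score exceeds confidenceThreshold. The helper names decode_corner and expected_detections are hypothetical, and the NMS step itself is omitted because there is only one anchor.

```python
import numpy as np


def decode_corner(loc, prior, variance, variance_encoded_in_target=False):
    """CORNER decoding as done in Caffe SSD (assumed to match NMS_TRT)."""
    loc = np.asarray(loc, dtype=np.float32).reshape(-1, 4)
    prior = np.asarray(prior, dtype=np.float32).reshape(-1, 4)
    variance = np.asarray(variance, dtype=np.float32).reshape(-1, 4)
    if variance_encoded_in_target:
        return prior + loc
    return prior + variance * loc


def expected_detections(loc, conf, priorbox, num_classes,
                        background_label_id=0, confidence_threshold=1e-8):
    """Return (label, score, xmin, ymin, xmax, ymax) tuples above the threshold."""
    # Assumed layout: channel 0 = prior box corners, channel 1 = variances.
    priorbox = np.asarray(priorbox, dtype=np.float32).reshape(2, -1, 4)
    boxes = decode_corner(loc, priorbox[0], priorbox[1])
    scores = np.asarray(conf, dtype=np.float32).reshape(-1, num_classes)
    detections = []
    for anchor, box in enumerate(boxes):
        for label in range(num_classes):
            if label == background_label_id:
                continue  # background boxes are never reported
            score = float(scores[anchor, label])
            if score > confidence_threshold:
                detections.append((label, score, *box.tolist()))
    return detections


# Same values as in the reproduction script above.
print(expected_detections(
    loc=[0, 0, 0, 0],
    conf=[0, 1],
    priorbox=[10, 10, 50, 50, 0, 0, 0, 0],
    num_classes=2,
))
```

Under these assumptions the sketch reports a single class-1 detection at the prior box location, whereas the engine above reports zero detections.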