Description
We use NonMaxSuppression nodes in our ONNX models, which we then parse to build a TensorRT engine.
Our implementation works great with v8.0.1, where the “EXPERIMENTAL” NMS support in onnx-tensorrt uses the EfficientNMS_ONNX_TRT plugin.
We are now upgrading to v8.5.3, where TensorRT has introduced the INMSLayer, which onnx-tensorrt now uses instead of the plugin. We are having various issues with this implementation, and there are no examples of its usage.
Issues
Dynamic dimensions must now be supported.
Previously the NMS implementation produced a fixed-shape indices output, padded so that the valid detections could be identified. Now the indices output has shape (-1, 3), requiring more complex handling of dynamic outputs:
[07/31/2023-14:27:42] [V] [TRT] Parsing node: node_of_detection_task_nms_indices [NonMaxSuppression]
[07/31/2023-14:27:42] [V] [TRT] Searching for input: detection_task_boxes
[07/31/2023-14:27:42] [V] [TRT] Searching for input: detection_task_scores
[07/31/2023-14:27:42] [V] [TRT] Searching for input: max_output_boxes_per_class
[07/31/2023-14:27:42] [V] [TRT] Searching for input: iou_threshold
[07/31/2023-14:27:42] [V] [TRT] Searching for input: score_threshold
[07/31/2023-14:27:42] [V] [TRT] node_of_detection_task_nms_indices [NonMaxSuppression] inputs: [detection_task_boxes -> (1, 24552, 4)[FLOAT]], [detection_task_scores -> (1, 1, 24552)[FLOAT]], [max_output_boxes_per_class -> (1)[INT32]], [iou_threshold -> (1)[FLOAT]], [score_threshold -> (1)[FLOAT]],
[07/31/2023-14:27:42] [V] [TRT] Registering layer: node_of_detection_task_nms_indices for ONNX node: node_of_detection_task_nms_indices
[07/31/2023-14:27:42] [V] [TRT] Registering layer: node_of_detection_task_nms_indices_34 for ONNX node: node_of_detection_task_nms_indices
[07/31/2023-14:27:42] [V] [TRT] Registering tensor: detection_task_nms_indices_35 for ONNX tensor: detection_task_nms_indices
[07/31/2023-14:27:42] [V] [TRT] node_of_detection_task_nms_indices [NonMaxSuppression] outputs: [detection_task_nms_indices -> (-1, 3)[INT32]],
[07/31/2023-14:27:42] [V] [TRT] Marking detection_task_boxes_32 as output: detection_task_boxes
[07/31/2023-14:27:42] [V] [TRT] Marking detection_task_scores_33 as output: detection_task_scores
[07/31/2023-14:27:42] [V] [TRT] Marking detection_task_nms_indices_35 as output: detection_task_nms_indices
This seems like unnecessary complexity, as I still have to allocate memory for the maximum number of boxes using the formula from the docs: batchSize * numClasses * min(numInputBoundingBoxes, MaxOutputBoxesPerClass).
Unable to access the second output, NumOutputBoxes.
According to the documentation, the layer should have a second output, but it appears onnx-tensorrt only returns the first one, see here:
auto* layer = ctx->network()->addNMS(*boxesTensorPtr, *transposedScoresTensorPtr, *maxOutputBoxesPerClassTensorPtr);
...
RETURN_FIRST_OUTPUT(layer);
NMS outputs no longer conform to the ONNX specification
The old EfficientNMS_ONNX_TRT plugin had an outputONNXIndices parameter, which ensured a single output conforming to the ONNX specification. This is no longer the case with the new INMSLayer, which has two outputs.
Questions
- Can I work around the above issues?
- Are there examples of NonMaxSuppression being used with ONNX and TensorRT v8.5+?
- Is there a way to load the old EfficientNMS_ONNX_TRT plugin in place of the INMSLayer?
Environment
TensorRT Version: 8.5.3
GPU Type: RTX 3070 Laptop GPU
Nvidia Driver Version: 525.125.06
CUDA Version: 11.4
Operating System + Version: Ubuntu 20.04