Confusing NMS layer implementation in TensorRT 8.5

Description

We use NonMaxSuppression nodes in our ONNX models and then parse them to build the TensorRT engine.

Our implementation using v8.0.1 works great: the “EXPERIMENTAL” support for NMS in onnx-tensorrt uses the EfficientNMS_ONNX_TRT plugin.

We are now upgrading to v8.5.3, where TensorRT has introduced the INMSLayer, which onnx-tensorrt now uses instead of the plugin. We are having various issues with this implementation, and there are no examples of its usage.

Issues

Dynamic dimensions must now be supported.

Previously the NMS implementation would output a fixed shape of indices, and the indices would be padded so you could detect the unique detections. Now the indices output shape is (-1, 3), which requires more complex handling of dynamic outputs:

[07/31/2023-14:27:42] [V] [TRT] Parsing node: node_of_detection_task_nms_indices [NonMaxSuppression]
[07/31/2023-14:27:42] [V] [TRT] Searching for input: detection_task_boxes
[07/31/2023-14:27:42] [V] [TRT] Searching for input: detection_task_scores
[07/31/2023-14:27:42] [V] [TRT] Searching for input: max_output_boxes_per_class
[07/31/2023-14:27:42] [V] [TRT] Searching for input: iou_threshold
[07/31/2023-14:27:42] [V] [TRT] Searching for input: score_threshold
[07/31/2023-14:27:42] [V] [TRT] node_of_detection_task_nms_indices [NonMaxSuppression] inputs: [detection_task_boxes -> (1, 24552, 4)[FLOAT]], [detection_task_scores -> (1, 1, 24552)[FLOAT]], [max_output_boxes_per_class -> (1)[INT32]], [iou_threshold -> (1)[FLOAT]], [score_threshold -> (1)[FLOAT]], 
[07/31/2023-14:27:42] [V] [TRT] Registering layer: node_of_detection_task_nms_indices for ONNX node: node_of_detection_task_nms_indices
[07/31/2023-14:27:42] [V] [TRT] Registering layer: node_of_detection_task_nms_indices_34 for ONNX node: node_of_detection_task_nms_indices
[07/31/2023-14:27:42] [V] [TRT] Registering tensor: detection_task_nms_indices_35 for ONNX tensor: detection_task_nms_indices
[07/31/2023-14:27:42] [V] [TRT] node_of_detection_task_nms_indices [NonMaxSuppression] outputs: [detection_task_nms_indices -> (-1, 3)[INT32]], 
[07/31/2023-14:27:42] [V] [TRT] Marking detection_task_boxes_32 as output: detection_task_boxes
[07/31/2023-14:27:42] [V] [TRT] Marking detection_task_scores_33 as output: detection_task_scores
[07/31/2023-14:27:42] [V] [TRT] Marking detection_task_nms_indices_35 as output: detection_task_nms_indices

This seems like unnecessary complexity, as I still have to allocate the memory for the maximum number of boxes, using the formula from the docs batchSize * numClasses * min(numInputBoundingBoxes, MaxOutputBoxesPerClass).
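
For reference, the worst-case allocation from that formula can be sketched as below. This is only a minimal sketch: the function names are hypothetical helpers, not TensorRT API, and it assumes the indices output is INT32 with shape (rows, 3), as shown in the parser log above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: upper bound on the number of rows in the NMS indices
// output, i.e. batchSize * numClasses * min(numInputBoundingBoxes, MaxOutputBoxesPerClass).
std::int64_t maxNmsOutputRows(std::int64_t batchSize, std::int64_t numClasses,
                              std::int64_t numInputBoundingBoxes,
                              std::int64_t maxOutputBoxesPerClass)
{
    assert(batchSize >= 0 && numClasses >= 0);
    assert(numInputBoundingBoxes >= 0 && maxOutputBoxesPerClass >= 0);
    return batchSize * numClasses
        * std::min(numInputBoundingBoxes, maxOutputBoxesPerClass);
}

// Hypothetical helper: bytes needed for an INT32 buffer of shape (maxRows, 3).
std::size_t maxNmsIndicesBytes(std::int64_t maxRows)
{
    assert(maxRows >= 0);
    return static_cast<std::size_t>(maxRows) * 3 * sizeof(std::int32_t);
}
```

With the shapes from the log (batch 1, 1 class, 24552 boxes) and an assumed max_output_boxes_per_class of 100 (the actual value is not shown in the log), this gives 100 rows, or 1200 bytes.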

Unable to access the second output NumOutputBoxes.

According to the documentation, there should be a second output, NumOutputBoxes.

However, it appears onnx-tensorrt only uses the first output, see here:

    auto* layer = ctx->network()->addNMS(*boxesTensorPtr, *transposedScoresTensorPtr, *maxOutputBoxesPerClassTensorPtr);
    
    ...

    RETURN_FIRST_OUTPUT(layer);

NMS outputs no longer conform to ONNX specification

The old EfficientNMS_ONNX_TRT plugin had a parameter, outputONNXIndices, which when set ensured a single output that conforms to the ONNX specification. This is no longer the case with the new INMSLayer, which has two outputs.

Questions

  • Can I work around the above issues?
  • Are there examples of NonMaxSuppression being used with ONNX and TensorRT v8.5+?
  • Is there a way to load the old EfficientNMS_ONNX_TRT in place of the INMSLayer?

Environment

TensorRT Version: 8.5.3
GPU Type: RTX 3070 Laptop GPU
Nvidia Driver Version: 525.125.06
CUDA Version: 11.4
Operating System + Version: Ubuntu 20.04

The fact that the layer has two outputs does not imply that the ONNX parser must use both to conform - only the first output is required to conform to the ONNX semantics. ONNX does not know the size of the output array until the layer runs, which corresponds to TRT’s design that the output size is unknown and thus reported as -1.

In fact, the ‘NumOutputBoxes’ output is unnecessary because you can simply use a shape operator on the first output to determine how many boxes there are.
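
Once the row count is known from the runtime shape of the first output, reading the detections on the host could look roughly like the sketch below. It assumes the buffer has already been copied to host memory and follows the ONNX NonMaxSuppression layout, where each row is [batch_index, class_index, box_index]. The struct and function names are hypothetical, not part of TensorRT or onnx-tensorrt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for one row of the (numRows, 3) indices output.
struct NmsSelection
{
    std::int32_t batchIndex;
    std::int32_t classIndex;
    std::int32_t boxIndex;
};

// Hypothetical helper: decode numRows rows from a host-side INT32 buffer.
// numRows is the first dimension of the output shape reported at runtime.
std::vector<NmsSelection> decodeNmsIndices(const std::int32_t* indices,
                                           std::int64_t numRows)
{
    assert(numRows >= 0);
    assert(indices != nullptr || numRows == 0);
    std::vector<NmsSelection> selections;
    selections.reserve(static_cast<std::size_t>(numRows));
    for (std::int64_t i = 0; i < numRows; ++i)
    {
        const std::int32_t* row = indices + i * 3;  // rows are contiguous
        selections.push_back({row[0], row[1], row[2]});
    }
    return selections;
}
```

Only the first numRows rows are read, so no padding sentinel is needed even if the buffer was allocated for the worst case.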

Please refer to the importer definition for NonMaxSuppression in our ONNX parser for an example of INMSLayer usage.

Thank you.
