Confusing NMS layer implementation in TensorRT 8.5

magnusm · August 1, 2023, 9:24am

Description

We use NonMaxSuppression nodes in our ONNX models, and are then parsing them to build the TensorRT engine.

Our implementation using v8.0.1 works great. There, the “EXPERIMENTAL” support for NMS in onnx-tensorrt uses the EfficientNMS_ONNX_TRT plugin.

We are now upgrading to v8.5.3, where TensorRT has introduced the INMSLayer, which onnx-tensorrt now uses instead of the plugin. We are having various issues with this implementation, and there are no examples of its usage.

Issues

Dynamic dimensions must now be supported.

Previously the NMS implementation would output a fixed shape of indices, and the indices would be padded so you could detect the unique detections. Now the indices output shape is (-1, 3), requiring the more complex handling of dynamic outputs:

[07/31/2023-14:27:42] [V] [TRT] Parsing node: node_of_detection_task_nms_indices [NonMaxSuppression]
[07/31/2023-14:27:42] [V] [TRT] Searching for input: detection_task_boxes
[07/31/2023-14:27:42] [V] [TRT] Searching for input: detection_task_scores
[07/31/2023-14:27:42] [V] [TRT] Searching for input: max_output_boxes_per_class
[07/31/2023-14:27:42] [V] [TRT] Searching for input: iou_threshold
[07/31/2023-14:27:42] [V] [TRT] Searching for input: score_threshold
[07/31/2023-14:27:42] [V] [TRT] node_of_detection_task_nms_indices [NonMaxSuppression] inputs: [detection_task_boxes -> (1, 24552, 4)[FLOAT]], [detection_task_scores -> (1, 1, 24552)[FLOAT]], [max_output_boxes_per_class -> (1)[INT32]], [iou_threshold -> (1)[FLOAT]], [score_threshold -> (1)[FLOAT]], 
[07/31/2023-14:27:42] [V] [TRT] Registering layer: node_of_detection_task_nms_indices for ONNX node: node_of_detection_task_nms_indices
[07/31/2023-14:27:42] [V] [TRT] Registering layer: node_of_detection_task_nms_indices_34 for ONNX node: node_of_detection_task_nms_indices
[07/31/2023-14:27:42] [V] [TRT] Registering tensor: detection_task_nms_indices_35 for ONNX tensor: detection_task_nms_indices
[07/31/2023-14:27:42] [V] [TRT] node_of_detection_task_nms_indices [NonMaxSuppression] outputs: [detection_task_nms_indices -> (-1, 3)[INT32]], 
[07/31/2023-14:27:42] [V] [TRT] Marking detection_task_boxes_32 as output: detection_task_boxes
[07/31/2023-14:27:42] [V] [TRT] Marking detection_task_scores_33 as output: detection_task_scores
[07/31/2023-14:27:42] [V] [TRT] Marking detection_task_nms_indices_35 as output: detection_task_nms_indices

This seems like unnecessary complexity, as I still have to allocate the memory for the maximum number of boxes, using the formula from the docs: batchSize * numClasses * min(numInputBoundingBoxes, MaxOutputBoxesPerClass).
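For reference, the sizing formula above can be sketched as a small helper. This is only an illustration under stated assumptions: the function names are hypothetical, and the factor of 3 and the INT32 element type are taken from the (-1, 3)[INT32] output shape in the log above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Worst-case number of rows in the NMS indices output, following the formula
// quoted from the docs:
//   batchSize * numClasses * min(numInputBoundingBoxes, MaxOutputBoxesPerClass)
// (hypothetical helper name, not part of TensorRT)
std::int64_t maxNmsOutputRows(std::int64_t batchSize,
                              std::int64_t numClasses,
                              std::int64_t numInputBoxes,
                              std::int64_t maxOutputBoxesPerClass) {
    assert(batchSize >= 0 && numClasses >= 0);
    assert(numInputBoxes >= 0 && maxOutputBoxesPerClass >= 0);
    return batchSize * numClasses *
           std::min(numInputBoxes, maxOutputBoxesPerClass);
}

// Bytes to reserve for the indices output: each row holds 3 INT32 values,
// matching the (-1, 3)[INT32] shape reported by the parser.
std::size_t maxNmsOutputBytes(std::int64_t batchSize,
                              std::int64_t numClasses,
                              std::int64_t numInputBoxes,
                              std::int64_t maxOutputBoxesPerClass) {
    const std::int64_t rows = maxNmsOutputRows(
        batchSize, numClasses, numInputBoxes, maxOutputBoxesPerClass);
    return static_cast<std::size_t>(rows) * 3 * sizeof(std::int32_t);
}
```

With the shapes from the log (batch 1, 1 class, 24552 boxes) and an assumed max_output_boxes_per_class of 100, this gives 100 rows, i.e. 1200 bytes.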

Unable to access the second output `NumOutputBoxes`.

According to the documentation, the layer should have a second output, NumOutputBoxes.

However, it appears onnx-tensorrt only uses the first output, see here:

    auto* layer = ctx->network()->addNMS(*boxesTensorPtr, *transposedScoresTensorPtr, *maxOutputBoxesPerClassTensorPtr);
    
    ...

    RETURN_FIRST_OUTPUT(layer);

NMS outputs no longer conform to ONNX specification

The old EfficientNMS_ONNX_TRT plugin had a parameter, outputONNXIndices, which was set to ensure a single output that conforms to the ONNX specification. This is no longer the case with the new INMSLayer, which has two outputs.

Questions

Can I work around the above issues?
Are there examples of NonMaxSuppression being used with ONNX and TensorRT v8.5+?
Is there a way to load the old EfficientNMS_ONNX_TRT plugin in place of the INMSLayer?

Environment

TensorRT Version: 8.5.3
GPU Type: RTX 3070 Laptop GPU
Nvidia Driver Version: 525.125.06
CUDA Version: 11.4
Operating System + Version: Ubuntu 20.04

spolisetty · September 29, 2023, 12:44pm

The fact that the layer has two outputs does not imply that the ONNX parser must use both to conform - only the first output is required to conform to the ONNX semantics. ONNX does not know the size of the output array until the layer runs, which corresponds to TRT’s design that the output size is unknown and thus reported as -1.

In fact, the ‘NumOutputBoxes’ output is unnecessary because you can simply use a shape operator on the first output to determine how many boxes there are.
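As a rough host-side illustration of that idea: once the runtime shape of the first output is known, its first dimension tells you how many rows of a worst-case-sized buffer are valid. The sketch below assumes the indices have already been copied to the host. `NmsIndex`, `collectSelected`, and the `numRows` argument are hypothetical names, and how `numRows` is obtained depends on how your application queries dynamic output shapes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One row of the ONNX NonMaxSuppression output:
// [batch_index, class_index, box_index].
struct NmsIndex {
    std::int32_t batch;
    std::int32_t cls;
    std::int32_t box;
};

// hostBuffer: indices output copied to the host, allocated for the worst case.
// numRows:    first dimension of the output's runtime shape (assumed input;
//             not a TensorRT call).
std::vector<NmsIndex> collectSelected(const std::vector<std::int32_t>& hostBuffer,
                                      std::int64_t numRows) {
    assert(numRows >= 0);
    assert(static_cast<std::size_t>(numRows) * 3 <= hostBuffer.size());

    std::vector<NmsIndex> selected;
    selected.reserve(static_cast<std::size_t>(numRows));
    for (std::int64_t i = 0; i < numRows; ++i) {
        const std::size_t base = static_cast<std::size_t>(i) * 3;
        // Only the first numRows rows are meaningful; the rest of the buffer
        // is unused capacity and is never read.
        selected.push_back(
            {hostBuffer[base], hostBuffer[base + 1], hostBuffer[base + 2]});
    }
    return selected;
}
```

Rows beyond `numRows` are simply ignored, so no padding sentinel is needed to find the end of the detections.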

Please refer to the importer definition for NonMaxSuppression in our ONNX parser for an example of INMSLayer usage.

Thank you.

lix19937 · May 8, 2024, 5:17am

mark, new view
