Variable output shape of NonMaxSuppression


When the NonMaxSuppression operator produces the final output, the result has variable dimensions due to the NMS logic. For example, suppose there is only 1 class, boxes has shape 8 x 1000 x 4, and max_output_boxes_per_class = 2. The selected_indices output then has shape N x 3, where N can be anything from 0 to 16. It isn’t clear how we are supposed to bind the output buffer and read only the valid outputs. It seems the TensorRT C++ API pads the result tensor to the maximum shape of 16 x 3, so even if I use a Shape layer to print out the shape of selected_indices, I just get 16 x 3, which doesn’t help. What’s the recommended way to use this operator?
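For reference, the upper bound on N follows from the ONNX NonMaxSuppression definition: at most max_output_boxes_per_class boxes are kept for each (batch, class) pair, and each row of selected_indices is a [batch_index, class_index, box_index] triple. A small sketch of that arithmetic (the function name is illustrative only, not part of any API):

```python
def max_selected_indices_shape(num_batches, num_classes, max_output_boxes_per_class):
    """Upper bound on the shape of NonMaxSuppression's selected_indices output.

    Each selected row is [batch_index, class_index, box_index], hence 3 columns;
    at most max_output_boxes_per_class rows exist per (batch, class) pair.
    """
    max_rows = num_batches * num_classes * max_output_boxes_per_class
    return (max_rows, 3)


# Values from the question: 8 batches, 1 class, 2 boxes kept per class.
print(max_selected_indices_shape(8, 1, 2))
# (16, 3)
```

This fixed upper bound is the shape the padded output tensor ends up with.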


TensorRT Version:
GPU Type: GTX 1080
Nvidia Driver Version: 470.103.01
CUDA Version: 11.1
CUDNN Version: 8.2
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Please check the link below, as it might answer your concerns.


Hi, I realized my use case is more in line with the ONNX-TensorRT support, since I’m using an ONNX graph that has the NonMaxSuppression operator per the ONNX spec. According to the onnx-tensorrt GitHub repo, support for NMS is experimental and the result is always padded to a fixed shape. So how do I filter out just the valid outputs?

Additionally, which plugin is used to implement this operation? I’m referring to one of these three plugins: BatchedNMSPlugin, EfficientNMSPlugin, and NMSPlugin.


Variable output shapes require data-dependent shape support, which is not currently available in TensorRT. This may be added in future releases.

When running ONNX graphs in TensorRT, the EfficientNMS plugin can help to provide “partial” support for the NonMaxSuppression ONNX op. The converter for this ONNX op is implemented here:

So it should just work when converting a graph that uses the native NonMaxSuppression op. However, the output is padded: the last kept detection is essentially repeated over and over until the op’s predefined maximum output shape is filled. In the original question, that maximum is 16 x 3, as noted.

These padded values need to be removed as a post-processing step in CPU code, because doing so inside TensorRT would be very difficult without data-dependent shapes.
For example, scan the final output of the TensorRT graph and remove the duplicate entries at the end of the detection list.
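A minimal sketch of that CPU-side cleanup, assuming the padding behaves as described (the last genuinely kept row of selected_indices repeated until the fixed shape is filled). Since NMS never selects the same (batch, class, box) triple twice, a trailing run of identical rows can be collapsed to one. The function name is hypothetical, and the case where no detection at all is kept is not covered, since the padding content for that case isn't described here.

```python
from typing import List, Sequence

Row = Sequence[int]  # one [batch_index, class_index, box_index] triple


def strip_trailing_padding(selected_indices: Sequence[Row]) -> List[List[int]]:
    """Drop repeated copies of the last detection that pad the fixed-size output.

    Assumption: padding repeats the last kept row. A genuine NMS result never
    contains the same (batch, class, box) triple twice, so a trailing run of
    identical rows is collapsed to a single row.
    """
    rows = [list(r) for r in selected_indices]
    if not rows:
        return rows
    last = rows[-1]
    end = len(rows)
    # Walk backwards over the run of rows identical to the final row.
    while end > 1 and rows[end - 2] == last:
        end -= 1
    return rows[:end]


padded = [[0, 0, 12], [0, 0, 40], [3, 0, 7], [3, 0, 7], [3, 0, 7]]
print(strip_trailing_padding(padded))
# [[0, 0, 12], [0, 0, 40], [3, 0, 7]]
```

If the TensorRT output has been copied into a NumPy array, calling `.tolist()` on it first gives the nested lists this sketch expects.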

Another possibility is to lower the score_threshold of the NonMaxSuppression op so that it keeps more low-confidence detections. If enough candidates survive, this naturally fills the op’s output tensor to its maximum shape with distinct, real detections, so no padding is needed. The CPU code can then filter out the low-confidence detections by their scores.
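A rough sketch of the score-based CPU filtering, assuming the scores tensor keeps the ONNX NonMaxSuppression layout [num_batches][num_classes][spatial_dimension] and that min_score (a hypothetical parameter name) is the application’s real confidence cutoff, set higher than the lowered score_threshold passed to the op. Repeated rows are also skipped, in case the output still gets padded because too few candidates survived.

```python
from typing import List, Sequence


def filter_by_score(
    selected_indices: Sequence[Sequence[int]],
    scores: Sequence[Sequence[Sequence[float]]],
    min_score: float,
) -> List[List[int]]:
    """Keep selected [batch, class, box] rows whose score is at least min_score.

    scores uses the ONNX NMS layout [num_batches][num_classes][spatial_dimension].
    """
    kept: List[List[int]] = []
    seen = set()
    for batch_idx, class_idx, box_idx in selected_indices:
        key = (batch_idx, class_idx, box_idx)
        if key in seen:
            # Skip repeated rows, in case the output was still padded.
            continue
        seen.add(key)
        if scores[batch_idx][class_idx][box_idx] >= min_score:
            kept.append([batch_idx, class_idx, box_idx])
    return kept


scores = [[[0.9, 0.2, 0.6, 0.05]]]  # 1 batch, 1 class, 4 candidate boxes
selected = [[0, 0, 0], [0, 0, 2], [0, 0, 1], [0, 0, 3]]
print(filter_by_score(selected, scores, 0.5))
# [[0, 0, 0], [0, 0, 2]]
```

Looking up each score through the selected index keeps the filter independent of how many rows the op actually filled.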

Thank you.