BatchedNMS and BatchedNMSDynamic plugins have different dimensions for num_detections output

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 6.0
• TensorRT Version: 8.0.1
• NVIDIA GPU Driver Version (valid for GPU only): 495.29.05
• Issue Type (questions, new requirements, bugs): questions, bugs

Hi,
I can successfully create a TRT engine from a YOLOv4 ONNX model containing the BatchedNMSDynamic plugin. I add this plugin with this script. Now I want to use the BatchedNMS plugin with a static batch size instead of BatchedNMSDynamic, to compare the two in terms of speed. According to the docs, both plugins have the same outputs: num_detections, nmsed_boxes, nmsed_scores, and nmsed_classes.
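From my reading of the plugin docs, the expected output shapes can be sketched like this (keepTopK = 1000 matches my configuration; this is just my summary of the docs, not output from any tool):

```python
# Expected outputs of BatchedNMS_TRT / BatchedNMSDynamic_TRT, per my reading of
# the TensorRT plugin docs. "N" stands for the batch dimension; keepTopK = 1000
# matches my configuration.
keepTopK = 1000

expected_outputs = {
    "num_detections": ("N", 1),            # int32, valid detections per image
    "nmsed_boxes":    ("N", keepTopK, 4),  # float32 box coordinates
    "nmsed_scores":   ("N", keepTopK),     # float32 scores
    "nmsed_classes":  ("N", keepTopK),     # float32 class ids
}

for name, shape in expected_outputs.items():
    print(name, shape)
```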

Creating the TRT engine from the ONNX model containing the BatchedNMS plugin with trtexec works fine, but when I try to use it in the DeepStream pipeline, it gives me the following error:

0   INPUT  kFLOAT input           3x544x544       
1   OUTPUT kINT32 num_detections  0               
2   OUTPUT kFLOAT nmsed_boxes     1000x4          
3   OUTPUT kFLOAT nmsed_scores    1000            
4   OUTPUT kFLOAT nmsed_classes   1000 

main: nvdsinfer_context_impl.cpp:1412: NvDsInferStatus nvdsinfer::NvDsInferContextImpl::allocateBuffers(): Assertion `bindingDims.numElements > 0' failed.
Aborted (core dumped)

I do not know why num_detections has dimension 0. The only thing I found in the trtexec logs is that the two TRT engines have different dimensions for the num_detections output:

Logs from trtexec during creation of the TRT engine with the BatchedNMS plugin:
input 8x3x544x544
num_detections with dimensions 8
nmsed_boxes with dimensions 8x1000x4
nmsed_scores with dimensions 8x1000
nmsed_classes with dimensions 8x1000

Logs from trtexec during creation of the TRT engine with the BatchedNMSDynamic plugin:
input 8x3x544x544
num_detections with dimensions 8x1
nmsed_boxes with dimensions 8x1000x4
nmsed_scores with dimensions 8x1000
nmsed_classes with dimensions 8x1000

And it is weird!

I also checked both ONNX models (one with the BatchedNMS plugin, one with the BatchedNMSDynamic plugin) with Netron, and the outputs of both models are completely the same.

ONNX model with BatchedNMS plugin:
[Screenshot from 2022-01-17 18-49-32]
ONNX model with BatchedNMSDynamic plugin:
[Screenshot from 2022-01-17 19-23-15]

It looks to me like a bug in BatchedNMS. Do you have any idea?

Hi @fre_deric ,
Could you share how you changed from BatchedNMSDynamic to BatchedNMS?

Thanks!

Hi @mchi,
I add both plugins using this script. The script takes an ONNX model as input and appends the plugin node to the end of the model.

To add BatchedNMS I modify the script like this:

mns_node = gs.Node(
    op="BatchedNMS_TRT",
    attrs=create_attrs(input_h, input_w, topK, keepTopK),
    inputs=[boxes_tensor, confs_tensor],
    outputs=new_outputs)

To add BatchedNMSDynamic I modify the script like this:

mns_node = gs.Node(
    op="BatchedNMSDynamic_TRT",
    attrs=create_attrs(input_h, input_w, topK, keepTopK),
    inputs=[boxes_tensor, confs_tensor],
    outputs=new_outputs)
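For context, here is a rough sketch of what create_attrs builds. The attribute names follow the BatchedNMS plugin documentation; the numClasses and threshold values below are example placeholders, not necessarily my exact settings:

```python
def create_attrs(input_h, input_w, topK, keepTopK):
    """Sketch of the attribute dict passed to gs.Node. Both BatchedNMS_TRT
    and BatchedNMSDynamic_TRT accept the same attributes, which is why only
    the op string changes between the two versions of the script.
    input_h/input_w are kept for the script's signature but unused here."""
    return {
        "shareLocation": 1,       # one box set shared by all classes
        "backgroundLabelId": -1,  # no background class
        "numClasses": 80,         # example placeholder
        "topK": topK,             # candidates kept before NMS
        "keepTopK": keepTopK,     # detections kept after NMS, per image
        "scoreThreshold": 0.4,    # example placeholder
        "iouThreshold": 0.6,      # example placeholder
        "isNormalized": 1,        # boxes in normalized [0, 1] coordinates
        "clipBoxes": 1,           # clip boxes to the image extent
    }

attrs = create_attrs(544, 544, 4000, 1000)
print(attrs["keepTopK"])  # 1000
```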

Thanks!

Is num_detections one of the output layers of your model?

Yes, it is. It is shown in the screenshots above.

Hitting this EXACT same problem on TRT 8.0.1, JetPack 4.6, DeepStream 6.0.

When placing a BatchedNMS head on a YOLOv5 model, DeepStream fails to start (showing dimension 0 for num_detections). When using a BatchedNMSDynamic head, everything loads and runs great.


HI @nave.assaf @fre_deric
Is it possible to share the two models & DeepStream samples with private message?

Thanks!


I ran the DS deepstream_test1_rtsp_out.py app. I compiled the attached models with trtexec. I wrote my own custom parser, which is irrelevant to this issue since the DS app crashes while loading the model in the nvinfer plugin (before the parser is even invoked). Attached below are 3 ONNX model files. I know they're not exactly the same model, but they demonstrate the issue:

yoloxN_dynamic_640.onnx - Original model.
yoloxS_640_batched_nms_dynamic.onnx - BatchedNMSDynamic model. WORKS
yoloxN_dynamic_640_batched_nms_bs1.onnx - BatchedNMS model. FAILS

Example of the error I am seeing (not specifically for the models here, but for any static BatchedNMS):

0   INPUT  kFLOAT input           3x544x544       
1   OUTPUT kINT32 num_detections  0               
2   OUTPUT kFLOAT nmsed_boxes     1000x4          
3   OUTPUT kFLOAT nmsed_scores    1000            
4   OUTPUT kFLOAT nmsed_classes   1000 

main: nvdsinfer_context_impl.cpp:1412: NvDsInferStatus nvdsinfer::NvDsInferContextImpl::allocateBuffers(): Assertion `bindingDims.numElements > 0' failed.

BatchedNMSVsDynamicNMS.zip (43.0 MB)

Hi @mchi,
Here is a zip file containing the models. It includes the official yolov4-csp weights and config file taken from the https://github.com/AlexeyAB/darknet repo. The weights and config file were used to generate the ONNX models.

To generate the ONNX model from the Darknet weights and config, I used this repo.

To add both plugins (BatchedNMSDynamic_TRT, BatchedNMS_TRT) I used this script.

To generate the engine model, trtexec was used.

In the dynamic_nms_batched folder there are ONNX and engine models with the BatchedNMSDynamic_TRT plugin. You can run them with deepstream-app; they work.

In the static_nms_batched folder there are ONNX and engine models with the BatchedNMS_TRT plugin. This engine does not work and throws the known error (you can try it with deepstream-app):

main: nvdsinfer_context_impl.cpp:1412: NvDsInferStatus nvdsinfer::NvDsInferContextImpl::allocateBuffers(): Assertion `bindingDims.numElements > 0' failed.
Aborted (core dumped)

Thanks

Hi, it’s Lynette from @mchi’s team. The issue can be reproduced, thanks for sharing. Here is a patch to modify the output dimension:

diff --git a/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp b/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp
index 3d09910..842a3f2 100644
--- a/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp
+++ b/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp
@@ -159,7 +159,8 @@ Dims BatchedNMSPlugin::getOutputDimensions(int index, const Dims* inputs, int nb
         if (index == 0)
         {
             Dims dim0{};
-            dim0.nbDims = 0;
+            dim0.nbDims = 1;
+            dim0.d[0] = 1;
             return dim0;
         }
         // nmsed_boxes

You may patch it to TensorRT OSS and rebuild the plugin library.

$ git clone https://github.com/NVIDIA/TensorRT.git
$ cd TensorRT/
$ git submodule update --init --recursive
$ mkdir -p build && cd build
$ cmake .. -DTRT_LIB_DIR=/usr/lib/aarch64-linux-gnu -DTRT_OUT_DIR=`pwd`/out
# apply the patch above to plugin/batchedNMSPlugin/batchedNMSPlugin.cpp (around line 162)
$ make plugin -j$(nproc)
# replace the existing plugin library
$ sudo cp libnvinfer_plugin.so.8.*.* /lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.*.*
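To illustrate why the scalar binding fails: DeepStream derives each binding's buffer size from the product of its dimensions, and a binding with nbDims == 0 ends up with an element count of 0, which is also why num_detections prints as 0 in the log. A rough pure-Python sketch of that element-count logic, not the actual nvdsinfer code:

```python
def num_elements(dims):
    """Element count of an engine binding: the product of its dimensions.
    A scalar binding (nbDims == 0, empty dims) counts as 0 elements here,
    mirroring the `bindingDims.numElements > 0' assertion failure.
    Illustrative sketch only, not the actual nvdsinfer implementation."""
    if not dims:  # nbDims == 0: the scalar num_detections binding
        return 0
    count = 1
    for d in dims:
        count *= d
    return count

print(num_elements([]))         # before the patch: scalar output -> 0
print(num_elements([1]))        # after the patch: nbDims=1, d[0]=1 -> 1
print(num_elements([1000, 4]))  # nmsed_boxes per batch item -> 4000
```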

Could you give it a try and report back to us? Thanks~


Thanks @lynettez, I will try it this week and let you know.

Hi @lynettez ,
It works, thank you!

However, the engine generated with BatchedNMSDynamic_TRT is faster than the engine with BatchedNMS_TRT. I did not expect this.

By faster I mean following:

  • yolov4-csp with batch=4 and BatchedNMSDynamic_TRT gives me 130 FPS per input stream, 520 FPS in total (4 × 130)
  • yolov4-csp with batch=4 and BatchedNMS_TRT gives me 118 FPS per input stream, 472 FPS in total (4 × 118)

In both cases the GPU was at 100% utilization with no other bottleneck, so it seems weird to me that BatchedNMSDynamic_TRT is actually faster.

Thank you for your time!

Can you use the command below to dump the per-layer times and share the log with us?

/usr/src/tensorrt/bin/trtexec --loadEngine=TRT_Engine_generated_in_DeepStream --dumpProfile


OK, I ran it again and now both models have the same FPS (around 4 × 130 FPS). The earlier difference was probably caused by a TensorRT version mismatch (my version is 8.0, but I had built the TensorRT OSS plugins from the 8.2 branch).

The logs are here:
yolov4-csp_b4_fp16_static_batch_nms.txt (72.4 KB)

yolov4-csp_b4_fp16_dynamic_batch_nms.txt (72.6 KB)

The node_of_num_detections layer is worth noting, because its time is more than 500 ms for both models.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.