Error when setting streammux's batch-size > 1 for a custom model

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) = GTX 1080 Ti
• DeepStream Version = 6.0
• TensorRT Version = 8.2.0.6
• NVIDIA GPU Driver Version (valid for GPU only) = 11.4
• Issue Type (questions, new requirements, bugs) = question

Hi there! I hope you are doing well.

I have implemented the RetinaFace detector on a 1080 Ti using DeepStream. RetinaFace is not just a detector: for every bounding box it also outputs 5 facial landmarks. Therefore, instead of writing a bounding-box parser function for it, I followed the example of “deepstream-infer-tensor-meta-test” and set network-type=100 and output-tensor-meta=1. This lets me access the output tensor meta in the pgie probe, where a custom NMS function does the post-processing. Then I add the bounding box of each object to obj_meta (where obj_meta is acquired as: NvDsObjectMeta *obj_meta = nvds_acquire_obj_meta_from_pool (batch_meta);) and add the coordinates of its five landmarks to the user meta of obj_meta (acquired as: NvDsUserMeta *um1 = nvds_acquire_user_meta_from_pool (batch_meta);).

Finally, the following two calls are executed for each object to attach the custom metadata to the pipeline:

nvds_add_user_meta_to_obj(obj_meta, um1);
nvds_add_obj_meta_to_frame (frame_meta, obj_meta, NULL);
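
Here is a trimmed sketch of how the attachment looks in my pgie src-pad probe. The FaceLandmarks struct, the meta-type string and the helper name attach_face are placeholders I'm using for illustration, not DeepStream API:

#include <string.h>
#include "gstnvdsmeta.h"

/* Placeholder struct for the 5 facial landmarks (absolute pixel coordinates). */
typedef struct { float x[5]; float y[5]; } FaceLandmarks;

static gpointer landmarks_copy (gpointer data, gpointer user_data)
{
  NvDsUserMeta *um = (NvDsUserMeta *) data;
  FaceLandmarks *dst = g_new0 (FaceLandmarks, 1);
  memcpy (dst, um->user_meta_data, sizeof (FaceLandmarks));
  return dst;
}

static void landmarks_release (gpointer data, gpointer user_data)
{
  NvDsUserMeta *um = (NvDsUserMeta *) data;
  g_free (um->user_meta_data);
  um->user_meta_data = NULL;
}

/* Called from the pgie src-pad probe for every detection that survives the NMS. */
static void attach_face (NvDsBatchMeta *batch_meta, NvDsFrameMeta *frame_meta,
                         NvOSD_RectParams box, FaceLandmarks lm, float score)
{
  NvDsObjectMeta *obj_meta = nvds_acquire_obj_meta_from_pool (batch_meta);
  obj_meta->rect_params = box;                  /* bounding box in streammux resolution */
  obj_meta->rect_params.border_width = 2;
  obj_meta->rect_params.border_color = (NvOSD_ColorParams) { 1.0, 0.0, 0.0, 1.0 };
  obj_meta->confidence = score;
  obj_meta->class_id = 0;

  NvDsUserMeta *um1 = nvds_acquire_user_meta_from_pool (batch_meta);
  FaceLandmarks *payload = g_new0 (FaceLandmarks, 1);
  *payload = lm;
  um1->user_meta_data = payload;
  um1->base_meta.meta_type = nvds_get_user_meta_type ((gchar *) "MYAPP.RETINAFACE.LANDMARKS");
  um1->base_meta.copy_func = landmarks_copy;
  um1->base_meta.release_func = landmarks_release;

  nvds_add_user_meta_to_obj (obj_meta, um1);
  nvds_add_obj_meta_to_frame (frame_meta, obj_meta, NULL);
}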

Later in the pipeline, I draw the landmarks in the osd probe.
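
The drawing side looks roughly like this (again a simplified sketch, reusing the placeholder FaceLandmarks struct and meta-type string from above):

/* Called from the osd sink-pad probe for each (frame_meta, obj_meta) pair. */
static void draw_landmarks (NvDsBatchMeta *batch_meta, NvDsFrameMeta *frame_meta,
                            NvDsObjectMeta *obj_meta)
{
  for (NvDsMetaList *l = obj_meta->obj_user_meta_list; l != NULL; l = l->next) {
    NvDsUserMeta *um = (NvDsUserMeta *) l->data;
    if (um->base_meta.meta_type !=
        nvds_get_user_meta_type ((gchar *) "MYAPP.RETINAFACE.LANDMARKS"))
      continue;
    FaceLandmarks *lm = (FaceLandmarks *) um->user_meta_data;

    NvDsDisplayMeta *dm = nvds_acquire_display_meta_from_pool (batch_meta);
    dm->num_circles = 5;
    for (int i = 0; i < 5; i++) {
      dm->circle_params[i].xc = (guint) lm->x[i];
      dm->circle_params[i].yc = (guint) lm->y[i];
      dm->circle_params[i].radius = 2;
      dm->circle_params[i].circle_color = (NvOSD_ColorParams) { 0.0, 1.0, 0.0, 1.0 };
    }
    nvds_add_display_meta_to_frame (frame_meta, dm);
  }
}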

So far everything works perfectly on a single video stream, and I can see the detected faces and their landmarks in the output, as shown below:

The problem, however, starts when I want to run it on multiple RTSP streams, following DeepStream's multi-RTSP pipeline example.

In the multi-RTSP setting, I set the batch-size of streammux to num-sources and the batch-size of pgie to 1. Here is what happens then:

1- For multiple RTSP cameras with the batch-size of streammux and pgie both set to 1, everything works correctly, but the speed is low.

2- If I set the batch-size of streammux higher than 6, DeepStream's window freezes.

3- If I set the batch-size of streammux to a value in [1 < batch-size <= 6] and the batch-size of pgie to 1, it can run on multiple videos (even if the number of videos is higher than 6), and the speed is better than with a streammux batch-size of 1. But another problem arises in this scenario: detections from different videos get mixed. Take a look at the image below:

Also, if I deactivate the two probes (i.e., the pgie probe where the custom post-processing and attachment of custom metadata is performed, and the osd probe where the landmarks are drawn), the pipeline still cannot work with a streammux batch-size > 6. If I remove the pgie plugin from the pipeline (i.e., from the gst_element_link_many function), the pipeline works perfectly, with good speed for all values of streammux's batch-size, even with 60 videos (of course with no detections this time).

Here is my pipeline from the streammux onward: streammux -> pgie -> nvvidconv -> nvosd -> tiler -> sink

What is going wrong? How can I make my code work correctly and fast for multiple RTSP streams?

What is the difference between batch-size of nvstreammux and nvinfer? What are the recommended values for nvstreammux batch-size?

nvstreammux's batch-size is the number of buffers (frames) it will batch together into one muxed buffer. nvinfer's batch-size is the number of frames (primary mode) / objects (secondary mode) it will infer on together. We recommend that nvstreammux's batch-size be set to either the number of sources linked to it or the primary nvinfer's batch-size.

https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_FAQ.html
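
For a GStreamer application like deepstream-test3, that recommendation translates to something like the following (num_sources, streammux and pgie are the usual variable names from the sample; the 40000 µs timeout is just an example value):

/* num_sources = number of RTSP inputs linked to streammux */
g_object_set (G_OBJECT (streammux),
              "batch-size", num_sources,
              "batched-push-timeout", 40000,   /* 40 ms; tune for your frame rate */
              NULL);

/* Overrides the batch-size given in the nvinfer config file. */
g_object_set (G_OBJECT (pgie), "batch-size", num_sources, NULL);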

Dear kesong,

Thanks for your swift reply.
I found the reason why the pipeline crashes when the number of input sources is higher than six.

While building the TensorRT engine, I increased max_batch_size to 100. It now works with any number of input cameras. Typically, I set both the batch-size of streammux and the batch-size of nvinfer equal to the number of sources.

But the second problem (keypoints being drawn in the wrong location) still remains. I think it is a result of the tiler's scaling being applied to the bounding boxes but not to the keypoints, so I'll implement the keypoint scaling myself.
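
My rough plan (untested sketch, reusing the placeholder names from the snippets above): store each landmark as an offset relative to its bounding box in the pgie probe, and rebuild the absolute position from rect_params in the osd probe, so the keypoints pick up whatever scaling the boxes get:

/* pgie probe: store landmarks relative to the detected box (box in streammux resolution). */
for (int i = 0; i < 5; i++) {
  lm.x[i] = (lm.x[i] - box.left) / box.width;
  lm.y[i] = (lm.y[i] - box.top)  / box.height;
}

/* osd probe: rebuild absolute positions from rect_params, which stays scaled with the box. */
NvOSD_RectParams *r = &obj_meta->rect_params;
for (int i = 0; i < 5; i++) {
  dm->circle_params[i].xc = (guint) (r->left + lm->x[i] * r->width);
  dm->circle_params[i].yc = (guint) (r->top  + lm->y[i] * r->height);
}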

Thank you again.

So you resolved your issue, right?

Yes!
You can close the topic now!

Thanks