Potential bug in optimized dsexample plugin

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): GeForce RTX 3090
• DeepStream Version: 6.1
• TensorRT Version: 8.2
• NVIDIA GPU Driver Version (valid for GPU only): 510
• Issue Type( questions, new requirements, bugs): bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing): NA

In the file gstdsexample_optimized.cpp, in gst_dsexample_submit_input_buffer(), the batch size can never reach the max_batch_size when process_full_frame is set to false.

/**
 * Called when element receives an input buffer from upstream element.
 */
static GstFlowReturn
gst_dsexample_submit_input_buffer (GstBaseTransform * btrans,
    gboolean discont, GstBuffer * inbuf)
{
   // some code

   num_filled = batch_meta->num_frames_in_batch; // up to the number of sources in the pipeline

   if (dsexample->process_full_frame)
   {
     // process full frame
   } else{
     // process detected object

     i++;

     // Convert batch and push to process thread
     if (batch->frames.size () == dsexample->max_batch_size || i == num_filled) {
      // send full-batch
     }
   }
}

When I set process_full_frame to false and max_batch_size to 4, batch size can only reach 1 (the number of sources in the pipeline) before a “full-batch” is submitted, even though there are enough detected objects to fill a full batch of size 4.

I added print statements:

g_message("Send full batch! batch-size = %ld, max-batch-size = %d, i= %d, num_filled= %d", 
        batch->frames.size(), dsexample->max_batch_size, i, num_filled);

g_message("Send non-full batch! batch-size = %ld", batch->frames.size ());

I then ran ./deepstream-opencv-test file://$HOME/Downloads/deepstream-6.1/samples/streams/sample_720p.mp4, and the output was:

Message: 13:04:42.466: Send full batch! batch-size = 1, max-batch-size = 4, i= 1, num_filled= 1

I don’t see it mentioned anywhere that max-batch-size must be less than or equal to batch_meta->num_frames_in_batch when we process detected objects instead of full frames. What is the point of allocating enough memory to hold max-batch-size entries when creating dsexample->inter_buf and dsexample->batch_insurf if, in non-full-frame mode, they only ever fill up to the number of sources in the pipeline? Is this the expected behaviour when we process detected objects instead of full frames?
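
To put numbers on that mismatch, here is a tiny standalone arithmetic sketch (my own illustration, not plugin code; the values just mirror the single-source setup above): inter_buf and batch_insurf are sized for max-batch-size surfaces, yet with the current check each submitted batch in object mode only ever holds up to num_frames_in_batch entries.

#include <algorithm>
#include <cstdio>

int main ()
{
  const int max_batch_size = 4;      // surfaces allocated for inter_buf / batch_insurf
  const int num_frames_in_batch = 1; // sources in the pipeline (single mp4 here)

  // entries actually filled per submitted batch under the current check
  const int filled = std::min (num_frames_in_batch, max_batch_size);
  printf ("allocated per batch: %d, filled per batch: %d (%.0f%% utilisation)\n",
      max_batch_size, filled, 100.0 * filled / max_batch_size);
  return 0;
}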

Modified files:

deepstream_opencv_test.c (14.9 KB)
gstdsexample_optimized.cpp (46.1 KB)

I think that, when processing detected objects, instead of this code segment:

/* Adding a frame to the current batch. Set the frames members. */
GstDsExampleFrame frame;
frame.scale_ratio_x = scale_ratio;
frame.scale_ratio_y = scale_ratio;
frame.obj_meta = obj_meta;
frame.frame_meta = nvds_get_nth_frame_meta (batch_meta->frame_meta_list, i);
frame.frame_num = frame.frame_meta->frame_num;
frame.batch_index = i;
frame.input_surf_params = in_surf->surfaceList + i;
batch->frames.push_back (frame);

i++;

// Convert batch and push to process thread
if (batch->frames.size () == dsexample->max_batch_size || i == num_filled)
{
   // code
}

To allow the batch to reach max_batch_size, it should be:

/* Adding a frame to the current batch. Set the frames members. */
GstDsExampleFrame frame;
frame.scale_ratio_x = scale_ratio;
frame.scale_ratio_y = scale_ratio;
frame.obj_meta = obj_meta;
frame.frame_meta = nvds_get_nth_frame_meta (batch_meta->frame_meta_list, frame_meta->batch_id);
frame.frame_num = frame.frame_meta->frame_num;
frame.batch_index = frame_meta->batch_id;
frame.input_surf_params = in_surf->surfaceList + frame_meta->batch_id;
batch->frames.push_back (frame);

i++; // not needed when processing detected objects

// Convert batch and push to process thread
if (batch->frames.size () == dsexample->max_batch_size)
{
   // code
}

about " the batch size can never reach the max_batch_size when process_full_frame is set to false.", if process_full_frame is false,
the workflow will be processing objects, maybe object number is bigger than max_batch_size, in this case, batch-size is max_batch_size.

My understanding of the intended optimisation here is:

Say there are 2 cameras. If 10 objects are detected in total in a batched buffer of 2 frames and max_batch_size is set to 4, then 2 batches of size 4 (full batches) and 1 batch of size 2 (non-full batch) should be submitted. However, the current implementation submits 5 batches of size 2 instead, due to num_filled = batch_meta->num_frames_in_batch; and the check i == num_filled, which defeats the purpose of setting max_batch_size. If everything else stays the same but the number of cameras is decreased to 1, then 10 batches of size 1 are submitted. So the number of sources in the pipeline (which ultimately sets batch_meta->num_frames_in_batch) becomes the main limiting factor on concurrency; that makes sense in full-frame mode but doesn't make much sense in object mode. My question is: is this the expected behaviour when we process objects instead of full frames? I also suggested a code change that allows full batches to be submitted when there are enough objects; I'm just not sure whether I have misunderstood the intention of the current implementation, i.e. is this a feature or a bug?
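
For concreteness, here is a small standalone sketch of that batch split (my own illustration, not plugin code). The cap parameter models the effective per-batch limit: max_batch_size with the suggested change versus num_frames_in_batch under the current check, as described above.

#include <cstdio>

/* Print how total_objects splits into batches when each batch is cut at `cap`. */
static void print_split (const char *label, int total_objects, int cap)
{
  printf ("%s:", label);
  while (total_objects > 0) {
    int n = total_objects < cap ? total_objects : cap;
    printf (" %d", n);
    total_objects -= n;
  }
  printf ("\n");
}

int main ()
{
  const int total_objects = 10;
  const int max_batch_size = 4;

  print_split ("suggested change (cap = max_batch_size)", total_objects, max_batch_size);
  print_split ("current check, 2 sources (cap = num_frames_in_batch)", total_objects, 2);
  print_split ("current check, 1 source (cap = num_frames_in_batch)", total_objects, 1);
  /* Prints 4 4 2, then 2 2 2 2 2, then ten batches of 1. */
  return 0;
}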

@fanzh,

Hi, is there any update on my follow-up questions?

It should be a bug; the “i == num_filled” limitation should be removed here, since the GPU prefers full-batch processing. Does it run OK after the modification?

@fanzh,

Thanks for the reply! Removing i == num_filled on its own caused a segmentation fault because of how frames (GstDsExampleFrame) are added to the batch (GstDsExampleBatch) using the i variable. Switching from i to frame_meta->batch_id and removing i == num_filled allows the batch to reach max-batch-size when there are enough detected objects.

Current batch filling:

GstDsExampleFrame frame;
frame.scale_ratio_x = scale_ratio;
frame.scale_ratio_y = scale_ratio;
frame.obj_meta = obj_meta;
frame.frame_meta = nvds_get_nth_frame_meta (batch_meta->frame_meta_list, i); // Segmentation fault when i >= batch_meta->num_frames_in_batch
frame.frame_num = frame.frame_meta->frame_num;
frame.batch_index = i;
frame.input_surf_params = in_surf->surfaceList + i; 
batch->frames.push_back (frame);

Modified batch filling:

GstDsExampleFrame frame;
frame.scale_ratio_x = scale_ratio;
frame.scale_ratio_y = scale_ratio;
frame.obj_meta = obj_meta;
frame.frame_meta = nvds_get_nth_frame_meta (batch_meta->frame_meta_list, frame_meta->batch_id);
frame.frame_num = frame.frame_meta->frame_num;
frame.batch_index = frame_meta->batch_id;
frame.input_surf_params = in_surf->surfaceList + frame_meta->batch_id;
batch->frames.push_back (frame);
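
A possible extra guard here, purely as a defensive sketch and not something from the original plugin, is to assert that the index stays below batch_meta->num_frames_in_batch before the lookup, since an out-of-range index is exactly what triggered the segmentation fault above:

guint idx = frame_meta->batch_id;
/* batch_id is the frame's index within the batched buffer, so it should stay
 * below batch_meta->num_frames_in_batch; assert it to fail loudly instead of
 * crashing inside nvds_get_nth_frame_meta or the surfaceList indexing. */
g_assert (idx < batch_meta->num_frames_in_batch);
frame.frame_meta = nvds_get_nth_frame_meta (batch_meta->frame_meta_list, idx);
frame.batch_index = idx;
frame.input_surf_params = in_surf->surfaceList + idx;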

Thanks for your sharing. This dsexample plugin is open source and will not be maintained, so please modify it as needed.