Potential bug in optimized dsexample plugin

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): GeForce RTX 3090
• DeepStream Version: 6.1
• TensorRT Version: 8.2
• NVIDIA GPU Driver Version (valid for GPU only): 510
• Issue Type( questions, new requirements, bugs): bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing): NA

In the file gstdsexample_optimized.cpp, in gst_dsexample_submit_input_buffer(), the batch size can never reach the max_batch_size when process_full_frame is set to false.

/**
 * Called when element receives an input buffer from upstream element.
 */
static GstFlowReturn
gst_dsexample_submit_input_buffer (GstBaseTransform * btrans,
    gboolean discont, GstBuffer * inbuf)
{
   // some code

   num_filled = batch_meta->num_frames_in_batch; // up to the number of sources in the pipeline

   if (dsexample->process_full_frame)
   {
     // process full frame
   } else {
     // process detected object

     i++;

     // Convert batch and push to process thread
     if (batch->frames.size () == dsexample->max_batch_size || i == num_filled) {
      // send full-batch
     }
   }
}

When I set process_full_frame to false and max_batch_size to 4, batch size can only reach 1 (the number of sources in the pipeline) before a “full-batch” is submitted, even though there are enough detected objects to fill a full batch of size 4.

I added print statements:

g_message("Send full batch! batch-size = %ld, max-batch-size = %d, i= %d, num_filled= %d", 
        batch->frames.size(), dsexample->max_batch_size, i, num_filled);

g_message("Send non-full batch! batch-size = %ld", batch->frames.size ());

Then I ran ./deepstream-opencv-test file://$HOME/Downloads/deepstream-6.1/samples/streams/sample_720p.mp4, and the output was:

Message: 13:04:42.466: Send full batch! batch-size = 1, max-batch-size = 4, i= 1, num_filled= 1

I don’t see it mentioned anywhere that max-batch-size must be less than or equal to batch_meta->num_frames_in_batch when we process detected objects instead of full frames. What is the point of allocating enough memory to hold max-batch-size entries when creating dsexample->inter_buf and dsexample->batch_insurf, if in non-full-frame mode they only ever fill up to the number of sources in the pipeline? Is this the expected behaviour when we process detected objects instead of full frames?
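
For reference, this is roughly how an allocation sized to max-batch-size typically looks; a minimal sketch, not the plugin's exact code, assuming the standard NvBufSurface API and the usual dsexample member names (gpu_id, processing_width, processing_height, inter_buf):

#include "nvbufsurface.h"

/* Sketch: reserve an intermediate buffer with room for max_batch_size
 * surfaces (member names and parameter values are assumptions for
 * illustration, inside the plugin's setup path). */
NvBufSurfaceCreateParams create_params = { 0 };
create_params.gpuId = dsexample->gpu_id;
create_params.width = dsexample->processing_width;
create_params.height = dsexample->processing_height;
create_params.colorFormat = NVBUF_COLOR_FORMAT_RGBA;
create_params.layout = NVBUF_LAYOUT_PITCH;
create_params.memType = NVBUF_MEM_CUDA_UNIFIED;

/* Space for max_batch_size crops is reserved here, but with the current
 * object-mode logic at most num_frames_in_batch of them are ever filled. */
if (NvBufSurfaceCreate (&dsexample->inter_buf, dsexample->max_batch_size,
        &create_params) != 0) {
  GST_ERROR ("Could not allocate intermediate buffer");
}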

Modified files:

deepstream_opencv_test.c (14.9 KB)
gstdsexample_optimized.cpp (46.1 KB)

I think that when processing detected objects, instead of this code segment:

/* Adding a frame to the current batch. Set the frames members. */
GstDsExampleFrame frame;
frame.scale_ratio_x = scale_ratio;
frame.scale_ratio_y = scale_ratio;
frame.obj_meta = obj_meta;
frame.frame_meta = nvds_get_nth_frame_meta (batch_meta->frame_meta_list, i);
frame.frame_num = frame.frame_meta->frame_num;
frame.batch_index = i;
frame.input_surf_params = in_surf->surfaceList + i;
batch->frames.push_back (frame);

i++;

// Convert batch and push to process thread
if (batch->frames.size () == dsexample->max_batch_size || i == num_filled)
{
   // code
}

To allow the batch to reach max_batch_size, it should be:

/* Adding a frame to the current batch. Set the frames members. */
GstDsExampleFrame frame;
frame.scale_ratio_x = scale_ratio;
frame.scale_ratio_y = scale_ratio;
frame.obj_meta = obj_meta;
frame.frame_meta = nvds_get_nth_frame_meta (batch_meta->frame_meta_list, frame_meta->batch_id);
frame.frame_num = frame.frame_meta->frame_num;
frame.batch_index = frame_meta->batch_id;
frame.input_surf_params = in_surf->surfaceList + frame_meta->batch_id;
batch->frames.push_back (frame);

i++; // not needed in process detected objects mode

// Convert batch and push to process thread
if (batch->frames.size () == dsexample->max_batch_size)
{
   // code
}
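
One thing to note with this check alone: objects left over at the end of the muxed buffer no longer trigger a submit via i == num_filled, so the remaining partial batch has to be pushed after the loop. A rough sketch, assuming the existing non-full-batch path (the one that prints "Send non-full batch!") is reused:

// inside the object loop: only full batches are pushed to the process thread
if (batch->frames.size () == dsexample->max_batch_size) {
  // convert the accumulated crops and push the full batch
}

// after iterating all frames/objects in the muxed buffer
if (!batch->frames.empty ()) {
  // convert and push the remaining, non-full batch
}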

Regarding "the batch size can never reach the max_batch_size when process_full_frame is set to false": if process_full_frame is false, the workflow processes objects. The number of objects may be bigger than max_batch_size, and in that case the batch size is max_batch_size.

My understanding of the intended optimisation here is:

Say there are 2 cameras. If 10 objects are detected in total in a batched buffer of 2 frames and max_batch_size is set to 4, then 2 batches of size 4 (full batches) and 1 batch of size 2 (a non-full batch) should be submitted. However, the current implementation submits 5 batches of size 2 instead, because of num_filled = batch_meta->num_frames_in_batch; and the check i == num_filled, which defeats the purpose of setting max_batch_size. If everything else stays the same but the number of cameras is decreased to 1, then 10 batches of size 1 are submitted. So the number of sources in the pipeline (which ultimately sets batch_meta->num_frames_in_batch) becomes the main limiting factor of concurrency; that makes sense in full-frame mode but not in object mode. My question: is this the expected behaviour when we process objects instead of full frames? I also suggested a code change that allows full batches to be submitted when there are enough objects. I’m just not sure whether I have misunderstood the intention of the current implementation, i.e. is this a feature or a bug?
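
To make the arithmetic above concrete, here is a tiny standalone C++ sketch (not plugin code; the numbers are just the example above) that counts how many batches each policy submits:

#include <cstdio>

int main ()
{
  const int num_sources = 2;      /* num_frames_in_batch */
  const int total_objects = 10;   /* objects detected across the batched buffer */
  const int max_batch_size = 4;

  /* Current behaviour as observed: a batch is flushed once it holds
   * num_frames_in_batch items, i.e. batch size is capped by the source count. */
  int batches_current = (total_objects + num_sources - 1) / num_sources;

  /* Proposed behaviour: batches are only bounded by max_batch_size. */
  int batches_proposed = (total_objects + max_batch_size - 1) / max_batch_size;

  printf ("current:  %d batches of size <= %d\n", batches_current, num_sources);
  printf ("proposed: %d batches of size <= %d\n", batches_proposed, max_batch_size);
  return 0;
}

This prints 5 batches (size <= 2) for the current logic and 3 batches (two of size 4, one of size 2) for the proposed one.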

@fanzh,

Hi, is there any update on my follow-up questions?

It should be a bug; the “i == num_filled” limitation should be removed here, since the GPU prefers full-batch processing. Does it run OK after the modification?

@fanzh,

Thanks for the reply! Removing i == num_filled on its own caused a segmentation fault, because of how frames (GstDsExampleFrame) are added to the batch (GstDsExampleBatch) using the i variable. Switching from i to frame_meta->batch_id and removing i == num_filled allows the batch to reach max-batch-size when there are enough detected objects.

Current batch filling:

GstDsExampleFrame frame;
frame.scale_ratio_x = scale_ratio;
frame.scale_ratio_y = scale_ratio;
frame.obj_meta = obj_meta;
frame.frame_meta = nvds_get_nth_frame_meta (batch_meta->frame_meta_list, i); // Segmentation fault when i > batch_meta->num_frames_in_batch
frame.frame_num = frame.frame_meta->frame_num;
frame.batch_index = i;
frame.input_surf_params = in_surf->surfaceList + i; 
batch->frames.push_back (frame);

Modified batch filling:

GstDsExampleFrame frame;
frame.scale_ratio_x = scale_ratio;
frame.scale_ratio_y = scale_ratio;
frame.obj_meta = obj_meta;
frame.frame_meta = nvds_get_nth_frame_meta (batch_meta->frame_meta_list, frame_meta->batch_id);
frame.frame_num = frame.frame_meta->frame_num;
frame.batch_index = frame_meta->batch_id;
frame.input_surf_params = in_surf->surfaceList + frame_meta->batch_id;
batch->frames.push_back (frame);

Thanks for sharing. The dsexample plugin is open source and will not be maintained; please modify it as needed.
