Batch size

I have a few questions about batch size:

  • Is there a point (performance, for example) to a batch size > 1 for a single camera source? Has this been benchmarked?
  • I'm currently planning on parsing the detection metadata with a probe attached to a fakesink pad and this function here:
    // many thanks to NVIDIA's test1_app for showing me how to do this right
    GstPadProbeReturn on_batch(GstPad* pad,
                               GstPadProbeInfo* info,
                               gpointer u_data) {
      // get batched metadata:
      NvDsBatchMeta* batch = gst_buffer_get_nvds_batch_meta((GstBuffer*)info->data);
      if (batch == NULL) {
        return GST_PAD_PROBE_OK;
      }
    
      NvDsMetaList* frames = NULL;
      NvDsFrameMeta* frame = NULL;
      NvDsMetaList* objects = NULL;
      NvDsObjectMeta* object = NULL;
    
      // for frame in batch.frame_meta_list:
      for (frames = batch->frame_meta_list; frames != NULL; frames = frames->next) {
        frame = (NvDsFrameMeta*)(frames->data);
    
        // for object in frame.obj_meta_list:
        for (objects = frame->obj_meta_list; objects != NULL;
             objects = objects->next) {
          object = (NvDsObjectMeta*)(objects->data);
    
          if (object->class_id == BIRB_ID) {
            dump_bbox(frame->frame_num, &object->rect_params);
          }
        }
      }
    
      return GST_PAD_PROBE_OK;
    }
    

    “dump_bbox” currently just prints to stdout, but I will soon batch the data and dump it to a sidecar file instead of printing it. My question is: is there a big disadvantage to doing it this way as opposed to using nvmsgconv?

    Hi,

    1.
    Sorry, we don’t have benchmark results across batch sizes for DeepStream.
    But here is TensorRT profiling data across batch sizes:
    https://developer.nvidia.com/embedded/jetson-agx-xavier-dl-inference-benchmarks

    2.
    We don’t think so.

    Thanks.

    @mdegans

    “Is there a point (performance, for example) to a batch size > 1 for a single camera source? Has this been benchmarked?”

    We have not benchmarked such a use case; however, if you are trying this approach, I would suggest these changes.

    Temporal batching for a single camera source makes sense only when your camera’s FPS is lower than 30. To do so, you can increase the “batched-push-timeout” property on the streammux so it waits until the required number of frames is in the batch. Also make sure your streammux and pgie batch-size properties are set to the same value. Keep in mind that with this approach the display on the sink won’t be as smooth as with a non-batched pipeline.
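The batching knobs above can be sketched as a gst-launch config fragment. This is illustrative only: the source element, caps, resolution, and timeout value are assumptions, and on nvstreammux the timeout property is spelled batched-push-timeout (in microseconds).

```shell
# Illustrative only: single source, temporal batch of 4 frames.
# nvstreammux waits up to 40 ms (40000 us) to fill a batch;
# streammux and pgie batch sizes match, per the advice above.
gst-launch-1.0 \
  v4l2src ! nvvideoconvert ! 'video/x-raw(memory:NVMM)' ! mux.sink_0 \
  nvstreammux name=mux batch-size=4 batched-push-timeout=40000 \
              width=1280 height=720 ! \
  nvinfer config-file-path=pgie_config.txt batch-size=4 ! \
  fakesink
```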

    Thanks for the suggestion. That’s very useful, since the two branches of my pipeline after my tee (encoder and inference) don’t need to resync, and I want to write the metadata in chunks. I will experiment with batching and make sure all the elements in my inference branch use the same batch size.