How to set nvyolo's batch size?

Hello,
I used the

gst-inspect-1.0 nvyolo

command to list the properties I can set. It shows:
{
  name,
  parent,
  qos,
  unique-id,
  processing-width,
  processing-height,
  full-frame,
  gpu-id
}

Hmm, I think there should also be a batch-size property.

So I opened gst-yoloplugin-tesla.h and gst-yoloplugin-tesla.cpp in the directory “deepstream-plugins/sources/plugins/gst-yoloplugin-tesla”, and found this:

static gboolean
gst_yoloplugin_start (GstBaseTransform * btrans)
{
  GstYoloPlugin *yoloplugin = GST_YOLOPLUGIN (btrans);
  YoloPluginInitParams init_params =
      { yoloplugin->processing_width, yoloplugin->processing_height,
    yoloplugin->process_full_frame
  };

  GstQuery *queryparams = NULL;
  guint batch_size = 1;
  cudaError_t CUerr = cudaSuccess;

  /* Default to batch size 1, then ask the neighbouring elements for the
   * actual batch size of the pipeline (e.g. the number of muxed sources). */
  yoloplugin->batch_size = 1;
  queryparams = gst_nvquery_batch_size_new ();
  if (gst_pad_peer_query (GST_BASE_TRANSFORM_SINK_PAD (btrans), queryparams)
      || gst_pad_peer_query (GST_BASE_TRANSFORM_SRC_PAD (btrans), queryparams)) {
    if (gst_nvquery_batch_size_parse (queryparams, &batch_size)) {
      yoloplugin->batch_size = batch_size;
    }
  }
  GST_DEBUG_OBJECT (yoloplugin, "Setting batch-size %d \n",
      yoloplugin->batch_size);
  gst_query_unref (queryparams);

  /* Algorithm specific initializations and resource allocation. */
  yoloplugin->yolopluginlib_ctx =
      YoloPluginCtxInit (&init_params, yoloplugin->batch_size);

  GST_DEBUG_OBJECT (yoloplugin, "ctx lib %p \n", yoloplugin->yolopluginlib_ctx);
  CUerr = cudaSetDevice (yoloplugin->gpu_id);
  if (CUerr != cudaSuccess) {
    g_print ("\n *** Unable to set device in %s Line %d\n", __func__, __LINE__);
    goto error;
  }
  /* ... rest of the function omitted ... */

It seems that the batch size always ends up being 1.
Is it necessary for the batch size to be 1,
or could I set it to 2, 4, or 16?

Thanks for your time.

The plugin currently queries the pipeline upstream to figure out the batch size, with 1 being the minimum. So if you use two sources in your pipeline, the yolo plugin will use batch-size 2 and create an engine for it. Currently there is no support for temporal batching, i.e., using batch sizes greater than 1 with just a single source.
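
For reference, here is a rough sketch of how an upstream element can answer that query, assuming the gst_nvquery_* helpers from gst-nvquery.h in the DeepStream SDK (my_src_pad_query is a made-up name for illustration):

#include <gst/gst.h>
#include "gst-nvquery.h"

/* Sketch: an upstream element answering nvyolo's batch-size query.
 * Whatever value is set here is what gst_nvquery_batch_size_parse ()
 * in gst_yoloplugin_start () will receive. */
static gboolean
my_src_pad_query (GstPad * pad, GstObject * parent, GstQuery * query)
{
  if (gst_nvquery_is_batch_size (query)) {
    /* e.g. report 2 because this element muxes two sources */
    gst_nvquery_batch_size_set (query, 2);
    return TRUE;
  }
  return gst_pad_query_default (pad, parent, query);
}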

@NvCJR
Wow, your reply is very clear!
Thanks for your answer; it will save me a lot of time.

You can use streammux to perform temporal batching.

Hi NvCJR, does the batch size need to equal the number of streams exactly? If I have 2 streams being processed, does the engine need to be built with a batch size of 2? What are the consequences of building an engine with a batch size of 4 and only processing 2 streams?

What I’m getting at is I want to have the engines prebuilt for deployment, but I cannot always be sure how many streams will be loaded. So am I best off pre-building an array of engines set for different batch sizes?

Also, can you explain temporal batching please?

What are the consequences of building an engine with a batch size of 4 and only processing 2 streams?

You can process 2 streams with an engine that has been built with a max batch size of 4. Although it depends on the network and how TRT optimizes it, you shouldn't see a big difference in performance if you use a lower batch size than what the engine was built for.

What I’m getting at is I want to have the engines prebuilt for deployment, but I cannot always be sure how many streams will be loaded. So am I best off pre-building an array of engines set for different batch sizes?

You can try a simple experiment to check whether this affects you. Build an engine for batch size 128 (or a number suitable for your use case). Check the performance when you use it at the lowest possible batch size in your use case (or batch size 1). Then check the performance of an engine built with that lowest batch size (like 1), and compare the difference. From an accuracy point of view, there should be no change.
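
In case it is useful, here is a minimal sketch of that experiment against the implicit-batch TensorRT API of that generation; buildAndTime and its builder/network/bindings arguments are placeholders, and exact calls may differ between TRT versions:

#include "NvInfer.h"

/* Sketch: build once with a large max batch size, then execute with the
 * actual batch size of the moment (anything <= the max is allowed).
 * Error checking and cleanup are omitted here. */
void buildAndTime (nvinfer1::IBuilder * builder,
    nvinfer1::INetworkDefinition * network, void **bindings)
{
  builder->setMaxBatchSize (128);   /* engine will accept batches up to 128 */
  nvinfer1::ICudaEngine *engine = builder->buildCudaEngine (*network);
  nvinfer1::IExecutionContext *ctx = engine->createExecutionContext ();

  ctx->execute (1, bindings);       /* time this ... */
  ctx->execute (128, bindings);     /* ... against this, and compare */
}

The key point is that setMaxBatchSize fixes an upper bound at build time, while execute takes the actual batch size per call.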

Also, can you explain temporal batching please?

For a single-stream use case, you can set the streammux timeout property to -1 so that a batch is filled completely before it is pushed downstream. So if you set batch-size to 4 when only a single stream is being used, the muxer will wait until 4 frames have arrived before pushing the batch downstream.
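
For reference, a minimal sketch of that configuration in code; the property names batch-size and batched-push-timeout are what gst-inspect-1.0 nvstreammux reports on recent DeepStream releases, so verify them on your version:

/* Sketch: temporal batching on a single stream. With batched-push-timeout
 * set to -1 the muxer waits until the batch is full, so batch-size = 4
 * means 4 frames are collected before the batch is pushed downstream. */
GstElement *streammux = gst_element_factory_make ("nvstreammux", "mux");
g_object_set (G_OBJECT (streammux),
    "batch-size", 4,
    "batched-push-timeout", -1,
    "width", 1280, "height", 720,   /* muxer output resolution */
    NULL);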