Saving cropped images with the nvds_obj_enc API degrades performance significantly

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): GeForce 4090
• DeepStream Version: 6.2
• TensorRT Version: 8.5
• NVIDIA GPU Driver Version (valid for GPU only): 525
• Issue Type( questions, new requirements, bugs): questions

My pipeline looks like this:

camera sources (3) → streammuxer → detector (1) → classifiers (7) → tee → queue → app_sink
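
Roughly, the pipeline is assembled like this (a simplified sketch using gst_parse_launch(); the v4l2src sources, resolutions, config file paths, and element names are placeholders, not my exact setup, and only one of the seven classifiers is spelled out):

#include <gst/gst.h>

/* Sketch only: build the 3-camera pipeline from a launch string. */
static GstElement *
build_pipeline (void)
{
  GError *err = NULL;
  GstElement *pipeline = gst_parse_launch (
      "v4l2src device=/dev/video0 ! nvvideoconvert ! video/x-raw(memory:NVMM) ! mux.sink_0 "
      "v4l2src device=/dev/video1 ! nvvideoconvert ! video/x-raw(memory:NVMM) ! mux.sink_1 "
      "v4l2src device=/dev/video2 ! nvvideoconvert ! video/x-raw(memory:NVMM) ! mux.sink_2 "
      "nvstreammux name=mux batch-size=3 width=1920 height=1080 ! "
      "nvinfer config-file-path=pgie_config.txt ! "
      "nvinfer config-file-path=sgie1_config.txt ! "   /* ...six more classifiers... */
      "tee name=t ! queue name=probe_queue ! appsink name=sink emit-signals=true",
      &err);
  if (!pipeline) {
    g_printerr ("Failed to build pipeline: %s\n", err ? err->message : "unknown");
    g_clear_error (&err);
  }
  return pipeline;
}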

I attached the following probe to the queue sink pad:

GstPadProbeReturn encode_image_buffer_probe (
    GstPad * pad, 
    GstPadProbeInfo * info, 
    gpointer ctx)   /* NvDsObjEncCtxHandle created with nvds_obj_enc_create_context() */
{
    GstBuffer *buf = (GstBuffer *) info->data;

    /* Map the buffer to get the NvBufSurface of the whole batch. */
    GstMapInfo inmap = GST_MAP_INFO_INIT;
    if (!gst_buffer_map (buf, &inmap, GST_MAP_READ)) {
        GST_ERROR ("input buffer mapinfo failed");
        /* A pad probe must return a GstPadProbeReturn, not a GstFlowReturn. */
        return GST_PAD_PROBE_OK;
    }
    NvBufSurface *ip_surf = (NvBufSurface *) inmap.data;
    gst_buffer_unmap (buf, &inmap);

    NvDsObjectMeta *obj_meta = NULL;
    NvDsMetaList *l_frame = NULL;
    NvDsMetaList *l_obj = NULL;
    NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);

    for (l_frame = batch_meta->frame_meta_list; l_frame != NULL; l_frame = l_frame->next) {

      NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) (l_frame->data);

      for (l_obj = frame_meta->obj_meta_list; l_obj != NULL; l_obj = l_obj->next) {

        obj_meta = (NvDsObjectMeta *) (l_obj->data);
        
        if (obj_meta->class_id == 1)
        {
          NvDsObjEncUsrArgs userData = { 0 };
          /* Do not write the JPEG to a file; attach it as user meta instead. */
          userData.saveImg = FALSE;
          userData.attachUsrMeta = TRUE;
          /* Set if image scaling is required. */
          userData.scaleImg = FALSE;
          userData.scaledWidth = 0;
          userData.scaledHeight = 0;
          /* JPEG quality. */
          userData.quality = 100;
          /* Queue this object crop for JPEG encoding. */
          nvds_obj_enc_process (ctx, &userData, ip_surf, obj_meta, frame_meta);
        }
      }
    }
    /* Wait for all queued encode jobs of this batch to complete. */
    nvds_obj_enc_finish (ctx);
    return GST_PAD_PROBE_OK;
}
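
For context, the encoder context passed in as ctx and the probe itself are set up roughly like this (a sketch, not my exact code; the element name "probe_queue" is an assumption, and on DeepStream 6.2 nvds_obj_enc_create_context() takes no arguments, while newer releases accept a GPU id):

#include <gst/gst.h>
#include "nvds_obj_encode.h"

/* Sketch: create the shared encoder context and hook the probe onto the
 * sink pad of the queue that feeds the appsink. */
static NvDsObjEncCtxHandle
attach_encode_probe (GstElement * pipeline)
{
  NvDsObjEncCtxHandle obj_ctx = nvds_obj_enc_create_context ();
  if (!obj_ctx) {
    g_printerr ("Unable to create encoder context\n");
    return NULL;
  }

  GstElement *queue = gst_bin_get_by_name (GST_BIN (pipeline), "probe_queue");
  GstPad *sinkpad = gst_element_get_static_pad (queue, "sink");
  gst_pad_add_probe (sinkpad, GST_PAD_PROBE_TYPE_BUFFER,
      encode_image_buffer_probe, (gpointer) obj_ctx, NULL);
  gst_object_unref (sinkpad);
  gst_object_unref (queue);

  /* When shutting down: nvds_obj_enc_destroy_context (obj_ctx); */
  return obj_ctx;
}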

The app_sink has a dummy callback function that just pulls the sample, immediately releases it, and returns; a minimal sketch of such a callback is shown below, followed by the per-element latency plots I collected.
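
(The callback name on_new_sample and its wiring as a "new-sample" signal handler are illustrative, not my exact code; the real callback is equally trivial.)

#include <gst/app/gstappsink.h>

/* Sketch of the dummy "new-sample" callback: pull the sample, drop it. */
static GstFlowReturn
on_new_sample (GstAppSink * sink, gpointer user_data)
{
  GstSample *sample = gst_app_sink_pull_sample (sink);
  if (sample)
    gst_sample_unref (sample);
  return GST_FLOW_OK;
}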

[Latency plots: per-element latency over time, with and without the encode probe]
As you can see, without the encode_image_buffer_probe, all elements had significantly lower latency. The extreme latency occurred when there were a lot of objects to encode. It seems to me that whatever nvds_obj_enc_process and nvds_obj_enc_finish are doing under the hood bottlenecks the pipeline significantly. What should I do to mitigate this issue?

We are aware of this issue and are already working on it; we will keep you posted.


This is a known issue; we are working on it.


Hi @Fiona.Chen, what is the estimated timeline for the next DS release with this fix? Is it coming in the next 30 days?

When the release is ready, we will announce it. We are not allowed to discuss our internal plans.

How did you collect the latency of all the elements?

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

DeepStream can only measure DeepStream elements' latencies. See: DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums
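
For reference, a rough sketch of reading per-frame latency with that API (the probe placement, the MAX_SOURCES bound, and the function name are illustrative; the application must be started with NVDS_ENABLE_LATENCY_MEASUREMENT=1, plus NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1 for per-element numbers):

#include <gst/gst.h>
#include "nvds_latency_meta.h"

#define MAX_SOURCES 3   /* assumed upper bound on frames per batch */

/* Sketch: probe on the sink pad of the last element, printing the
 * end-to-end latency of every frame in the batch. */
static GstPadProbeReturn
latency_measurement_probe (GstPad * pad, GstPadProbeInfo * info, gpointer u_data)
{
  GstBuffer *buf = (GstBuffer *) info->data;

  if (nvds_get_enable_latency_measurement ()) {
    NvDsFrameLatencyInfo latency_info[MAX_SOURCES];
    guint num_frames = nvds_measure_buffer_latency (buf, latency_info);
    guint i;
    for (i = 0; i < num_frames; i++) {
      g_print ("source %u frame %u latency = %.2f ms\n",
          latency_info[i].source_id,
          latency_info[i].frame_num,
          latency_info[i].latency);
    }
  }
  return GST_PAD_PROBE_OK;
}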

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.