Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): GeForce 4090
• DeepStream Version: 6.2
• TensorRT Version: 8.5
• NVIDIA GPU Driver Version (valid for GPU only): 525
• Issue Type (questions, new requirements, bugs): questions
My pipeline looks like this:
camera sources (3) → streammuxer → detector (1) → classifiers (7) → tee → queue → app_sink
I attached the following probe to the queue sink pad:
GstPadProbeReturn
encode_image_buffer_probe (GstPad *pad, GstPadProbeInfo *info, gpointer ctx)
{
  GstBuffer *buf = (GstBuffer *) info->data;
  GstMapInfo inmap = GST_MAP_INFO_INIT;

  if (!gst_buffer_map (buf, &inmap, GST_MAP_READ)) {
    GST_ERROR ("input buffer mapinfo failed");
    /* A pad probe must return a GstPadProbeReturn, not a GstFlowReturn */
    return GST_PAD_PROBE_OK;
  }
  NvBufSurface *ip_surf = (NvBufSurface *) inmap.data;

  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);
  if (batch_meta == NULL) {
    gst_buffer_unmap (buf, &inmap);
    return GST_PAD_PROBE_OK;
  }

  for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame != NULL;
      l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
    for (NvDsMetaList *l_obj = frame_meta->obj_meta_list; l_obj != NULL;
        l_obj = l_obj->next) {
      NvDsObjectMeta *obj_meta = (NvDsObjectMeta *) l_obj->data;
      if (obj_meta->class_id == 1) {
        NvDsObjEncUsrArgs userData = { 0 };
        /* To be set by user */
        userData.saveImg = FALSE;
        userData.attachUsrMeta = TRUE;
        /* Set if image scaling is required */
        userData.scaleImg = FALSE;
        userData.scaledWidth = 0;
        userData.scaledHeight = 0;
        /* JPEG quality */
        userData.quality = 100;
        /* Main function call */
        nvds_obj_enc_process (ctx, &userData, ip_surf, obj_meta, frame_meta);
      }
    }
  }
  nvds_obj_enc_finish (ctx);
  /* Keep the buffer mapped until encoding has finished */
  gst_buffer_unmap (buf, &inmap);
  return GST_PAD_PROBE_OK;
}
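For context, the probe is attached roughly as follows. This is a sketch: the element name "queue" and the helper function name are my shorthand, and the encoder context handle is the one returned by nvds_obj_enc_create_context().

```c
#include <gst/gst.h>

/* Sketch of how the probe above is wired up; "queue" is whatever
 * name the queue element was created with. */
static void
attach_encode_probe (GstElement *pipeline, gpointer obj_enc_ctx)
{
  GstElement *queue = gst_bin_get_by_name (GST_BIN (pipeline), "queue");
  GstPad *sinkpad = gst_element_get_static_pad (queue, "sink");

  /* Fire the probe on every buffer passing through the queue's sink pad */
  gst_pad_add_probe (sinkpad, GST_PAD_PROBE_TYPE_BUFFER,
      encode_image_buffer_probe, obj_enc_ctx, NULL);

  gst_object_unref (sinkpad);
  gst_object_unref (queue);
}
```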
The app_sink has a dummy callback that just pulls the sample, immediately releases it, and returns. I collect the latency of each element and plot it. Here are the results:
As you can see, without the encode_image_buffer_probe, all elements had significantly lower latency. The extreme latency occurred when there were many objects to encode. It seems that whatever nvds_obj_enc_process and nvds_obj_enc_finish are doing under the hood, they bottleneck the pipeline performance significantly. What should I do to mitigate this issue?