Frame rate drops when saving jpg files in Deepstream 6.2 SDK

Do you mean the TRT engine file?
Is it related to the JPEG encode issue?

We want to simulate the same load as in your case.

This is our model file sample.engine (12.1 MB).

Our model file is an FP16 model that can predict two classes.

This application is available for you to test, and it also exhibits the frame rate drop problem.

The config file used by this application is “source1_usb_dec_infer_resnet_int8_for_nv.txt”, which is modified from the sample file “source1_usb_dec_infer_resnet_int8.txt”.

Please place the “source1_usb_dec_infer_resnet_int8_for_nv.txt” file in the same location as the sample file, and then run the application.

deepstream-app-for-nvidia-test.7z (164.7 KB)

By the way, the testing environment is the same as previously mentioned.


Hi @kpernos9 ,
We are checking now and need some time to get back to you, thank you.

I’ve added some nvds_obj_enc_process() performance measurement code in deepstream_app.c.
deepstream_app.c (58.2 KB)

With max power enabled on the Orin NX board (Performance — DeepStream 6.2 Release documentation) and the clocks maxed out (VPI - Vision Programming Interface: Performance Benchmark), the nvds_obj_enc_process() time for one frame is about 1.3 ms, so it will not impact the FPS too much. Can you try to measure the encoding time on your board?
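For reference, the measurement can be as simple as a gettimeofday() pair around the call, roughly like this (a minimal sketch, not necessarily identical to the attached deepstream_app.c; it assumes appCtx, frameData, ip_surf and frame_meta are already in scope in the probe):

  /* Sketch: time the nvds_obj_enc_process() call inside the probe.
   * appCtx, frameData, ip_surf and frame_meta are assumed to exist. */
  struct timeval t1, t2, tres;
  double ms;

  gettimeofday (&t1, NULL);
  nvds_obj_enc_process (appCtx->obj_ctx_handle_, &frameData, ip_surf, NULL, frame_meta);
  gettimeofday (&t2, NULL);

  timersub (&t2, &t1, &tres);
  ms = tres.tv_sec * 1000 + (1.0 * tres.tv_usec) / 1000;
  g_print ("nvds_obj_enc_process time %f ms\n", ms);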

@Fiona.Chen please tell us how many objects were in the feed during your tests, as I do not get anywhere close to that performance.

If you had fewer than 5 objects, please run the test again with a real-world scenario that has 30+ objects per feed.

Also, I’d like to point out that 1.3 ms per frame will definitely affect performance if you have more than one feed running, as the probe call will block until it has finished saving the frames from all feeds and cropping the objects for all feeds.

In this situation, say we have 8 feeds running at 15 fps and each full-frame encode takes around 1.3 ms: right off the bat we are blocking for roughly 10 ms, and then we have to crop/encode each object. If we have 30 objects per frame, which is perfectly reasonable, we need to add 0.5-1 ms to our total for each object. In this example, we are blocking for over 33 ms per frame!

That means the total probe blocking time to crop all frames and objects from the batch would be upwards of 240 ms. Even if you don’t save the full frame and just crop objects, you’re at 207 ms, which equates to 4.8 FPS when it should be running at 15 FPS.
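To spell out the arithmetic behind those ballpark figures (this is only a back-of-the-envelope estimate, assuming the ~1.3 ms per full-frame encode quoted above and ~1 ms per object crop as the upper bound, not a measurement):

/* Rough blocking-time estimate for one batch, using the figures quoted
 * above as assumptions. Not a measurement, just the arithmetic. */
#include <stdio.h>

int main (void)
{
  const int feeds = 8;
  const int objects_per_frame = 30;
  const double frame_ms = 1.3;   /* per full-frame encode */
  const double obj_ms = 1.0;     /* per object crop/encode, upper bound */

  double frames_total = feeds * frame_ms;                      /* ~10 ms  */
  double objects_total = feeds * objects_per_frame * obj_ms;   /* ~240 ms */

  printf ("frames: %.1f ms, objects: %.1f ms, batch total: %.1f ms\n",
          frames_total, objects_total, frames_total + objects_total);
  printf ("budget per batch at 15 fps: %.1f ms\n", 1000.0 / 15);
  return 0;
}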

Maybe there’s a better way to do this? But I am following your lead from your examples.

Here’s a short list of the times, in milliseconds, it takes to crop when using your method.

70.537
51.033
21.606
39.027
0.014
19.9
14.797
0.02
96.307
6.388
12.976
17.777
20.59
13.647
12.293
136.774
0.061
12.563
12.065
17.967
12.524
12.703
9.957
19.649
78.536
31.066
12.679
10.466
13.214
8.899
11.194
8.749
79.924
50.797
24.69
8.634
22.959
7.725
15.38
8.101
111.274
41.597
34.405
19.539
19.886
13.751
20.182
20.575
135.396
28.206
59.725
26.235

The output here is from 6 feeds (15 fps) on an AGX Xavier (MAXN, clocks maxed, etc.) with PeopleNet. In this run we are only cropping and encoding objects, not encoding the full frame, and there are just a few detections in each frame because it is the middle of the night.

During the day, the performance is absolutely atrocious.

In this example, whenever the probe takes longer than the batched-push-timeout on streammux, we lose FPS: downstream elements become starved while upstream elements fill up. So it evolves into a much larger problem than just losing 50 ms; suddenly we are dropping random buffers, losing data, and every element in the pipeline loses its ability to operate at the performance NVIDIA markets DeepStream at.
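For reference, the timeout I mean is nvstreammux’s batched-push-timeout. In deepstream-app it comes from the [streammux] section of the config file; in a hand-built pipeline it would be set roughly like this (an illustrative snippet, not code from the attached app):

  /* Illustrative only: batched-push-timeout (microseconds) is the window
   * the muxer waits to assemble a batch. A probe that blocks longer than
   * this per batch makes buffers queue upstream. */
  GstElement *streammux = gst_element_factory_make ("nvstreammux", "stream-muxer");
  g_object_set (G_OBJECT (streammux),
      "batch-size", 8,
      "width", 1920, "height", 1080,
      "batched-push-timeout", 66666,   /* ~1/15 s, matching 15 fps feeds */
      NULL);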

@rsc44 Are you talking about the same issue as @kpernos9? If not, please raise your own topic, thank you!

Previously, nvds_obj_enc_process() took around 2.2 ms for one frame; after I maxed out the clocks, the nvds_obj_enc_process() time for one frame is about 0.8 ms on my board.

But I found out that the issue lies with nvds_obj_enc_finish(), which takes up to 325 ms when saving one frame.

I added the following code to the program to measure the time.

  /* Measure how long nvds_obj_enc_finish() blocks for this batch. */
  struct timeval t3, t4, tresult2;
  double timeuse2;
  gettimeofday (&t3, NULL);

  nvds_obj_enc_finish (appCtx->obj_ctx_handle_);

  gettimeofday (&t4, NULL);
  timersub (&t4, &t3, &tresult2);
  timeuse2 = tresult2.tv_sec * 1000 + (1.0 * tresult2.tv_usec) / 1000;
  g_print ("JPG enc finish time %fms\n", timeuse2);
  

The output log segment is as follows.

JPG enc finish time 0.000000ms
JPG enc finish time 0.000000ms
JPG enc finish time 0.001000ms
JPG enc finish time 0.000000ms
JPG enc finish time 0.001000ms
JPG enc finish time 0.001000ms
JPG enc finish time 0.001000ms
JPG enc finish time 0.000000ms
JPG enc finish time 0.002000ms
JPG enc finish time 0.002000ms
JPG enc time 0.897000ms
JPG enc finish time 323.984000ms
JPG enc finish time 0.002000ms
JPG enc finish time 0.001000ms
JPG enc finish time 0.001000ms
JPG enc finish time 0.001000ms
JPG enc finish time 0.002000ms
JPG enc finish time 0.001000ms
JPG enc finish time 0.000000ms
JPG enc finish time 0.001000ms
JPG enc finish time 0.000000ms
JPG enc finish time 0.001000ms
JPG enc finish time 0.001000ms

Yes Fiona, the topic is that saving JPGs in the DeepStream pipeline causes performance issues if one uses nvds_obj_enc_process.

I decided to investigate the issue further, because you provided false information to @kpernos9 by saying that it wouldn’t cause any FPS drops.

Please move your timing function to after nvds_obj_enc_finish (appCtx->obj_ctx_handle_).

int img_count = 0;
static GstPadProbeReturn
gie_primary_processing_done_buf_prob (GstPad * pad, GstPadProbeInfo * info,
    gpointer u_data)
{
  GstBuffer *buf = (GstBuffer *) info->data;
  AppCtx *appCtx = (AppCtx *) u_data;
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);
  if (!batch_meta) {
    NVGSTDS_WARN_MSG_V ("Batch meta not found for buffer %p", buf);
    return GST_PAD_PROBE_OK;
  }

  write_kitti_output (appCtx, batch_meta);
  /* for image save */
  GstMapInfo inmap = GST_MAP_INFO_INIT;
  if (!gst_buffer_map (buf, &inmap, GST_MAP_READ)) {
    GST_ERROR ("input buffer mapinfo failed");
    return GST_PAD_PROBE_DROP;
  }
  NvBufSurface *ip_surf = (NvBufSurface *) inmap.data;
  gst_buffer_unmap (buf, &inmap);
  struct timeval t1,t2,tresult;
  double timeuse;
  gettimeofday(&t1,NULL);
  img_count++;
  char img_path[FILE_NAME_SIZE];
  strncpy(img_path, "./sample_img.jpg", sizeof(img_path) - 1);
  for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame != NULL; l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) (l_frame->data);
    if ((img_count % 180) == 0) {        
      img_count = 0;
      NvDsObjectMeta *obj_meta = nvds_acquire_obj_meta_from_pool (batch_meta);
      obj_meta->rect_params.width = ip_surf->surfaceList[0].width;
      obj_meta->rect_params.height = ip_surf->surfaceList[0].height;
      
      obj_meta->rect_params.top = 0;
      obj_meta->rect_params.left = 0;

      NvDsObjEncUsrArgs frameData = {0};
      /* Preset */
      frameData.isFrame = 1;
      /* To be set by user */
      frameData.saveImg = TRUE;
      frameData.attachUsrMeta = TRUE;
      /* Set if Image scaling Required */
      frameData.scaleImg = FALSE;
      frameData.scaledWidth = 0;
      frameData.scaledHeight = 0;
      frameData.objNum = 0;
      snprintf(frameData.fileNameImg, FILE_NAME_SIZE, "%s", img_path);

      nvds_obj_enc_process(appCtx->obj_ctx_handle_, &frameData, ip_surf, NULL, frame_meta);

    }
  }
  nvds_obj_enc_finish (appCtx->obj_ctx_handle_);
  gettimeofday(&t2,NULL);
  timersub(&t2, &t1, &tresult);
  timeuse = tresult.tv_sec*1000 + (1.0 * tresult.tv_usec)/1000;
  g_print("JPG enc time %fms\n", timeuse);
  return GST_PAD_PROBE_OK;
}

nvds_obj_enc_process is a function call that simply pushes the surface and object meta to a queue. An underlying function inside your proprietary source code dequeues the buffer and object meta to do the cropping and encoding.

Because the API also attaches the JPEG to the metadata, the buffer is blocked from moving to the next element until the crop/encoding is finished for that single batch.

This is accomplished by the function nvds_obj_enc_finish(), which is conceptually just a loop waiting on the queue/futures to complete for that buffer.
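Conceptually, the model I’m describing looks something like the sketch below. To be clear, this is my mental model of the behaviour, not NVIDIA’s actual implementation:

/* Conceptual sketch only -- NOT NVIDIA's implementation. It just models
 * the behaviour described above: "process" enqueues a job for a worker
 * thread, and "finish" blocks until everything queued so far is done. */
#include <glib.h>

typedef struct {
  int frame_id;                 /* stand-in for the surface / object meta */
} EncJob;

static GAsyncQueue *job_queue;
static gint pending = 0;
static GMutex done_lock;
static GCond done_cond;

/* worker thread: dequeues jobs and "encodes" them */
static gpointer
enc_worker (gpointer data)
{
  EncJob *job;
  while ((job = g_async_queue_pop (job_queue)) != NULL) {
    if (job->frame_id < 0) {    /* poison pill: stop the worker */
      g_free (job);
      break;
    }
    g_usleep (1000);            /* pretend to crop + JPEG-encode */
    g_free (job);
    g_mutex_lock (&done_lock);
    if (g_atomic_int_dec_and_test (&pending))
      g_cond_signal (&done_cond);
    g_mutex_unlock (&done_lock);
  }
  return NULL;
}

/* analogous to nvds_obj_enc_process(): only queues the work */
static void
fake_enc_process (int frame_id)
{
  EncJob *job = g_new0 (EncJob, 1);
  job->frame_id = frame_id;
  g_atomic_int_inc (&pending);
  g_async_queue_push (job_queue, job);
}

/* analogous to nvds_obj_enc_finish(): blocks until the queue drains */
static void
fake_enc_finish (void)
{
  g_mutex_lock (&done_lock);
  while (g_atomic_int_get (&pending) > 0)
    g_cond_wait (&done_cond, &done_lock);
  g_mutex_unlock (&done_lock);
}

int
main (void)
{
  job_queue = g_async_queue_new ();
  GThread *worker = g_thread_new ("enc-worker", enc_worker, NULL);

  for (int i = 0; i < 8; i++)   /* one "process" call per frame/object */
    fake_enc_process (i);
  fake_enc_finish ();           /* the probe blocks here until all are encoded */

  EncJob *stop = g_new0 (EncJob, 1);
  stop->frame_id = -1;          /* poison pill to stop the worker */
  g_async_queue_push (job_queue, stop);
  g_thread_join (worker);
  g_async_queue_unref (job_queue);
  return 0;
}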

To follow up,

I spent the day testing, and it turns out the probe function that uses nvds_obj_enc_process is around 2x slower than using plain old appsink and OpenCV (built with CUDA) to do the encoding.
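For completeness, the shape of the appsink approach I compared against is roughly the following (a minimal sketch, not the exact code I ran: a videotestsrc stands in for the real DeepStream pipeline, and the JPEG encode, which in my test was done with OpenCV built with CUDA, is stubbed out in the callback):

#include <gst/gst.h>
#include <gst/app/gstappsink.h>

static GstFlowReturn
on_new_sample (GstAppSink *sink, gpointer user_data)
{
  GstSample *sample = gst_app_sink_pull_sample (sink);
  if (!sample)
    return GST_FLOW_ERROR;

  GstBuffer *buf = gst_sample_get_buffer (sample);
  GstMapInfo map;
  if (gst_buffer_map (buf, &map, GST_MAP_READ)) {
    /* hand map.data to the encoder of your choice here (cv::imencode,
     * nvJPEG, ...), ideally on a worker thread so this returns quickly */
    gst_buffer_unmap (buf, &map);
  }
  gst_sample_unref (sample);
  return GST_FLOW_OK;
}

int
main (int argc, char *argv[])
{
  gst_init (&argc, &argv);

  /* videotestsrc is a placeholder for the real camera/inference pipeline */
  GstElement *pipe = gst_parse_launch (
      "videotestsrc num-buffers=100 ! videoconvert ! "
      "appsink name=sink emit-signals=true", NULL);
  GstElement *sink = gst_bin_get_by_name (GST_BIN (pipe), "sink");
  g_signal_connect (sink, "new-sample", G_CALLBACK (on_new_sample), NULL);

  gst_element_set_state (pipe, GST_STATE_PLAYING);
  GstBus *bus = gst_element_get_bus (pipe);
  GstMessage *msg = gst_bus_timed_pop_filtered (bus, GST_CLOCK_TIME_NONE,
      GST_MESSAGE_EOS | GST_MESSAGE_ERROR);
  if (msg)
    gst_message_unref (msg);

  gst_element_set_state (pipe, GST_STATE_NULL);
  gst_object_unref (bus);
  gst_object_unref (sink);
  gst_object_unref (pipe);
  return 0;
}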

This is very odd; it seems that 6.0 is also affected, not just 6.2.

@Fiona.Chen if nvds_obj_enc_process is designed to run this slowly, might it be better to stop recommending that people use it in a probe on nvinfer?

P.S. It shouldn’t take a month to answer these questions.

@kpernos9
It seems nvds_obj_enc_finish() is the bottleneck; we are investigating the root cause now.

@Fiona.Chen
Hi, any update or progress? Still waiting for your response.

The bug is fixed. We are testing the patch.

What does your patch change? DeepStream, the BSP, or something else?

The fix changes two parts: one is the JPEG driver in the BSP, the other is the JPEG library in DeepStream.

Hi Fiona,

could you please be more precise, especially about:

  • What version of jetpack / l4t / deepstream will include this fix?
  • When will the bugfix be released?

The patch will be included in the next release.

