Can nvv4l2decoder and nvstreammux use independent GPU memory?

1002469771 · June 17, 2023, 2:16am

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
GPU
• DeepStream Version
6.2
• Issue Type( questions, new requirements, bugs)
new requirements
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)
According to this page https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_FAQ.html:

If input stream resolution and Gst-nvstreammux resolution (set in the configuration file) are the same, no additional GPU memory is allocated in Gst-nvstreammux. If input stream resolution is not same as Gst-nvstreammux resolution, Gst-nvstreammux allocates memory of size

This can sometimes cause problems. For instance, consider a pipeline like this:

32X rtspsrc ! nvv4l2decoder ! nvstreammux ! queue ! detector interval= 10 ! tracker ! fakesink

Each rtspsrc has the same resolution and a frame rate of 25 fps. Because the detector takes too long to finish a batch (around 400 ms), I set interval=10 to skip some batches. However, even though I have already configured enough output buffers via buffer-pool-size, decoding gets stuck because the GPU buffers from the decoder have been pushed downstream and are held by the detector.
If the input width and height differ from nvstreammux’s, the decoder can run asynchronously and put decoded frames in the queue. Then, when the detector finishes a batch, it can get the next batch immediately instead of waiting for new frames.
Would it be possible to add an optional setting so that I can choose whether nvstreammux allocates new GPU memory (even if there’s no need to scale frames)?
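
The stall described in this post can be pictured with a toy model. The sketch below is not DeepStream code and does not reflect how nvv4l2decoder or nvstreammux actually manage memory; the function name and the buffer counts are hypothetical. It only captures the arithmetic of the hypothesis: while the detector is busy, the decoder can get ahead by at most the number of frames that arrive, and never by more than the number of buffers it can still write into.

```cpp
#include <algorithm>

// Toy model (assumption, not NVIDIA's implementation).
// stall_ms:        how long the downstream element holds on to a batch
// frame_period_ms: time between frames from one source (40 ms at 25 fps)
// free_buffers:    hypothetical number of buffers the decoder can still fill
int frames_decoded_during_stall(int stall_ms, int frame_period_ms, int free_buffers) {
  const int arrived = stall_ms / frame_period_ms;  // frames received during the stall
  return std::min(arrived, free_buffers);          // capped by available buffers
}
```

With the figures from the post (400 ms per batch, 25 fps), `frames_decoded_during_stall(400, 40, 16)` gives 10, so a pool of 16 would absorb the whole stall. With a hypothetical 4 free buffers the result is capped at 4, and the remaining frames would have to wait until buffers come back from downstream.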

fanzh · June 19, 2023, 1:38am

Thanks for sharing. Can you reproduce this hang with deepstream-app? deepstream-app supports a similar media pipeline.

1002469771 · June 20, 2023, 7:37am

I did some tests with deepstream-app:

/opt/nvidia/deepstream/deepstream/bin/deepstream-app -c /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/source2_1080p_dec_infer-resnet_demux_int8.txt

There were 32 RTSP sources at 30 fps in source2_1080p_dec_infer-resnet_demux_int8.txt. Each source had a resolution of 1920*1080. Part of the config:

[streammux]
batch-size=32
batched-push-timeout=33000
width=1920
height=1080
enable-padding=0
buffer-pool-size=16

[primary-gie]
batch-size=32
interval=15

Osd and sinks were disabled.
To slow down the detector, I made it sleep for 400ms in nvinfer’s code:

static GstFlowReturn
gst_nvinfer_process_full_frame (GstNvInfer * nvinfer, GstBuffer * inbuf,
    NvBufSurface * in_surf)
{
  NvOSD_RectParams rect_params;
  NvDsBatchMeta *batch_meta = NULL;
  guint num_filled = 0;
  std::unique_ptr<GstNvInferBatch> batch = nullptr;
  GstBuffer *conv_gst_buf = nullptr;
  GstFlowReturn flow_ret;
  GstNvInferMemory *memory = nullptr;
  gdouble scale_ratio_x, scale_ratio_y;
  guint offset_left = 0, offset_top = 0;
  gboolean skip_batch;

  /* Process batch only when interval_counter is 0. */
  skip_batch = (nvinfer->interval_counter++ % (nvinfer->interval + 1) > 0);

  if (skip_batch) {
    return GST_FLOW_OK;
  }

  usleep(400 * 1000);   //      sleep for 400ms and do nothing;
  return GST_FLOW_OK;
  ...
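
To make the effect of the skip expression in the excerpt concrete, here is a minimal standalone sketch. `IntervalState` and `processed_batches` are hypothetical names used only for illustration; the only part taken from the excerpt is the expression `interval_counter++ % (interval + 1) > 0`.

```cpp
#include <vector>

// Hypothetical stand-in for the two GstNvInfer fields used in the excerpt.
struct IntervalState {
  unsigned interval;          // value of the "interval" setting
  unsigned interval_counter;  // incremented once per incoming batch
};

// Same expression as in the excerpt: true means the batch is skipped.
bool skip_batch(IntervalState &s) {
  return (s.interval_counter++ % (s.interval + 1)) > 0;
}

// Returns the indices of the batches that would reach the processing path.
std::vector<unsigned> processed_batches(unsigned interval, unsigned total) {
  IntervalState s{interval, 0};
  std::vector<unsigned> out;
  for (unsigned i = 0; i < total; ++i) {
    if (!skip_batch(s)) out.push_back(i);
  }
  return out;
}
```

With interval=15, `processed_batches(15, 48)` yields batches 0, 16 and 32, so in this test the 400 ms sleep is hit once every 16 batches.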

When running with nvstreammux set to width=1920, height=1080 (the same as the RTSP sources), the average fps from the PERF output was 24.66 (1920x1080_log.txt (14.0 KB)). With width=1920, height=1088 (different from the RTSP sources), the average fps was 25.15 (1920x1088_log.txt (14.0 KB)), which is faster than with 1920*1080.
I added some more code in nvinfer:

static GstFlowReturn
gst_nvinfer_submit_input_buffer (GstBaseTransform * btrans,
    gboolean discont, GstBuffer * inbuf)
{
  static auto t1 = std::chrono::high_resolution_clock::now();
  auto t2 = std::chrono::high_resolution_clock::now();
  g_print("gst_nvinfer_submit_input_buffer duration %ld\n", (long) std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count());
  t1 = t2;
  ...

When width=1920, height=1080 (1920x1080.log (74.5 KB)):

...
gst_nvinfer_submit_input_buffer duration 400
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 13
gst_nvinfer_submit_input_buffer duration 1
gst_nvinfer_submit_input_buffer duration 33
gst_nvinfer_submit_input_buffer duration 18
gst_nvinfer_submit_input_buffer duration 19
gst_nvinfer_submit_input_buffer duration 26
gst_nvinfer_submit_input_buffer duration 16
gst_nvinfer_submit_input_buffer duration 22
gst_nvinfer_submit_input_buffer duration 20
gst_nvinfer_submit_input_buffer duration 23
gst_nvinfer_submit_input_buffer duration 18
gst_nvinfer_submit_input_buffer duration 21
...

1920*1088 (1920x1088.log (62.4 KB)):

...
gst_nvinfer_submit_input_buffer duration 400
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 2
gst_nvinfer_submit_input_buffer duration 34
gst_nvinfer_submit_input_buffer duration 33
gst_nvinfer_submit_input_buffer duration 34
gst_nvinfer_submit_input_buffer duration 32
...

It seems that when nvstreammux’s width and height are the same as the source’s, it waits some extra time and slows down the whole pipeline.
Config file:
source2_1080p_dec_infer-resnet_demux_int8.txt (5.7 KB)

fanzh · June 20, 2023, 7:47am

thanks for the update, we will check.

fanzh · June 21, 2023, 6:42am

If you want to do a perf test, please set sync to 0 in [sinkx], which means playing as fast as possible. Please refer to "DeepStream Reference Application - deepstream-app" in the DeepStream 6.3 Release documentation.

1002469771 · June 21, 2023, 7:08am

I disabled all [sinkx] groups since I don’t need any output files or displays. Anyway, I tried setting sync to 0 and the results were just the same.

fanzh · June 21, 2023, 7:30am

I tested on my T4; here is the result.
If all sinks are set to type 1 (fakesink) with sync set to 1, fps is about 30.
If all sinks are set to type 1 (fakesink) with sync set to 0, fps is about 60.
If all [sinkx] groups are disabled, fps is about 527.15.

1002469771 · June 21, 2023, 7:32am

That’s not possible. Did you test with RTSP sources? I mean real “Real Time” streams, for example, streams from cameras.

fanzh · June 21, 2023, 7:55am

Sorry, I tested with local files. Please don’t add sleep in nvinfer’s probe function. There is a buffer pool in nvstreammux, and nvstreammux will wait if downstream returns buffers late.
How can the conclusion above be proven?
If you want to measure an element’s delay, please refer to delay

1002469771 · June 21, 2023, 8:16am

I added the sleep to simulate the situation where the detector or other downstream elements take a long time.

Yes, that’s why I set buffer-pool-size=16 in nvstreammux. The problem is that if the source’s resolution is the same as nvstreammux’s width and height, it does not work as expected, and I think that’s because nvstreammux uses the same buffers as the decoder, and those buffers are limited.
I’ll try to explain as best I can with the log files:
When nvstreammux’s width and height are set to the sources’ resolution (1920*1080), the timing looks like this for every 16 frames (interval=15 in the detector):

gst_nvinfer_submit_input_buffer duration 400
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 13
gst_nvinfer_submit_input_buffer duration 1
gst_nvinfer_submit_input_buffer duration 33
gst_nvinfer_submit_input_buffer duration 18
gst_nvinfer_submit_input_buffer duration 19
gst_nvinfer_submit_input_buffer duration 26
gst_nvinfer_submit_input_buffer duration 16
gst_nvinfer_submit_input_buffer duration 22
gst_nvinfer_submit_input_buffer duration 20
gst_nvinfer_submit_input_buffer duration 23
gst_nvinfer_submit_input_buffer duration 18
gst_nvinfer_submit_input_buffer duration 21

Total cost = 400 + 230 = 630 ms.
Whereas with 1920*1088:

gst_nvinfer_submit_input_buffer duration 400
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 2
gst_nvinfer_submit_input_buffer duration 34
gst_nvinfer_submit_input_buffer duration 33
gst_nvinfer_submit_input_buffer duration 34
gst_nvinfer_submit_input_buffer duration 32

Total cost = 400 + 135 = 535 ms. The difference is 95 ms, so 1920x1080 takes roughly 100 ms longer than 1920*1088 for every 16 frames.
The reason I think the output buffers did not work as expected with 1920x1080 is that, if they had, the decoder and the detector would run in parallel. Since the RTSP sources’ frame rate was 30 fps (33 ms per frame), about 11 batches of frames should be decoded and stored in the output buffers during the 400 ms sleep. That’s why there are about 11 “gst_nvinfer_submit_input_buffer duration 0” lines with 1920x1088.
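
The totals quoted above can be rechecked by summing the logged intervals. The sketch below copies the sixteen values from each excerpt in this post; `total_ms` and the constant names are hypothetical and exist only for this check.

```cpp
#include <numeric>
#include <vector>

// Intervals in ms between consecutive submit calls, copied from the two
// log excerpts in this post (one 16-batch cycle each).
const std::vector<int> kLog1080 = {400, 0, 0, 0, 13, 1, 33, 18,
                                   19, 26, 16, 22, 20, 23, 18, 21};
const std::vector<int> kLog1088 = {400, 0, 0, 0, 0, 0, 0, 0,
                                   0, 0, 0, 2, 34, 33, 34, 32};

// Sum of all intervals in one cycle, in milliseconds.
int total_ms(const std::vector<int> &intervals) {
  return std::accumulate(intervals.begin(), intervals.end(), 0);
}
```

`total_ms(kLog1080)` is 630 and `total_ms(kLog1088)` is 535, a gap of 95 ms per 16 batches, which matches the roughly 100 ms quoted. The 1088 excerpt contains ten zero-length intervals after the 400 ms one, in line with the estimate of about 11 queued batches.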

fanzh · June 25, 2023, 10:00am

I tested on T4 + DS6.2 with the code modification, and I can’t reproduce this issue. Here are the configurations and logs.
source2_1080p_dec_infer-resnet_demux_int8.txt (4.5 KB)
log.txt (3.6 KB)
source2_1080p_dec_infer-resnet_demux_int8_1088.txt (4.5 KB)
log-1088.txt (2.3 KB)
please help to narrow down this issue. Thanks!

1002469771 · June 25, 2023, 10:02am

Did you use rtsp streams from cameras?

fanzh · June 25, 2023, 10:06am

In source2_1080p_dec_infer-resnet_demux_int8.txt, there are four RTSP sources served by a gst-rtsp-server software server. This ensures the test sources are the same.

1002469771 · June 25, 2023, 10:09am

So you added just 4 sources? And what are these sources like? Did they produce frames every 30 or 40 ms?

fanzh · June 25, 2023, 10:18am

yes, I added 4 rtsp sources, the resolution is 1080p, the fps is 25. here is the detail:
Input #0, rtsp, from ‘rtsp://127.0.0.1:8554/test’:
Metadata:
title : Session streamed with GStreamer
comment : rtsp-server
Duration: 00:05:00.04, start: 0.000000, bitrate: N/A
Stream #0:0: Video: h264 (High), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 25 fps, 25 tbr, 90k tbn, 50 tbc
Stream #0:1: Audio: aac (LC), 48000 Hz, stereo, fltp

1002469771 · June 25, 2023, 10:23am

I don’t think 4 sources are enough… How do I use the gst-rtsp-server software server? I’ll try your case.

fanzh · June 25, 2023, 2:23pm

There is no update from you for a period, assuming this is not an issue anymore. Hence we are closing this topic. If need further support, please open a new one. Thanks

gst-rtsp-server is open-source GStreamer code; you can search online for how to build and start it.

system · July 24, 2023, 6:28am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
