Can nvv4l2decoder and nvstreammux use independent GPU memory?

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
GPU
• DeepStream Version
6.2
• Issue Type( questions, new requirements, bugs)
new requirements
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)
According to this page https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_FAQ.html:

If input stream resolution and Gst-nvstreammux resolution (set in the configuration file) are the same, no additional GPU memory is allocated in Gst-nvstreammux. If input stream resolution is not same as Gst-nvstreammux resolution, Gst-nvstreammux allocates memory of size

This can sometimes cause problems. For instance, consider a pipeline like this:

32X rtspsrc ! nvv4l2decoder ! nvstreammux ! queue ! detector interval= 10 ! tracker ! fakesink

Each rtspsrc has the same resolution and an fps of 25. Because the detector takes too long to finish a batch (around 400ms), I set interval=10 to skip some batches. However, even though I have already configured enough output buffers with buffer-pool-size, decoding gets stuck because the decoder’s GPU buffers have been pushed downstream and are held by the detector.
If the input width and height are different from nvstreammux’s, the decoder can run asynchronously and put decoded frames in the queue. Then, when the detector finishes a batch, it can get the next batch immediately instead of waiting for new frames.
Is it possible to add an optional setting so that I can choose whether nvstreammux allocates new GPU memory (even if there’s no need to scale frames)?

Thanks for sharing. Can you use deepstream-app to reproduce this hang issue? deepstream-app supports a similar media pipeline.

I did some tests with deepstream-app:

/opt/nvidia/deepstream/deepstream/bin/deepstream-app -c /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/source2_1080p_dec_infer-resnet_demux_int8.txt

There were 32 RTSP sources with an fps of 30 in source2_1080p_dec_infer-resnet_demux_int8.txt. Each source had a resolution of 1920*1080. Part of the config:

[streammux]
batch-size=32
batched-push-timeout=33000
width=1920
height=1080
enable-padding=0
buffer-pool-size=16

[primary-gie]
batch-size=32
interval=15

Osd and sinks were disabled.
To slow down the detector, I made it sleep for 400ms in nvinfer’s code:

static GstFlowReturn
gst_nvinfer_process_full_frame (GstNvInfer * nvinfer, GstBuffer * inbuf,
    NvBufSurface * in_surf)
{
  NvOSD_RectParams rect_params;
  NvDsBatchMeta *batch_meta = NULL;
  guint num_filled = 0;
  std::unique_ptr<GstNvInferBatch> batch = nullptr;
  GstBuffer *conv_gst_buf = nullptr;
  GstFlowReturn flow_ret;
  GstNvInferMemory *memory = nullptr;
  gdouble scale_ratio_x, scale_ratio_y;
  guint offset_left = 0, offset_top = 0;
  gboolean skip_batch;

  /* Process batch only when interval_counter is 0. */
  skip_batch = (nvinfer->interval_counter++ % (nvinfer->interval + 1) > 0);

  if (skip_batch) {
    return GST_FLOW_OK;
  }

  usleep(400 * 1000);   //      sleep for 400ms and do nothing;
  return GST_FLOW_OK;
  ...
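
The skip test at the top of that function can be checked in isolation. The following is a minimal sketch, not the nvinfer source: IntervalGate and processed_batches are hypothetical names, and only the modulo expression is taken from the quoted code.

```cpp
#include <cassert>
#include <vector>

// Stand-alone copy of the skip test quoted above (not the nvinfer source).
// IntervalGate and processed_batches are hypothetical names for this sketch.
struct IntervalGate {
  unsigned interval;
  unsigned counter;

  // True when the batch should be processed, i.e. when the quoted
  // expression `counter++ % (interval + 1) > 0` evaluates to false.
  bool should_process() {
    const bool skip = (counter++ % (interval + 1)) > 0;
    return !skip;
  }
};

// Indices of the batches that get processed out of `total` batches.
std::vector<unsigned> processed_batches(unsigned interval, unsigned total) {
  assert(total > 0);
  IntervalGate gate = {interval, 0};
  std::vector<unsigned> out;
  for (unsigned i = 0; i < total; ++i) {
    if (gate.should_process()) out.push_back(i);
  }
  return out;
}
```

With interval=15, processed_batches(15, 48) yields batches 0, 16 and 32, so in the modified function one buffer in every 16 takes the 400ms sleep and the other 15 return immediately.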

When running with nvstreammux set to width=1920, height=1080 (the same as the RTSP sources), the average fps from the PERF print was 24.66 (1920x1080_log.txt (14.0 KB)). With width=1920, height=1088 (different from the RTSP sources), the average fps was 25.15 (1920x1088_log.txt (14.0 KB)), which was faster than with 1920*1080.
I added some more code in nvinfer:

static GstFlowReturn
gst_nvinfer_submit_input_buffer (GstBaseTransform * btrans,
    gboolean discont, GstBuffer * inbuf)
{
  static auto t1 = std::chrono::high_resolution_clock::now();
  auto t2 = std::chrono::high_resolution_clock::now();
  g_print("gst_nvinfer_submit_input_buffer duration %lu\n",std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count());
  t1 = t2;
  ...
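
For anyone who wants to try the same measurement outside nvinfer, here is a self-contained sketch of the technique (time elapsed between successive calls, kept in a function-local static). It assumes nothing about DeepStream: ms_since_last_call and simulate_slow_stage are hypothetical names, and it swaps in std::chrono::steady_clock, which is monotonic, for the high_resolution_clock used above.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Milliseconds elapsed since the previous call to this function.
// On the first call the static is initialised just before `now` is
// taken, so the result is expected to be 0.
long long ms_since_last_call() {
  using clock = std::chrono::steady_clock;
  static clock::time_point last = clock::now();
  const clock::time_point now = clock::now();
  const long long ms =
      std::chrono::duration_cast<std::chrono::milliseconds>(now - last).count();
  last = now;
  return ms;
}

// Stand-in for a slow downstream element, like the usleep() added to nvinfer.
void simulate_slow_stage(int ms) {
  assert(ms >= 0);
  std::this_thread::sleep_for(std::chrono::milliseconds(ms));
}
```

Returning a long long also lets the caller print the value with %lld, which avoids depending on the platform-specific width that %lu assumes.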

When width=1920, height=1080 (1920x1080.log (74.5 KB)):

...
gst_nvinfer_submit_input_buffer duration 400
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 13
gst_nvinfer_submit_input_buffer duration 1
gst_nvinfer_submit_input_buffer duration 33
gst_nvinfer_submit_input_buffer duration 18
gst_nvinfer_submit_input_buffer duration 19
gst_nvinfer_submit_input_buffer duration 26
gst_nvinfer_submit_input_buffer duration 16
gst_nvinfer_submit_input_buffer duration 22
gst_nvinfer_submit_input_buffer duration 20
gst_nvinfer_submit_input_buffer duration 23
gst_nvinfer_submit_input_buffer duration 18
gst_nvinfer_submit_input_buffer duration 21
...

1920*1088 (1920x1088.log (62.4 KB)):

...
gst_nvinfer_submit_input_buffer duration 400
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 2
gst_nvinfer_submit_input_buffer duration 34
gst_nvinfer_submit_input_buffer duration 33
gst_nvinfer_submit_input_buffer duration 34
gst_nvinfer_submit_input_buffer duration 32
...

It seems that when nvstreammux’s width and height are the same as the source’s, the pipeline waits some extra time, which slows the whole pipeline down.
Config file:
source2_1080p_dec_infer-resnet_demux_int8.txt (5.7 KB)
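
One way to read the two excerpts above is as a bounded buffer pool: while the detector is stalled, each newly decoded batch needs a free buffer, and once the pool is exhausted the decoder has to wait. The sketch below is a toy model of that hypothesis only, not of how nvv4l2decoder or nvstreammux are implemented. The function name and the capacity values are assumptions; in particular, a capacity of 4 is chosen only because the 1920x1080 excerpt shows three zero gaps, and it is not a documented decoder pool size.

```cpp
#include <cassert>

// Toy model (hypothetical): counts how many batches can be decoded and
// queued while the consumer is blocked for stall_ms.
//   period_ms - time between batches arriving from the sources
//   capacity  - buffers available to hold batches not yet released
// One buffer is assumed to be held by the batch the consumer is working on.
int queued_during_stall(int stall_ms, int period_ms, int capacity) {
  assert(period_ms > 0 && capacity > 0);
  int held = 1;    // buffer owned by the in-progress batch
  int queued = 0;  // batches decoded and waiting for the consumer
  for (int t = period_ms; t <= stall_ms; t += period_ms) {
    if (held < capacity) {
      ++held;      // a free buffer exists: decode and queue the batch
      ++queued;
    }
    // otherwise the decoder waits; nothing is queued for this arrival
  }
  return queued;
}
```

Under these assumptions, queued_during_stall(400, 33, 16) gives 12 and queued_during_stall(400, 33, 4) gives 3. That is the same shape as the logs (a long run of near-zero gaps for 1920x1088, only three for 1920x1080), though the model is too coarse to match the counts exactly.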

thanks for the update, we will check.

if you want to do perf test, please set sync to 0 in [sinkx], it means playing as fast as possible, please refer to DeepStream Reference Application - deepstream-app — DeepStream 6.3 Release documentation

I disabled all [sinkx] groups since I don’t need any output files or displays. Anyway, I tried setting sync to 0 and the results were the same.

I tested on my T4, here is the result.
if all sink types are set to 1 (fakesink) and sync is set to 1, fps is about 30.
if all sink types are set to 1 (fakesink) and sync is set to 0, fps is about 60.
if all [sinkx] are disabled, the fps is about 527.15.

That’s not possible. Did you test with RTSP sources? I mean real “Real Time” streams, for example, streams from cameras.

  1. Sorry, I tested with local files. Please don’t add sleep in nvinfer’s probe function. There is a buffer pool in nvstreammux, and nvstreammux will wait if downstream returns buffers late.
  2. How can the conclusion above be proved?
  3. If you want to measure the element’s delay, please refer to delay

I added sleep to simulate the situation that the detector or other downstream elements take a long period of time.

Yes, that’s why I set buffer-pool-size=16 in nvstreammux. The question is why, if the source’s resolution is the same as nvstreammux’s width and height, it does not work as expected. I think that’s because nvstreammux then uses the same buffers as the decoder, and those buffers are limited.
I’ll try to explain as best as I can with the log files:
When nvstreammux’s width and height are set to the sources’ resolution (1920 * 1080), the timing looks like this for every 16 buffers (interval=15 in the detector):

gst_nvinfer_submit_input_buffer duration 400
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 13
gst_nvinfer_submit_input_buffer duration 1
gst_nvinfer_submit_input_buffer duration 33
gst_nvinfer_submit_input_buffer duration 18
gst_nvinfer_submit_input_buffer duration 19
gst_nvinfer_submit_input_buffer duration 26
gst_nvinfer_submit_input_buffer duration 16
gst_nvinfer_submit_input_buffer duration 22
gst_nvinfer_submit_input_buffer duration 20
gst_nvinfer_submit_input_buffer duration 23
gst_nvinfer_submit_input_buffer duration 18
gst_nvinfer_submit_input_buffer duration 21

Total cost = 400 + 230 = 630ms.
Whereas with 1920*1088:

gst_nvinfer_submit_input_buffer duration 400
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 0
gst_nvinfer_submit_input_buffer duration 2
gst_nvinfer_submit_input_buffer duration 34
gst_nvinfer_submit_input_buffer duration 33
gst_nvinfer_submit_input_buffer duration 34
gst_nvinfer_submit_input_buffer duration 32

Total cost = 400 + 135 = 535ms. The difference is 95ms, so 1920x1080 takes roughly 100ms longer than 1920*1088 for every 16 buffers.
The reason I think the output buffers did not work as expected with 1920x1080 is that, if they had worked, the decoder and the detector would run in parallel. Since the RTSP sources’ fps was 30 (33ms per frame), about 11 batches of frames should be decoded and stored in the output buffers during the 400ms sleep. That’s why there are about 11 near-zero “gst_nvinfer_submit_input_buffer duration” lines with 1920x1088.
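
The arithmetic in this post can be re-checked with a few lines of code. This is only a sketch: the gap values are copied from the two 16-line excerpts above, and the helper names (cycle_ms, batches_during_stall) are hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Gaps between successive submit calls in ms, copied from the excerpts above.
const std::vector<int> kGaps1080 = {400, 0, 0, 0, 13, 1, 33, 18,
                                    19, 26, 16, 22, 20, 23, 18, 21};
const std::vector<int> kGaps1088 = {400, 0, 0, 0, 0, 0, 0, 0,
                                    0, 0, 0, 2, 34, 33, 34, 32};

// Total wall time of one 16-buffer cycle.
int cycle_ms(const std::vector<int>& gaps) {
  return std::accumulate(gaps.begin(), gaps.end(), 0);
}

// Number of batches that arrive at the source rate while downstream
// is blocked for stall_ms (integer division, so the result is rounded down).
int batches_during_stall(int stall_ms, int frame_period_ms) {
  assert(frame_period_ms > 0);
  return stall_ms / frame_period_ms;
}
```

cycle_ms gives 630 for the 1920x1080 excerpt and 535 for the 1920x1088 excerpt, a difference of 95ms per cycle. batches_during_stall(400, 33) gives 12, which is close to the roughly 11 near-zero gaps seen with 1920x1088.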

I tested on T4 + DS6.2 with the code modification, and I can’t reproduce this issue. Here are the configurations and logs.
source2_1080p_dec_infer-resnet_demux_int8.txt (4.5 KB)
log.txt (3.6 KB)
source2_1080p_dec_infer-resnet_demux_int8_1088.txt (4.5 KB)
log-1088.txt (2.3 KB)
please help to narrow down this issue. Thanks!

Did you use rtsp streams from cameras?

In source2_1080p_dec_infer-resnet_demux_int8.txt, there are four RTSP sources served by the gst-rtsp-server software server. This ensures the test sources are the same.

So you added just 4 sources? And what are these sources like? Did they produce frames every 30 or 40ms?

Yes, I added 4 RTSP sources. The resolution is 1080p and the fps is 25. Here are the details:
Input #0, rtsp, from ‘rtsp://127.0.0.1:8554/test’:
Metadata:
title : Session streamed with GStreamer
comment : rtsp-server
Duration: 00:05:00.04, start: 0.000000, bitrate: N/A
Stream #0:0: Video: h264 (High), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 25 fps, 25 tbr, 90k tbn, 50 tbc
Stream #0:1: Audio: aac (LC), 48000 Hz, stereo, fltp

I don’t think 4 sources are enough… How do I use the gst-rtsp-server software server? I’ll try your case.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

gst-rtsp-server is open-source GStreamer code; you can search online for how to build and start it.
