Nvstreammux (new) plugin is broken in DS 6.2 release

[streammux]
gpu-id=0
live-source=1
batch-size=40
batched-push-timeout=20000
width=1920
height=1080
enable-padding=1
nvbuf-memory-type=0
sync-inputs=1
max-latency=80000000
#mux config
config_file_path="mux_config.txt"

Here is the content of mux_config.txt:
[property]
algorithm-type=1
batch-size=1
overall-max-fps-n=30
overall-max-fps-d=1
overall-min-fps-n=10
overall-min-fps-d=1
adaptive-batching=1

Note that live-source, batch-size, batched-push-timeout, width and height in [streammux] are N/A (irrelevant) since we are using adaptive-batching mode.

How did you add the nvbufSurface->batchSize (adaptive batchSize) monitoring?
Could you reproduce your problem with our demo code: https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/tree/master/runtime_source_add_delete ?
If we can reproduce your problem in our environment, it can help us analyze it faster.

I'll adapt the runtime_source_add_delete code to demonstrate the issue. In the meanwhile, can you forward the issue to your developers? This is basically a show-stopper issue. It is quite likely just a simple mistake in sink_pad_delete in the nvstreammux source code, which is meant to decrement the batchSize but instead accidentally increments it. As I mentioned, DS 6.0.1 does not have this issue, and our application has essentially no changes from DS 6.0.1 to DS 6.2.

I am able to reproduce the issue after modifying the "runtime_source_add_delete" source code; you can follow these steps:

  1. create 2 virtual RTSP camera sources with an RTSP server at "rtsp://127.0.0.1:19000/stream" and "rtsp://127.0.0.1:19001/stream"

  2. export USE_NEW_NVSTREAMMUX=yes

  3. make && make install (which will install
    to directory: /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream_reference_apps/runtime_source_add_delete)

  4. run ./deepstream-test-rt-src-add-del rtsp://127.0.0.1:19000/stream 1 nveglglessink 0, which will add the two sources and display them on nveglglessink

  5. While the two sources are playing, press the "d" key and then the "return" key, which will delete source_id 1.

  6. You will see that the remaining source (source_id 0) becomes very choppy with very low fps on nveglglessink.

  7. You can enable tiler_sink_pad_buffer_probe (by changing #if 0 to #if 1 at line 636) in the source code to observe that nvbufSurface->batchSize stays at 2 after the deletion, although I did not observe the batchSize increasing in this case (see the probe sketch after this list). Here is the adapted source code:
    runtime_source_add_delete.zip (38.6 KB)

  8. I hardcoded 3 RTSP camera sources in the source code
    deepstream_test_rt_src_add_del.c (22.9 KB)
    in case you want to test with 3 sources. You will notice the same choppiness with very low fps on nveglglessink after deleting source_id 2 and source_id 1, while batchSize keeps printing the same value, 3.

  9. In case you don't have RTSP cameras available for testing, you can create an RTSP server with the following example:
    a: docker run --rm -it --network=host aler9/rtsp-simple-server
    b: cd /opt/nvidia/deepstream/deepstream/samples/streams
    c: ffmpeg -re -stream_loop -1 -i sample_1080p_h264.mp4 -c:v copy -an -f rtsp -rtsp_transport tcp rtsp://127.0.0.1:8554/stream0
    d: ffmpeg -re -stream_loop -1 -i sample_1080p_h264.mp4 -c:v copy -an -f rtsp -rtsp_transport tcp rtsp://127.0.0.1:8554/stream1
    e: ffmpeg -re -stream_loop -1 -i sample_1080p_h264.mp4 -c:v copy -an -f rtsp -rtsp_transport tcp rtsp://127.0.0.1:8554/stream2
    Make sure that you change the RTSP sources to "rtsp://127.0.0.1:8554/stream0", "rtsp://127.0.0.1:8554/stream1" and "rtsp://127.0.0.1:8554/stream2" in deepstream_test_rt_src_add_del.c, then run ./deepstream-test-rt-src-add-del rtsp://127.0.0.1:8554/stream0 1 nveglglessink 0
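
For reference (step 7 above), here is a minimal sketch of the kind of buffer probe I use to watch nvbufSurface->batchSize; function and variable names are illustrative, and the probe is attached to the tiler sink pad:

#include "nvbufsurface.h"

static GstPadProbeReturn
tiler_sink_pad_buffer_probe (GstPad *pad, GstPadProbeInfo *info, gpointer u_data)
{
  GstBuffer *buf = (GstBuffer *) info->data;
  GstMapInfo map_info;

  if (gst_buffer_map (buf, &map_info, GST_MAP_READ)) {
    NvBufSurface *surface = (NvBufSurface *) map_info.data;
    /* batchSize is the allocated batch capacity, numFilled the frames actually present */
    g_print ("nvbufSurface->batchSize = %u, numFilled = %u\n",
        surface->batchSize, surface->numFilled);
    gst_buffer_unmap (buf, &map_info);
  }
  return GST_PAD_PROBE_OK;
}

/* attach it once, e.g. right after the tiler is created */
GstPad *tiler_sink_pad = gst_element_get_static_pad (tiler, "sink");
gst_pad_add_probe (tiler_sink_pad, GST_PAD_PROBE_TYPE_BUFFER,
    tiler_sink_pad_buffer_probe, NULL, NULL);
gst_object_unref (tiler_sink_pad);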

Thanks for your initial analysis. I used 2 real RTSP sources with your attached code. When deleting one source, the fps of the other does indeed decrease significantly. We will reply promptly once we have a conclusion.

Thanks and hope you can resolve the issue as soon as possible.

It looks like the parameters you set in mux_config.txt have some problems. I changed them to:

[property]
algorithm-type=1
#batch-size=1
overall-max-fps-n=60
overall-max-fps-d=1
overall-min-fps-n=20
overall-min-fps-d=1
#max-same-source-frames=1
adaptive-batching=1

The print output is as follows:

**PERF: 8.01 (8.17)     8.58 (6.37)
**PERF: 7.00 (7.95)     7.50 (6.67)
**PERF: 7.01 (7.80)     7.51 (6.88)
**PERF: 9.53 (7.69)     7.49 (7.03)
**PERF: 7.51 (7.73)     7.01 (7.02)
**PERF: 7.49 (7.76)     6.99 (7.02)
**PERF: 7.51 (7.78)     7.01 (7.02)
**PERF: 7.49 (7.80)     9.54 (7.02)
**PERF: 7.00 (7.73)     7.50 (7.09)
**PERF: 7.01 (7.68)     7.51 (7.16)
**PERF: 7.01 (7.63)     7.51 (7.22)
d
: d
To delete source_id = 1
STATE CHANGE SUCCESS

STATE CHANGE SUCCESS 0x7f987c005d10

**PERF: 7.49 (7.64)     7.49 (7.25)
**PERF: 14.03 (8.03)    3.00 (6.87)
**PERF: 14.99 (8.42)    0.00 (6.48)
**PERF: 15.02 (8.78)    0.00 (6.13)
**PERF: 15.02 (9.10)    0.00 (5.81)
**PERF: 14.99 (9.39)    0.00 (5.53)
**PERF: 15.01 (9.65)    0.00 (5.27)
**PERF: 15.00 (9.89)    0.00 (5.04)
**PERF: 15.03 (10.11)   0.00 (4.82)
**PERF: 14.99 (10.31)   0.00 (4.62)
**PERF: 15.03 (10.49)   0.00 (4.44)
**PERF: 14.99 (10.66)   0.00 (4.27)

I don't think so. I normally set overall-max-fps-n=30 and I get the same problem.
Even in your case, it does not make sense that with overall-max-fps-n=60 you only get a throughput of about 7 fps, and that after you delete one source it only reaches about 15 fps.

Also, it does not make sense to set overall-max-fps-n=60, since most source RTSP cameras run at 25/30 fps.

We have tried many different settings based on the DeepStream documentation with DS 6.2 and are very sure that something is wrong in nvstreammux.

We are wondering whether there is any chance you could make the nvstreammux GStreamer plugin open source. Then we could certainly figure out how to fix the issue.

Hi,

In my case I had a very similar strange problem, which I finally "solved" by giving overall-max-fps-n an arbitrary value like overall-min-fps-n * number_of_sources * 10 (for example, with overall-min-fps-n=10 and 3 sources, that gives overall-max-fps-n=300). Very arbitrary, but the doc doesn't help; I have no idea what a good value is for this parameter…

I agree that it would be nice to have nvstreammux open source…

That is true. We had tried overall-max-fps-n = number_of_sources * some_value and overall-min-fps-n = number_of_sources * some_other_value too. But it does not always work, since number_of_sources may change dynamically and wildly.

From what we have experienced in our DeepStream SDK based application, nvstreammux/nvstreamdemux determines the fundamental stability of the GStreamer pipeline. We do hope that NVIDIA can make it open source.

Note that we had to develop dsstreamdemux to replace nvstreamdemux due to its unexpected video scaling when mixing video resolutions. We may have to develop our own nvstreammux replacement if you can't make it open source, but that would be a waste of time and effort.

OK. We'll check the following 2 issues. If there are any results, we will reply in time.
1. How to set nvstreammux parameters in scenarios where sources are dynamically added and deleted.
2. Whether the nvstreammux code could be open-sourced.

Why don't you send an EOS event when stopping the source in your demo code? You should send an EOS event when you dynamically delete a source. I tried sending the EOS event and it works fine. (The fps of my RTSP stream is 15.)

overall-max-fps-n=120
overall-max-fps-d=1
overall-min-fps-n=5
overall-min-fps-d=1
 gst_pad_send_event (sinkpad, gst_event_new_eos ());
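
For context, a minimal sketch of where that call goes in the source-removal path (variable names are illustrative, following the reference app's naming):

  /* in stop_release_source (source_id): drain the mux input before releasing its pad */
  gchar *pad_name = g_strdup_printf ("sink_%u", source_id);
  GstPad *sinkpad = gst_element_get_static_pad (streammux, pad_name);
  gst_pad_send_event (sinkpad, gst_event_new_eos ());
  gst_element_release_request_pad (streammux, sinkpad);
  gst_object_unref (sinkpad);
  g_free (pad_name);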

The fps of my RTSP source is 15. The log is as below:

2 source:
**PERF: 14.98 (10.90)   14.98 (11.12)
**PERF: 15.03 (11.53)   15.03 (11.74)
**PERF: 14.99 (12.00)   14.99 (12.19)
**PERF: 15.03 (12.36)   15.03 (12.53)
**PERF: 14.98 (12.64)   14.98 (12.80)
**PERF: 15.03 (12.86)   15.03 (13.01)
**PERF: 14.99 (13.05)   14.99 (13.19)
**PERF: 15.03 (13.20)   15.03 (13.33)
**PERF: 14.98 (13.34)   14.98 (13.46)
**PERF: 15.03 (13.45)   15.03 (13.57)
**PERF: 14.98 (13.55)   14.98 (13.66)
**PERF: 15.03 (13.64)   15.03 (13.74)
**PERF: 14.99 (13.72)   14.99 (13.82)
**PERF: 15.03 (13.79)   15.03 (13.88)
**PERF: 15.01 (13.85)  15.01 (13.94)
1 source:
**PERF: 15.02 (13.96)   0.00 (12.77)
**PERF: 14.98 (14.00)   0.00 (12.19)
**PERF: 15.03 (14.04)   0.00 (11.67)
**PERF: 14.99 (14.08)   0.00 (11.18)
**PERF: 15.03 (14.12)   0.00 (10.74)
**PERF: 14.99 (14.15)   0.00 (10.33)
**PERF: 15.03 (14.18)   0.00 (9.95)
**PERF: 14.99 (14.21)   0.00 (9.60)
**PERF: 15.03 (14.24)   0.00 (9.27)
**PERF: 14.98 (14.26)   0.00 (8.96)
**PERF: 15.03 (14.28)   0.00 (8.68)
**PERF: 14.98 (14.31)   0.00 (8.41)
**PERF: 15.03 (14.33)   0.00 (8.15)
**PERF: 14.98 (14.35)   0.00 (7.92)

I updated the code to add back "gst_pad_send_event (sinkpad, gst_event_new_eos ());", and surprisingly the "gst_pad_unlink" call has to be removed before calling gst_element_release_request_pad. Note that "gst_pad_unlink" is called in deepstream_common.c in DS 6.2. This did improve things: most of the time the fps holds after deleting a source. But sometimes the fps still drops to near zero after deleting a source, depending on timing (this is more likely to happen if you hit "d" quickly, as soon as all videos are displayed).
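To make the change concrete, this is roughly the removal order that works for me (a sketch; names follow my adapted code), with no gst_pad_unlink () before releasing the request pad:

  gst_pad_send_event (sinkpad, gst_event_new_eos ());                    /* drain this mux input first */
  gst_element_set_state (g_source_bin_list[source_id], GST_STATE_NULL);  /* stop the source bin */
  gst_element_release_request_pad (streammux, sinkpad);                  /* no gst_pad_unlink () beforehand */
  gst_object_unref (sinkpad);
  gst_bin_remove (GST_BIN (pipeline), g_source_bin_list[source_id]);
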
I have updated the deepstream_test_rt_src_add_del.c and command line options to make it easier to reproduce the issue:

  1. hardcoded the list of 3 RTSP source URIs in the code:
    char *SRC_URI_LIST[] = {
      "rtsp://127.0.0.1:19000/stream",
      "rtsp://127.0.0.1:19001/stream",
      "rtsp://127.0.0.1:19002/stream",
      ""
    };
  2. the command line is now "./deepstream-test-rt-src-add-del 1 nveglglessink 1" (I removed the URI option from the command line options)
  3. You can now add a source back by hitting 'a' and return (note: 'd' and return for deleting). Here is the updated code:
    runtime_source_add_delete.zip (38.7 KB)

The other issue is that with the exact same 3 video sources, the fps throughput can easily reach 30 in DS 6.0 (even with more than 20 sources on an RTX 30380), but it can only reach 24 in DS 6.2 with just 3 video sources.

I used your code and the ffmpeg RTSP server to test the fps, with a perf log added by enabling perf measurement. Because there is no monitor on my device, I chose filesink.
The printing frequency is 1 second. After adding or deleting a source, the nvstreammux batch algorithm requires some time to converge, but after stabilization the frame rate is around 30.
Log:

Begin:
**PERF: 6.32 (5.09)     0.00 (0.00)     0.00 (0.00)
**PERF: 6.32 (6.29)     6.68 (6.63)     32.40 (29.24)
**PERF: 4.03 (5.40)     4.03 (5.25)     4.03 (5.61)
**PERF: 2.98 (4.73)     3.98 (4.82)     3.98 (4.83)
**PERF: 29.10 (10.02)   28.10 (10.75)   29.10 (12.70)
**PERF: 30.05 (13.59)   30.05 (14.67)   30.05 (16.95)
**PERF: 30.02 (16.08)   30.02 (17.26)   30.02 (19.52)
**PERF: 30.05 (17.91)   30.05 (19.10)   30.05 (21.24)
**PERF: 30.03 (19.32)   30.03 (20.48)   30.03 (22.48)
**PERF: 29.95 (20.43)   29.95 (21.55)   29.95 (23.41)
**PERF: 30.15 (21.33)   30.15 (22.40)   30.15 (24.13)
**PERF: 30.05 (22.07)   30.05 (23.09)   29.05 (24.61)
**PERF: 30.04 (22.78)   30.04 (23.75)   29.07 (25.10)
**PERF: 30.02 (23.31)   30.02 (24.24)   29.02 (25.42)
**PERF: 30.04 (23.77)   30.04 (24.65)   30.04 (25.77)

Delete 1 source:
**PERF: 30.06 (25.97)   30.06 (26.59)   30.05 (26.09)
**PERF: 29.04 (26.09)   30.04 (26.74)   0.00 (24.90)
**PERF: 29.03 (26.21)   30.03 (26.88)   0.00 (23.82)
**PERF: 30.04 (26.28)   30.04 (26.92)   0.00 (22.83)
**PERF: 29.11 (26.46)   30.05 (27.12)   0.00 (21.92)
**PERF: 29.04 (26.55)   30.04 (27.22)   0.00 (21.08)
**PERF: 29.05 (26.67)   30.02 (27.36)   0.00 (20.30)
**PERF: 29.05 (26.75)   30.05 (27.45)   0.00 (19.58)
**PERF: 25.64 (26.62)   26.75 (27.33)   0.00 (18.91)
**PERF: 25.40 (26.67)   32.24 (27.39)   0.00 (18.28)
**PERF: 33.14 (26.86)   29.20 (27.62)   0.00 (17.69)
**PERF: 27.22 (26.80)   28.80 (27.48)   0.00 (17.14)
**PERF: 33.52 (26.75)   22.68 (27.47)   0.00 (16.62)
**PERF: 27.79 (27.04)   37.73 (27.77)   0.00 (16.13)
**PERF: 28.71 (26.90)   27.22 (27.67)   0.00 (15.67)
**PERF: 27.61 (27.09)   32.27 (27.89)   0.00 (15.24)
**PERF: 27.39 (26.93)   26.91 (27.79)   0.00 (14.83)
**PERF: 27.63 (27.11)   32.57 (28.00)   0.00 (14.44)
**PERF: 25.95 (27.01)   30.01 (27.87)   0.00 (14.07)
**PERF: 32.40 (27.22)   28.43 (28.05)   0.00 (13.72)
**PERF: 29.04 (27.27)   30.04 (28.09)   0.00 (13.38)
**PERF: 30.01 (27.17)   26.24 (27.97)   0.00 (13.06)

add 1 source:
**PERF: 2.37 (25.51)    3.81 (26.32)    0.00 (11.43)
**PERF: 13.33 (25.29)   10.47 (26.01)   0.35 (11.40)
**PERF: 0.00 (24.79)    0.00 (25.50)    0.00 (11.18)
**PERF: 0.50 (24.34)    0.50 (25.03)    0.50 (10.98)
**PERF: 29.12 (24.45)   29.12 (25.12)   30.10 (11.36)
**PERF: 30.03 (24.55)   30.03 (25.21)   30.03 (11.71)
**PERF: 30.05 (24.65)   30.05 (25.30)   30.05 (12.05)
**PERF: 30.05 (24.74)   30.05 (25.38)   30.05 (12.38)
**PERF: 30.04 (24.83)   30.04 (25.46)   30.04 (12.69)
**PERF: 30.03 (24.92)   30.03 (25.54)   30.03 (12.99)
**PERF: 30.05 (25.00)   30.05 (25.62)   30.05 (13.29)
**PERF: 30.04 (25.09)   30.04 (25.69)   30.04 (13.57)
**PERF: 27.56 (25.13)   28.54 (25.74)   28.54 (13.82)
**PERF: 31.58 (25.23)   31.58 (25.83)   31.58 (14.11)
**PERF: 29.07 (25.29)   30.07 (25.90)   29.07 (14.35)
**PERF: 28.02 (25.33)   30.02 (25.96)   29.02 (14.58)

Let's not be concerned with throughput for now. The major problem is still the near-zero fps issue when I repeat "d" then "a" several times quickly. I get the following messages:
WARNING from element nveglglessink: A lot of buffers are being dropped.
Warning: A lot of buffers are being dropped.
WARNING from element nveglglessink: A lot of buffers are being dropped.
Warning: A lot of buffers are being dropped.
WARNING from element nveglglessink: A lot of buffers are being dropped.
Warning: A lot of buffers are being dropped.
WARNING from element nveglglessink: A lot of buffers are being dropped.
Warning: A lot of buffers are being dropped.
WARNING from element nveglglessink: A lot of buffers are being dropped.
Warning: A lot of buffers are being dropped.
WARNING from element nveglglessink: A lot of buffers are being dropped.

Something is still very wrong.

As I mentioned before:

You can see the log from my environment below. At first the data is not very stable, but it eventually stabilizes.

d:
a:
**PERF: 30.05 (27.03)   26.96 (26.23)   0.00 (22.37)
**PERF: 4.26 (26.00)    1.91 (25.14)    0.00 (21.37)
**PERF: 4.77 (24.94)    2.86 (24.19)    0.00 (20.45)
**PERF: 1.59 (23.96)    3.56 (23.31)    0.00 (19.61)
**PERF: 3.18 (23.15)    4.73 (22.47)    0.00 (18.83)
**PERF: 3.15 (22.32)    2.98 (21.81)    0.00 (18.11)
**PERF: 2.04 (21.59)    3.80 (21.15)    0.00 (17.45)
**PERF: 4.77 (20.99)    3.81 (20.55)    0.00 (16.83)
**PERF: 3.57 (20.36)    4.76 (19.95)    0.00 (16.26)
**PERF: 19.61 (20.08)   14.63 (19.68)   0.88 (16.05)
**PERF: 0.00 (19.42)    0.00 (19.05)    0.00 (15.54)
**PERF: 5.89 (19.28)    5.89 (18.93)    6.29 (15.55)
**PERF: 30.05 (19.61)   30.05 (19.26)   30.05 (15.98)
**PERF: 30.04 (19.92)   30.04 (19.57)   30.04 (16.39)
**PERF: 30.03 (20.21)   30.03 (19.87)   30.03 (16.78)

I have provided my testing method. How do you measure the fps in your code, and does it stabilize after a period of time?

I did not add a demux in deepstream-test-rt-src-add-del for fps output; I will try to add it tomorrow.
Right now I just watch the tiler display, and sometimes it does not seem to recover to normal fps. But I will test more.

I retried many times with nveglglessink. I confirmed that the throughput can never recover to normal fps if you try "d"/"a" several times, and the video display nearly freezes on nveglglessink. I haven't had a chance to add the demux yet. Note that I used 3 rtspsrc sources with different video resolutions (1920x1080, 3840x2160 and 2592x1944).
Have you asked whether it is possible to make nvstreammux open source?
One other major issue (in both DS 6.0 and DS 6.2) is that the overall fps gets pinned to the lowest fps among the rtspsrc sources. For example, if one camera is 30 fps, the second camera is 15 fps and the third camera is 5 fps, then the throughput is only 5 fps in adaptive-batching mode. This is very bad, since no reliable detection can be obtained for any camera when the fps is that low. You can use the GStreamer videorate plugin to bring every input up to 30 fps after the decoder, but that simply wastes GPU computation power. So at this point I really hope that nvstreammux will be open-sourced so that we have a chance to customize it.
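
For completeness, a minimal sketch of that videorate workaround in the source bin (element variable names are illustrative; it assumes videorate negotiates the NVMM caps produced by the decoder):

  /* decoder ! videorate ! capsfilter(30/1) ! ... ! nvstreammux sink pad */
  GstElement *rate = gst_element_factory_make ("videorate", "rate");
  GstElement *ratefilter = gst_element_factory_make ("capsfilter", "ratefilter");
  GstCaps *caps = gst_caps_from_string ("video/x-raw(memory:NVMM), framerate=(fraction)30/1");
  g_object_set (G_OBJECT (ratefilter), "caps", caps, NULL);
  gst_caps_unref (caps);
  gst_bin_add_many (GST_BIN (source_bin), rate, ratefilter, NULL);
  if (!gst_element_link_many (decoder, rate, ratefilter, NULL))
    g_printerr ("Failed to link decoder -> videorate -> capsfilter\n");

It only duplicates frames to hold 30 fps on each input, so it still burns GPU cycles downstream; it is a workaround, not a fix.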

If we open source the code, it will be stated in our release notes, so please pay attention to our updates.
About the "d"/"a" operation, I used your attached code and repeated it several times; it eventually stabilizes at 30.
Let's put aside hypothetical scenarios for now. Could you provide a specific description of your product and the problems encountered in actual use?

We have finally developed our own customized GStreamer mux/demux plugins to address these issues, so we no longer need nvstreammux/nvstreamdemux. The issues in nvstreammux/nvstreamdemux remain.

Glad to hear that. If your code can be open-sourced, you could also send a link as a reference. Thanks.