4x 1080p30 encode with gstreamer tee

Hi all. I am looking to encode four 1080p30 video streams on the X1 using GStreamer.

I have two 1080p30 sources coming in and am splitting each of them with the GStreamer ‘tee’ plugin.

So:
v4l2src_0 --> nvvidconv --> tee --> encode1
                            tee --> encode2
and
v4l2src_1 --> nvvidconv --> tee --> encode3
                            tee --> encode4

Each of those 4 streams then goes through mpegtsmux and is sent out over UDP.

Here is an example gst-launch command:

gst-launch-1.0 \
  v4l2src device=/dev/video0 do-timestamp=true ! "video/x-raw, width=1920, height=1080, format=(string)UYVY, framerate=(fraction)30/1" ! queue ! nvvidconv ! 'video/x-raw(memory:NVMM), width=1920, height=1080, format=I420, framerate=30/1' ! tee name=t ! omxh265enc bitrate=500000 control-rate=2 ! h265parse ! mpegtsmux ! udpsink host=224.0.0.3 port=5057 \
  t. ! omxh264enc bitrate=500000 control-rate=2 ! 'video/x-h264, stream-format=(string)byte-stream' ! h264parse ! mpegtsmux ! udpsink host=224.0.0.3 port=5056 \
  v4l2src device=/dev/video1 do-timestamp=true ! "video/x-raw, width=1920, height=1080, format=(string)UYVY, framerate=(fraction)30/1" ! queue ! nvvidconv ! 'video/x-raw(memory:NVMM), width=1920, height=1080, format=I420, framerate=30/1' ! tee name=t2 ! omxh265enc bitrate=500000 control-rate=2 ! h265parse ! mpegtsmux ! udpsink host=224.0.0.3 port=6057 \
  t2. ! omxh264enc bitrate=500000 control-rate=2 ! 'video/x-h264, stream-format=(string)byte-stream' ! h264parse ! mpegtsmux ! udpsink host=224.0.0.3 port=6056

Unfortunately it seems to run fine for the first 10 seconds at a full 30fps but then slowly drops to 20fps for all streams.

Anyone have any ideas as to what could cause this? I have already tried different queues and muxes.

I do not know if the bottleneck is at the encoder layer or elsewhere. 2x encode (without the tees) using both v4l2src inputs works fine, so my assumption would be the encoder and/or tees are causing issues.

Would nvtee help at all? Does nvtee work with v4l2src?

Is this under L4T 24.1? Some similar issues happened on 23.2, but it looks like at least some of them were fixed.

Currently this is on 23.2. My driver is soc_camera based, and 24.1 has deprecated that, so I do not know if I can upgrade to 24.1 without annoyance.

Any idea what may have resolved these issues?

Hi

This might be a synchronization problem between the multiple GStreamer branches. I would try adding a queue element at the start of every branch after each tee, and also setting sync=false on the sinks.

Regards
Tobias
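
For readers following along, the suggestion above could look roughly like this for one of the two sources. It is only a sketch built from the element names and properties in the original command. The shell variable names and the branch helper are made up for illustration, and the byte-stream capsfilter from the original H.264 branch is left out to keep it short. The script only prints the command so it can be inspected before running.

```shell
#!/bin/sh
# Sketch only: one source, one tee, two encoder branches.
# Every branch starts with its own queue and every udpsink gets sync=false.

SRC_CAPS='video/x-raw, width=1920, height=1080, format=(string)UYVY, framerate=(fraction)30/1'
NVMM_CAPS='video/x-raw(memory:NVMM), width=1920, height=1080, format=I420, framerate=30/1'

# branch ENCODER PARSER PORT
# Prints one tee branch (hypothetical helper, not part of GStreamer).
branch() {
    printf 't. ! queue ! %s bitrate=500000 control-rate=2 ! %s ! mpegtsmux ! udpsink host=224.0.0.3 port=%s sync=false' "$1" "$2" "$3"
}

PIPELINE="v4l2src device=/dev/video0 do-timestamp=true ! \"$SRC_CAPS\" ! queue ! nvvidconv ! \"$NVMM_CAPS\" ! tee name=t $(branch omxh265enc h265parse 5057) $(branch omxh264enc h264parse 5056)"

# Print the full command line instead of running it.
echo "gst-launch-1.0 $PIPELINE"
```

Piping the script's output to sh would run the printed command. The second source would get the same treatment with its own tee name and ports.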

Hey Kamm. Thanks for the input. Unfortunately I tried adding a ton of queues as well as ‘sync=false’ but the same symptom occurs.

I also tried using a single video source (instead of my 2) and just tee-ed it to create 4 streams and it exhibits the same symptom.

I definitely think you are on to something with it being related to gstreamer since it always works for the first 5-10secs and then drops down to 20fps. So it seems like the encoder can handle the 4x streams but gstreamer is eventually throttling.

I have also witnessed some weird performance issues before with video capturing. Have you tried setting the clocks and cpu governor into performance mode?
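
As a rough illustration of the governor part only (this is not the NVIDIA script referred to in this thread), the standard Linux cpufreq sysfs files can be written directly. The function names and the optional CPU_ROOT argument are made up for this sketch. It does not touch GPU or memory clocks, and it assumes the kernel exposes scaling_governor for each core.

```shell
#!/bin/sh
# set_governor GOVERNOR [CPU_ROOT]
# Writes GOVERNOR to every cpuN/cpufreq/scaling_governor under CPU_ROOT.
# CPU_ROOT defaults to the usual sysfs location; the second argument is
# only there so the function can be tried on a scratch directory.
set_governor() {
    gov="$1"
    root="${2:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip cores that are offline or files we are not allowed to write.
        [ -w "$f" ] || continue
        echo "$gov" > "$f"
    done
}

# show_governors [CPU_ROOT]
# Prints the current governor of each core that exposes one.
show_governors() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] && echo "$f: $(cat "$f")"
    done
    return 0
}
```

On the board this would be sourced and called as root, for example set_governor performance followed by show_governors to confirm the change took effect.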

I have used these scripts (originally from Nvidia) on R24.1:

I ran them, with the same result unfortunately. Thanks again for trying though! I greatly appreciate it!

I ran the tegra system profiler previously but I couldn’t really get any indication of an issue.

Here is the output of tegrastats as I execute my gstreamer pipeline and I did notice something strange. For the first few seconds it utilizes cpu core 1 very heavily but then it suddenly switches to using the other cores more and core1 less. This lines up with when the performance drops. Maybe the system is throttling in some way or is incorrectly switching the core utilization thinking it is helping? Not going to lie, this level of debugging is not my forte so if anyone has any insight I’d appreciate it.

//Program not running yet.
RAM 537/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [1%,0%,0%,0%]@1734 EMC 0%@1600 AVP 4%@12 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 537/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [1%,0%,0%,0%]@1734 EMC 0%@1600 AVP 4%@12 VDE 0 GR3D 0%@998 EDP limit 1734

//Gstreamer program started
RAM 539/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [35%,0%,0%,0%]@1734 EMC 0%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 608/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [55%,39%,10%,25%]@1734 EMC 14%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 609/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [56%,25%,6%,35%]@1734 EMC 22%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [56%,42%,13%,17%]@1734 EMC 24%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [53%,21%,38%,5%]@1734 EMC 25%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [54%,7%,28%,36%]@1734 EMC 25%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [53%,11%,32%,23%]@1734 EMC 25%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [50%,13%,31%,34%]@1734 EMC 25%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [56%,34%,5%,27%]@1734 EMC 25%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734

//Switches here in what cores it uses.  This lines up with when fps drops to 20fps.
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [29%,43%,23%,25%]@1734 EMC 24%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [10%,50%,40%,16%]@1734 EMC 22%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [13%,37%,32%,36%]@1734 EMC 22%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [6%,48%,49%,5%]@1734 EMC 22%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [10%,6%,45%,49%]@1734 EMC 22%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [12%,35%,37%,30%]@1734 EMC 22%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [6%,36%,31%,32%]@1734 EMC 22%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [21%,41%,21%,35%]@1734 EMC 22%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [4%,30%,33%,47%]@1734 EMC 22%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [14%,20%,48%,33%]@1734 EMC 22%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [4%,50%,47%,4%]@1734 EMC 22%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [14%,39%,38%,25%]@1734 EMC 22%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [6%,34%,1%,35%]@1734 EMC 17%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [9%,48%,7%,16%]@1734 EMC 17%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [4%,57%,9%,3%]@1734 EMC 16%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 611/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [13%,11%,54%,4%]@1734 EMC 15%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 611/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [14%,23%,30%,13%]@1734 EMC 15%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 611/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [5%,0%,33%,33%]@1734 EMC 15%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 611/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [9%,30%,4%,38%]@1734 EMC 15%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 611/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [7%,9%,2%,56%]@1734 EMC 15%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 611/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [6%,37%,3%,33%]@1734 EMC 15%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 611/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [11%,40%,3%,23%]@1734 EMC 15%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 611/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [1%,62%,2%,2%]@1734 EMC 15%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 611/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [6%,64%,5%,4%]@1734 EMC 15%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 611/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [4%,59%,9%,6%]@1734 EMC 15%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 611/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [4%,29%,32%,1%]@1734 EMC 15%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 611/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [4%,35%,34%,8%]@1734 EMC 15%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734

//Program stopped
RAM 537/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [1%,0%,0%,0%]@1734 EMC 2%@1600 AVP 0%@408 VDE 0 GR3D 4%@998 EDP limit 1734
RAM 537/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [1%,0%,0%,0%]@1734 EMC 0%@1600 AVP 4%@12 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 537/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [1%,0%,0%,0%]@1734 EMC 0%@1600 AVP 0%@408 VDE 0 GR3D 0%@998 EDP limit 1734
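
To make the per-core shift easier to see in a long log like the one above, the cpu field can be pulled out on its own. This is a small sketch that assumes the tegrastats line format shown in this post ("cpu [a%,b%,c%,d%]@freq"); the function name cpu_loads is made up for illustration.

```shell
#!/bin/sh
# cpu_loads: read tegrastats lines on stdin and print the per-core
# percentages from the "cpu [..]@freq" field, space separated.
# Lines without that field are ignored.
cpu_loads() {
    sed -n 's/.*cpu \[\([0-9%,]*\)]@.*/\1/p' | tr -d '%' | tr ',' ' '
}

# Example input: one line from before the drop and one from after it,
# copied from the log above.
cpu_loads <<'EOF'
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [56%,34%,5%,27%]@1734 EMC 25%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
RAM 610/3854MB (lfb 690x4MB) SWAP 0/0MB (cached 0MB) cpu [29%,43%,23%,25%]@1734 EMC 24%@1600 AVP 0%@80 VDE 0 GR3D 0%@998 EDP limit 1734
EOF
```

Run as written, this prints "56 34 5 27" and "29 43 23 25". A saved log can be fed in the same way by redirecting the file into cpu_loads.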

I ran it through the system profiler and you can visibly see when it drops. I marked it here with a giant blue vertical line.

Also, it looks like it is the v4l2src threads whose CPU usage suddenly drops. Again, I don’t really know why they are doing this (or whether it is truly them or something upstream).

Direct Link to Photo: http://i.imgur.com/BazA4Ki.png

Looks like a similar symptom when substituting in videotestsrc for my v4l2src plugins.

Here are the images from that:

Direct Link to videotestsrc Photo: http://i.imgur.com/c37kH4M.png

I may have figured it out. It looks like adding extra buffers to the nvvidconv element fixes it.

That's interesting, so what did you change? The output-buffers property of nvvidconv?

Have you also added a queue at the start of each branch? (e.g. after each t ! )

Just the output-buffers property and nothing else.
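
For anyone landing here later, that change would look something like the following for one source. This is a sketch, not the exact command that was run: the value 25 is borrowed from the dual-capture pipeline posted further down in this thread, and the shell variable names are made up for illustration. The script only prints the command.

```shell
#!/bin/sh
# Sketch only: same capture/convert front end as the original pipeline,
# with nvvidconv's output-buffers property raised.
# 25 is an assumed starting point; override with OUTPUT_BUFFERS=<n>.
OUTPUT_BUFFERS="${OUTPUT_BUFFERS:-25}"

FRONT_END="v4l2src device=/dev/video0 do-timestamp=true \
! 'video/x-raw, width=1920, height=1080, format=(string)UYVY, framerate=(fraction)30/1' \
! queue \
! nvvidconv output-buffers=$OUTPUT_BUFFERS \
! 'video/x-raw(memory:NVMM), width=1920, height=1080, format=I420, framerate=30/1' \
! tee name=t"

# The two encoder branches are unchanged from the original command.
BRANCHES="t. ! omxh265enc bitrate=500000 control-rate=2 ! h265parse ! mpegtsmux ! udpsink host=224.0.0.3 port=5057 \
t. ! omxh264enc bitrate=500000 control-rate=2 ! 'video/x-h264, stream-format=(string)byte-stream' ! h264parse ! mpegtsmux ! udpsink host=224.0.0.3 port=5056"

# Print the full command line instead of running it.
echo "gst-launch-1.0 $FRONT_END $BRANCHES"
```

Keeping the buffer count in a variable makes it easy to try a few values and see where the frame rate stops dropping.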

Hi,

We are currently working on dual capture on the Tegra X1. We are using the Auvidea J100 board and the Toshiba TC358743 HDMI to CSI-2 bridge. Dual capture works at the following resolutions:

640x480
720x480
720p@30
720p@60

When I try 1080p@30, it seems to run fine, but 1 second later the framerate goes down to 0.5fps. After that the board gets stuck and I have to do a hard reboot. The gstreamer pipeline is:

gst-launch-1.0 v4l2src device=/dev/video0 ! 'video/x-raw, format=UYVY, width=1920, height=1080' ! queue ! perf ! fakesink sync=false v4l2src device=/dev/video1 ! 'video/x-raw, format=UYVY, width=1920, height=1080' ! queue ! perf ! fakesink sync=false

We are able to capture 1080p@30 and 1080p@60 from one camera.

Did you see this behavior before?

Any help would be appreciated.

Thanks,
Eugenia

Eugenia, I never had such a drastic stop. I am doing dual capture with the following pipeline with no issues:

gst-launch-1.0 \
  v4l2src device=/dev/video1 do-timestamp=true io-mode=rw ! "video/x-raw, width=1920, height=1080, format=(string)UYVY, framerate=(fraction)30/1" ! queue ! nvvidconv output-buffers=25 ! 'video/x-raw(memory:NVMM), width=1920, height=1080, format=I420, framerate=30/1' ! omxh265enc bitrate=500000 control-rate=2 ! h265parse ! avmux_mpegts ! udpsink host=224.0.0.3 port=5057 \
  v4l2src device=/dev/video0 do-timestamp=true io-mode=rw ! "video/x-raw, width=1920, height=1080, format=(string)UYVY, framerate=(fraction)30/1" ! queue ! nvvidconv output-buffers=25 ! 'video/x-raw(memory:NVMM), width=1920, height=1080, format=I420, framerate=30/1' ! omxh265enc bitrate=500000 control-rate=2 ! h265parse ! avmux_mpegts ! udpsink host=224.0.0.3 port=5056

I don’t have the ‘perf’ plugin you are using to measure, though. I am just viewing the streams remotely in VLC.

What I was seeing before was that the framerate throttled down when I was using tee. Most likely you are dealing with a different issue though.

Sorry to revive such an old thread, but unfortunately my original “fix” of adding more output buffers to nvvidconv only fixed this for 4x 1080p30. Going further than that makes the issue reappear, for example when attempting 2x 1080p60 + 2x 1080p30 (which is supported by the encoder).
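
To put the two workloads side by side, here is the raw pixel rate of each configuration. This is plain arithmetic on the stream sizes and says nothing about where the actual limit is; the function name pixel_rate is made up for illustration.

```shell
#!/bin/sh
# pixel_rate STREAMS WIDTH HEIGHT FPS
# Prints the total pixels per second for that many identical streams.
pixel_rate() {
    echo $(( $1 * $2 * $3 * $4 ))
}

# The configuration that works with extra nvvidconv output buffers.
four_30=$(pixel_rate 4 1920 1080 30)

# The configuration where the drop comes back.
mixed=$(( $(pixel_rate 2 1920 1080 60) + $(pixel_rate 2 1920 1080 30) ))

echo "4x 1080p30:              $four_30 pixels/s"
echo "2x 1080p60 + 2x 1080p30: $mixed pixels/s"
# Divide by a hundredth of the baseline to stay within 32-bit integers.
echo "relative load:           $(( mixed / (four_30 / 100) ))%"
```

This gives 248832000 and 373248000 pixels/s, so the mixed case pushes 1.5 times as many pixels through the convert and encode stages as the 4x 1080p30 case.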

Everything will work fine for the first 10-15 seconds and then it drops the framerate way down. Please see the previous images I posted.

Does anyone have any idea why this would happen? I do not know where to begin at this point.

Hi x1tester62,

Have you tried fixing the CPU/GPU/memory clocks at their maximum frequencies to see if that helps?
User script attached.

Thanks