Split video into 4 smaller ones

Hi there!

I have three 4K@30fps videos (3840x2160) that need to be split into twelve full-HD videos (1920x1080) in real time, so each 4K video yields four full-HD videos. The split full-HD videos are then encoded to H.264.
Here’s my GStreamer pipeline so far:

gst-launch-1.0 -v \
  filesrc location=video0/video ! matroskademux name="demux1" demux1. ! nvv4l2decoder ! tee name=t1 \
    t1. ! queue ! nvvidconv left=0 right=1920 top=0 bottom=1080 ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
    t1. ! queue ! nvvidconv left=1920 right=3840 top=0 bottom=1080 ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
    t1. ! queue ! nvvidconv left=0 right=1920 top=1080 bottom=2160 ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
    t1. ! queue ! nvvidconv left=1920 right=3840 top=1080 bottom=2160 ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
  filesrc location=video1/video ! matroskademux name="demux2" demux2. ! nvv4l2decoder ! tee name=t2 \
    t2. ! queue ! nvvidconv left=0 right=1920 top=0 bottom=1080 ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
    t2. ! queue ! nvvidconv left=1920 right=3840 top=0 bottom=1080 ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
    t2. ! queue ! nvvidconv left=0 right=1920 top=1080 bottom=2160 ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
    t2. ! queue ! nvvidconv left=1920 right=3840 top=1080 bottom=2160 ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
  filesrc location=video2/video ! matroskademux name="demux3" demux3. ! nvv4l2decoder ! tee name=t3 \
    t3. ! queue ! nvvidconv left=0 right=1920 top=0 bottom=1080 ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
    t3. ! queue ! nvvidconv left=1920 right=3840 top=0 bottom=1080 ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
    t3. ! queue ! nvvidconv left=0 right=1920 top=1080 bottom=2160 ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
    t3. ! queue ! nvvidconv left=1920 right=3840 top=1080 bottom=2160 ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \

This works! The only problem is that the performance falls short of my expectations: I was hoping for at least 30 fps, but I only get around 23 fps.
When I remove nvv4l2h264enc, the performance rises to ~40 fps.
However, I suspect nvvidconv is at fault, because the board can easily encode 12 full-HD streams.
My guess is that the cropping/splitting makes an expensive copy.
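A quick check of the throughput involved: splitting does not change the number of pixels the encoders have to process per second, because twelve 1080p30 streams carry exactly the same pixel rate as three 2160p30 streams.

```latex
\begin{aligned}
3 \times (3840 \times 2160) \times 30\,\text{fps} &= 3 \times 8\,294\,400 \times 30 = 746\,496\,000\ \text{px/s} \\
12 \times (1920 \times 1080) \times 30\,\text{fps} &= 12 \times 2\,073\,600 \times 30 = 746\,496\,000\ \text{px/s}
\end{aligned}
```

So the split itself adds no encoding work in terms of pixels; per-stream overhead and the crop step come on top of that.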

Besides gst-launch, I also tried the same thing in C++ with NvBufferTransform, with roughly the same performance.

Do you have any tips or pointers on how I might achieve this?

Cheers,
Markus

Hi,
For this use case we suggest running the hardware converter (VIC) at its maximum clock to get optimal performance. Please refer to the steps in this post and give it a try:
Nvvideoconvert issue, nvvideoconvert in DS4 is better than Ds5? - #3 by DaneLLL

Thank you very much for your time and suggestion!

Unfortunately this did not change anything :-( This is what I executed as root:

echo on > /sys/devices/13e10000.host1x/15340000.vic/power/control
echo userspace > /sys/devices/13e10000.host1x/15340000.vic/devfreq/15340000.vic/governor

# cat /sys/devices/13e10000.host1x/15340000.vic/devfreq/15340000.vic/available_frequencies
echo 1036800000 > /sys/devices/13e10000.host1x/15340000.vic/devfreq/15340000.vic/max_freq
echo 1036800000 > /sys/devices/13e10000.host1x/15340000.vic/devfreq/15340000.vic/userspace/set_freq
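The commands above hard-code 1036800000, which is presumably the largest value printed by the commented-out `cat .../available_frequencies` line. A minimal C++ sketch of picking that maximum programmatically, assuming the file contains whitespace-separated integer frequencies in Hz; `max_available_frequency` is a hypothetical helper name, and reading or writing the sysfs files is left out:

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Returns the highest frequency listed in the contents of a devfreq
// "available_frequencies" file, or std::nullopt if nothing could be parsed.
// Assumes whitespace-separated unsigned integers (Hz).
std::optional<std::uint64_t> max_available_frequency(const std::string& contents) {
    std::istringstream in(contents);
    std::optional<std::uint64_t> best;
    std::uint64_t value = 0;
    while (in >> value) {
        if (!best || value > *best) {
            best = value;  // keep the largest value seen so far
        }
    }
    return best;
}
```

The result would then be written to `max_freq` and `userspace/set_freq` exactly as in the echo commands, which still requires root.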

Have you got any other suggestions?

Neither my C++ code nor the pipeline from the first post got any faster.
I did some more profiling, and it looks to me like NvBufferTransformEx is at fault.

How is cropping implemented in nvvidconv? Where and how does it get executed?

Any ideas for improving the performance of my C++ code?

Here I extract the top-left quarter while copying the buffer from one process to another. Performance does not improve when I use NvBufferTransform within a single process.

      // Value-initialization with {} already zeroes every field, so no memset is needed.
      NvBufferTransformParams transform_params{};
      // Crop the source; dst_rect covers the whole target buffer.
      transform_params.transform_flag = NVBUFFER_TRANSFORM_CROP_SRC;
      // NvBufferRect is {top, left, width, height}: this selects the top-left quarter.
      transform_params.src_rect = {0,
                                   0,
                                   source_params.params.width[0] / 2,
                                   source_params.params.height[0] / 2};
      transform_params.dst_rect = {0, 0, target_params.width[0],
                                   target_params.height[0]};
      // Returns 0 on success.
      int result = NvBufferTransformEx(
          source_dmabuf_fd, &source_params.params_ex, target_dmabuf_fd,
          &target_params_ex, &transform_params);
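The snippet above only takes the top-left quarter; the other three quarters differ only in their offsets. A minimal sketch that computes all four source rectangles. `QuadRect` is a hypothetical local struct, defined so the sketch compiles without the Jetson Multimedia API headers; its field order is assumed to match `NvBufferRect` in nvbuf_utils.h (top, left, width, height), so check the header before copying values across:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical mirror of NvBufferRect's assumed field order.
struct QuadRect {
    std::uint32_t top;
    std::uint32_t left;
    std::uint32_t width;
    std::uint32_t height;
};

// Returns the four quadrants of a width x height frame in the order
// top-left, top-right, bottom-left, bottom-right.
std::array<QuadRect, 4> quadrant_rects(std::uint32_t width, std::uint32_t height) {
    assert(width % 2 == 0 && height % 2 == 0);  // only even splits are handled
    const std::uint32_t w = width / 2;
    const std::uint32_t h = height / 2;
    return {{
        {0, 0, w, h},  // top-left
        {0, w, w, h},  // top-right
        {h, 0, w, h},  // bottom-left
        {h, w, w, h},  // bottom-right
    }};
}
```

Each entry would be copied into `transform_params.src_rect` for one of four separate transforms, with `dst_rect` left at the full 1920x1080 target.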

By the way, my system is a Xavier running L4T R32 Revision 6.1.

Cheers,
Markus

Hi,
A forum user has shared this suggestion:

The issue seems to be the missing h264parse and the incorrect cropping.
The following runs fine at 60 fps with sync enabled:

gst-launch-1.0 -v \
  filesrc location= ./Videos/test_h264_main_3840x2160p60.mkv ! matroskademux name=demux1 demux1. ! video/x-h264 ! queue ! h264parse ! nvv4l2decoder ! tee name=t1 \
    t1. ! queue ! nvvidconv left=0 right=1919 top=0 bottom=1079 ! 'video/x-raw(memory:NVMM),width=1920,height=1080' ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink \
    t1. ! queue ! nvvidconv left=1920 right=3839 top=0 bottom=1079 ! 'video/x-raw(memory:NVMM),width=1920,height=1080' ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink \
    t1. ! queue ! nvvidconv left=0 right=1919 top=1080 bottom=2159 ! 'video/x-raw(memory:NVMM),width=1920,height=1080' ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink \
    t1. ! queue ! nvvidconv left=1920 right=3839 top=1080 bottom=2159 ! 'video/x-raw(memory:NVMM),width=1920,height=1080' ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink \
  filesrc location= ./Videos/test_h264_main_3840x2160p60.mkv ! matroskademux name=demux2 demux2. ! video/x-h264 ! queue ! h264parse ! nvv4l2decoder ! tee name=t2 \
    t2. ! queue ! nvvidconv left=0 right=1919 top=0 bottom=1079 ! 'video/x-raw(memory:NVMM),width=1920,height=1080' ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink \
    t2. ! queue ! nvvidconv left=1920 right=3839 top=0 bottom=1079 ! 'video/x-raw(memory:NVMM),width=1920,height=1080' ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink \
    t2. ! queue ! nvvidconv left=0 right=1919 top=1080 bottom=2159 ! 'video/x-raw(memory:NVMM),width=1920,height=1080' ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink \
    t2. ! queue ! nvvidconv left=1920 right=3839 top=1080 bottom=2159 ! 'video/x-raw(memory:NVMM),width=1920,height=1080' ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink \
  filesrc location= ./Videos/test_h264_main_3840x2160p60.mkv ! matroskademux name=demux3 demux3. ! video/x-h264 ! queue ! h264parse ! nvv4l2decoder ! tee name=t3 \
    t3. ! queue ! nvvidconv left=0 right=1919 top=0 bottom=1079 ! 'video/x-raw(memory:NVMM),width=1920,height=1080' ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink \
    t3. ! queue ! nvvidconv left=1920 right=3839 top=0 bottom=1079 ! 'video/x-raw(memory:NVMM),width=1920,height=1080' ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink \
    t3. ! queue ! nvvidconv left=0 right=1919 top=1080 bottom=2159 ! 'video/x-raw(memory:NVMM),width=1920,height=1080' ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink \
    t3. ! queue ! nvvidconv left=1920 right=3839 top=1080 bottom=2159 ! 'video/x-raw(memory:NVMM),width=1920,height=1080' ! queue ! nvv4l2h264enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink

Please give it a try.
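Because the twelve branches differ only in the tee name and crop coordinates, the same description can be generated instead of typed out, for example to pass to GStreamer's `gst_parse_launch()` from C++. A minimal sketch that follows the element chain and crop values of the suggested pipeline; the function name `build_split_pipeline` and the file paths are hypothetical. No shell quotes are needed around the caps because the string never goes through a shell:

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Builds a pipeline description that splits each 3840x2160 H.264/Matroska
// input into four 1920x1080 H.264 encodes, mirroring the suggested pipeline.
std::string build_split_pipeline(const std::vector<std::string>& files) {
    struct Tile { int left, right, top, bottom; };
    // Crop coordinates copied from the suggested pipeline.
    const std::array<Tile, 4> tiles{{
        {0, 1919, 0, 1079},
        {1920, 3839, 0, 1079},
        {0, 1919, 1080, 2159},
        {1920, 3839, 1080, 2159},
    }};

    std::ostringstream p;
    for (std::size_t i = 0; i < files.size(); ++i) {
        const std::string n = std::to_string(i + 1);
        // Source branch: demux, parse, decode, then fan out through a tee.
        p << "filesrc location=" << files[i]
          << " ! matroskademux name=demux" << n << " demux" << n << ". ! video/x-h264"
          << " ! queue ! h264parse ! nvv4l2decoder ! tee name=t" << n << ' ';
        // One crop + encode branch per tile, with explicit 1920x1080 NVMM caps.
        for (const Tile& t : tiles) {
            p << "t" << n << ". ! queue ! nvvidconv left=" << t.left << " right=" << t.right
              << " top=" << t.top << " bottom=" << t.bottom
              << " ! video/x-raw(memory:NVMM),width=1920,height=1080"
              << " ! queue ! nvv4l2h264enc maxperf-enable=1"
              << " ! fpsdisplaysink text-overlay=0 video-sink=fakesink ";
        }
    }
    return p.str();
}
```

For three inputs this yields the same twelve encode branches as the command line above.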

Hi,

thanks for the great response!

Adding the caps 'video/x-raw(memory:NVMM),width=1920,height=1080' after nvvidconv fixed the issue for me. Awesome, thanks! Now I’m able to do 90 fps, which just rocks!

However, I’m not able to reproduce the same performance in C++ using NvBufferTransformEx or NvBufferTransform.
I made sure that the pixelFormat is the same for the source and the target.

I reduced the problem to just doing 12 NvBufferTransform calls, and even that runs into severe performance issues. I double-checked that the target buffer is 1920x1080. Even 12 1:1 copies (1920x1080 to 1920x1080) perform badly, even when I split them across several processes. It seems to me that NvBufferTransform uses some kind of global lock, or is it just slow?
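One way to test the global-lock suspicion is to time the transforms in isolation, for example once with a single call per frame and once with twelve, and compare the achieved rates. A minimal timing sketch; `measure_fps` is a hypothetical helper, and the callable passed to it is a placeholder for whatever per-frame work is being measured (such as the twelve NvBufferTransform calls, which are not shown here):

```cpp
#include <chrono>
#include <functional>

// Runs `per_frame` `frames` times and returns the achieved frames per second.
// `per_frame` should wrap all the work one output frame needs.
double measure_fps(const std::function<void()>& per_frame, int frames) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < frames; ++i) {
        per_frame();
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    // Guard against a zero-length measurement on very fast callables.
    return elapsed.count() > 0.0 ? frames / elapsed.count() : 0.0;
}
```

If twelve calls per frame take roughly twelve times as long as one, the transforms are running one after another, whatever their cause.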

Any ideas on how to improve my code?

Cheers,
Markus

Hi,
Please create an NvBufferSession and set it in transform_params. Here is a patch for reference:
A segfault occurs when creating NvVideoDecoder inside a child process - #9 by DaneLLL

And if you don’t need to access the NvBuffer, please keep the format in block linear instead of converting it to pitch linear.
