High CPU usage problem with nvcompositor

Hi

I used nvcompositor and nvoverlaysink to composite 4 UDP streams (1920x1080, 30 fps) onto a 4K display. CPU usage was 230% across the 4 cores and the frame rate dropped to 5 fps. I measured CPU usage with the top command.

gst-launch-1.0 nvcompositor name=comp \
               sink_0::xpos=0 sink_0::ypos=0 sink_0::width=1920 sink_0::height=1080 \
               sink_1::xpos=1920 sink_1::ypos=0 sink_1::width=1920 sink_1::height=1080 \
               sink_2::xpos=0 sink_2::ypos=1080 sink_2::width=1920 sink_2::height=1080 \
               sink_3::xpos=1920 sink_3::ypos=1080 sink_3::width=1920 sink_3::height=1080 ! nvoverlaysink \
               udpsrc multicast-iface="eth1" multicast-group=224.1.1.5 "port=40000" ! "application/x-rtp, media=video, encoding-name=H264" ! rtph264depay ! queue ! h264parse ! omxh264dec enable-low-outbuffer=1 ! comp.sink_0 \
               udpsrc multicast-iface="eth1" multicast-group=224.1.1.5 "port=40010" ! "application/x-rtp, media=video, encoding-name=H264" ! rtph264depay ! queue ! h264parse ! omxh264dec enable-low-outbuffer=1 ! comp.sink_1 \
               udpsrc multicast-iface="eth1" multicast-group=224.1.1.5 "port=40000" ! "application/x-rtp, media=video, encoding-name=H264" ! rtph264depay ! queue ! h264parse ! omxh264dec enable-low-outbuffer=1 ! comp.sink_2 \
               udpsrc multicast-iface="eth1" multicast-group=224.1.1.5 "port=40010" ! "application/x-rtp, media=video, encoding-name=H264" ! rtph264depay ! queue ! h264parse ! omxh264dec enable-low-outbuffer=1 ! comp.sink_3 -e

This was considerably higher than rendering 4 X11 windows with nv3dsink, which used 160% CPU.

gst-launch-1.0 udpsrc multicast-group=224.1.1.5 "port=40010" ! "application/x-rtp, media=video, encoding-name=H264" ! rtph264depay ! queue ! h264parse ! nvv4l2decoder ! nv3dsink -e

Hi,
It is expected and already optimized, since nvcompositor utilizes hardware acceleration (the VIC engine), as we commented at
https://devtalk.nvidia.com/default/topic/1055591/jetson-nano/two-nvoverlaysink-problem-on-jetson-nano/post/5351525/#5351525

If using nv3dsink is good in your usecase, you may use nv3dsink.

Hi DaneLLL

Thank you for your reply.
Why is the CPU percentage so high? Is there something wrong with my settings?
I need nvcompositor to set the picture positions.
I don't think I can set the window position if I use nv3dsink.
Testing nv3dsink was only to compare its CPU usage against nvcompositor.

Hi,
We can reveal that nvcompositor is implemented based on NvBufferComposite() (defined in nvbuf_utils.h). However, the source code is not public.

We will add support for configuring the window width, height, x, and y in nv3dsink in the next r32 release.

Hi

Thank you for your support.
When is the next r32 release due?

I also tested nvcompositor with an MP4 file, similar to the nvcompositor example. The result was the same: nvcompositor causes high CPU usage.
① Using nvcompositor with only one source: CPU usage was 100%.

gst-launch-1.0 nvcompositor name=comp \
               sink_3::xpos=1920 sink_3::ypos=1080 sink_3::width=1920 sink_3::height=816 ! nvoverlaysink \
               filesrc location=/home/vsdc/TheBourneUltimatumTrailer.mp4 ! qtdemux name=demux0 \
               ! h264parse ! omxh264dec ! comp.sink_3 -e

② Using nvoverlaysink only: CPU usage was 15%.

gst-launch-1.0 filesrc location=/home/vsdc/TheBourneUltimatumTrailer.mp4 ! qtdemux name=demux0 \
               !  h264parse ! omxh264dec ! nvoverlaysink overlay-x=0 overlay-y=0 overlay-w=1920 overlay-h=816 overlay=2

cat /etc/nv_tegra_release

R32 (release), REVISION: 1.0, GCID: 14531094, BOARD: t210ref, EABI: aarch64, DATE: Wed Mar 13 07:46:13 UTC 2019

Hi,
We have checked and found the CPU usage is from gstreamer frameworks. nvcompositor plugin is based on GstVideoAggregator:
https://gstreamer.freedesktop.org/documentation/video/gstvideoaggregator.html?gi-language=c

GstVideoAggregator takes some CPU. If you want to eliminate that overhead, we suggest you try tegra_multimedia_api: you can call NvBufferComposite() to achieve the same function as the nvcompositor plugin.
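As a rough illustration of the suggestion above, here is a hedged sketch of calling NvBufferComposite() directly to place four decoded 1080p buffers into the quadrants of a 4K destination buffer, as nvcompositor does internally. The field and constant names (NvBufferCompositeParams, composite_flag, src_comp_rect, dst_comp_rect) are recalled from the r32 nvbuf_utils.h and should be verified against your JetPack headers; this is not official sample code.

```c
/* Sketch: composite four 1920x1080 dmabuf sources into one 3840x2160
 * destination with NvBufferComposite().  Requires a Jetson with
 * tegra_multimedia_api; struct/flag names assumed from nvbuf_utils.h. */
#include <string.h>
#include "nvbuf_utils.h"

int composite_four(int src_fds[4], int dst_fd)
{
    NvBufferCompositeParams p;
    memset(&p, 0, sizeof(p));
    p.composite_flag  = NVBUFFER_COMPOSITE;  /* verify this flag name */
    p.input_buf_count = 4;

    for (int i = 0; i < 4; i++) {
        /* take each full 1920x1080 source frame... */
        p.src_comp_rect[i].top    = 0;
        p.src_comp_rect[i].left   = 0;
        p.src_comp_rect[i].width  = 1920;
        p.src_comp_rect[i].height = 1080;
        /* ...and place it in one quadrant of the 4K output,
         * matching the sink_N::xpos/ypos layout used earlier */
        p.dst_comp_rect[i].top    = (i / 2) * 1080;
        p.dst_comp_rect[i].left   = (i % 2) * 1920;
        p.dst_comp_rect[i].width  = 1920;
        p.dst_comp_rect[i].height = 1080;
    }
    return NvBufferComposite(src_fds, dst_fd, &p);
}
```

Because this runs entirely on the VIC engine, the per-frame cost on the CPU should be limited to issuing the call, unlike the GstVideoAggregator path.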

Hi DaneLLL

I really appreciate your checking.
I am very interested to use NvBufferComposite().
Is there any document or sample code about using tegra_multimedia_api to create a gstreamer plugin?

Hi,
We don’t have sample code for creating a gstreamer plugin. There is a sample that fetches the NvBuffer in appsink so that you can use the NvBuffer APIs:
https://devtalk.nvidia.com/default/topic/1037450/jetson-tx2/use-gstreamer-or-tegra_multimedia_api-to-decode-video-would-be-more-efficient-and-increase-throughpu-/post/5270860/#5270860

Hi

Thanks for your advice. I tried the following flow and composited 8 streams, but found that sometimes one stream's image is a bit fuzzy:

1. Use gstreamer to decode the 8 H.264 streams

2. Get the 8 dmabuf_fd from appsink

3. Use NvBufferComposite() to build one composite frame

4. Render the composite frame

Is it a data-synchronization problem?
I have the following questions:

  1. How do I synchronize between the dmabuf_fd and the composite frame, to avoid reading a buffer that is still being written?
  2. How many FIFOs (banks) are there for the dmabuf_fd and the composite frame when I use omxh264dec, nvvidconv, and NvBufferComposite()?
  3. How do I increase the buffer count (frame FIFO depth) for the dmabuf_fd and the composite frame?

Hi,
For synchronization, you may utilize the pts and dts in GstBuffer. Before executing the composite, please call gst_buffer_ref() to keep the buffers alive. After the composite is done, call gst_buffer_unref() to return them.
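The ref/unref pattern described above can be sketched as follows. This is a minimal illustration, not official sample code: do_composite() is a hypothetical stand-in for the caller's own routine that reads the dmabuf_fds (e.g. via NvBufferComposite()).

```c
/* Sketch: hold a reference on each GstBuffer pulled from appsink while
 * its underlying hardware buffer is read by the composite step, so the
 * decoder cannot recycle the buffer mid-composite. */
#include <gst/gst.h>

/* Hypothetical composite routine supplied by the application. */
extern void do_composite(GstBuffer *inputs[], int n);

void composite_with_refs(GstBuffer *inputs[], int n)
{
    for (int i = 0; i < n; i++)
        gst_buffer_ref(inputs[i]);    /* keep the buffer from being reused */

    do_composite(inputs, n);          /* reads the dmabuf_fds of all inputs */

    for (int i = 0; i < n; i++)
        gst_buffer_unref(inputs[i]);  /* return the buffer to its pool */
}
```

Dropping the references only after the composite completes should prevent the "fuzzy stream" symptom caused by compositing a buffer the decoder is already overwriting.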

Please also set the property below on nvvidconv to test with more working buffers.

output-buffers      : number of output buffers
                    flags: readable, writable, changeable in NULL, READY, PAUSED or PLAYING state
                    Unsigned Integer. Range: 1 - 4294967295 Default: 4

Hi

Thanks for your reply.

Does GstBuffer work with the hardware buffer?
How do I extract the GstBuffer from the NvBuffer, similar to how ExtractFdFromNvBuffer() extracts the dmabuf_fd?

/**
* This method must be used to extract dmabuf_fd of the hardware buffer.
* @param[in] nvbuf Specifies the `hw_buffer`.
* @param[out] dmabuf_fd Returns DMABUF FD of `hw_buffer`.
*
* @returns 0 for success, -1 for failure.
*/
int ExtractFdFromNvBuffer (void *nvbuf, int *dmabuf_fd);

Hi,
You can get it from GstSample:

buffer = gst_sample_get_buffer (sample);
g_print("PTS= %lu\n", buffer->pts);
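Putting the pieces together, here is a hedged sketch of the full path from appsink to dmabuf_fd, combining the snippet above with ExtractFdFromNvBuffer(). It follows the pattern of the appsink sample linked earlier (map the buffer, treat the mapped data as the NvBuffer handle); verify the details against your tegra_multimedia_api version.

```c
/* Sketch: pull one sample from appsink, read its PTS, and recover the
 * dmabuf_fd of the underlying hardware buffer.  Requires a pipeline whose
 * appsink caps are video/x-raw(memory:NVMM) on a Jetson. */
#include <gst/gst.h>
#include <gst/app/gstappsink.h>
#include "nvbuf_utils.h"

int pull_dmabuf_fd(GstAppSink *appsink)
{
    GstSample *sample = gst_app_sink_pull_sample(appsink);
    if (!sample)
        return -1;                             /* EOS or pipeline error */

    GstBuffer *buffer = gst_sample_get_buffer(sample);
    g_print("PTS = %" GST_TIME_FORMAT "\n",
            GST_TIME_ARGS(GST_BUFFER_PTS(buffer)));

    GstMapInfo map;
    int dmabuf_fd = -1;
    if (gst_buffer_map(buffer, &map, GST_MAP_READ)) {
        /* With NVMM memory, map.data is the NvBuffer handle that
         * ExtractFdFromNvBuffer() expects. */
        ExtractFdFromNvBuffer((void *)map.data, &dmabuf_fd);
        gst_buffer_unmap(buffer, &map);
    }
    gst_sample_unref(sample);                  /* also releases the buffer */
    return dmabuf_fd;
}
```

Note that once the sample is unreffed the buffer can be recycled, so if the dmabuf_fd is still needed (e.g. for NvBufferComposite()), keep a gst_buffer_ref() on the buffer until the composite is done, as described above.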