High CPU usage problem with nvcompositor

Hi

I used nvcompositor and nvoverlaysink to composite 4 UDP streams (1920x1080, 30 fps) onto a 4K display. CPU usage was 230% across 4 cores and the frame rate dropped to 5 fps. I measured CPU usage with the top command.

gst-launch-1.0 nvcompositor name=comp \
               sink_0::xpos=0 sink_0::ypos=0 sink_0::width=1920 sink_0::height=1080 \
               sink_1::xpos=1920 sink_1::ypos=0 sink_1::width=1920 sink_1::height=1080 \
               sink_2::xpos=0 sink_2::ypos=1080 sink_2::width=1920 sink_2::height=1080 \
               sink_3::xpos=1920 sink_3::ypos=1080 sink_3::width=1920 sink_3::height=1080 ! nvoverlaysink \
               udpsrc multicast-iface="eth1" multicast-group=224.1.1.5 "port=40000" ! "application/x-rtp, media=video, encoding-name=H264" ! rtph264depay ! queue ! h264parse ! omxh264dec enable-low-outbuffer=1 ! comp.sink_0 \
               udpsrc multicast-iface="eth1" multicast-group=224.1.1.5 "port=40010" ! "application/x-rtp, media=video, encoding-name=H264" ! rtph264depay ! queue ! h264parse ! omxh264dec enable-low-outbuffer=1 ! comp.sink_1 \
               udpsrc multicast-iface="eth1" multicast-group=224.1.1.5 "port=40000" ! "application/x-rtp, media=video, encoding-name=H264" ! rtph264depay ! queue ! h264parse ! omxh264dec enable-low-outbuffer=1 ! comp.sink_2 \
               udpsrc multicast-iface="eth1" multicast-group=224.1.1.5 "port=40010" ! "application/x-rtp, media=video, encoding-name=H264" ! rtph264depay ! queue ! h264parse ! omxh264dec enable-low-outbuffer=1 ! comp.sink_3 -e

This was considerably higher than showing 4 separate X11 windows with nv3dsink, which used 160% CPU.

gst-launch-1.0 udpsrc multicast-group=224.1.1.5 "port=40010" ! "application/x-rtp, media=video, encoding-name=H264" ! rtph264depay ! queue ! h264parse ! nvv4l2decoder ! nv3dsink -e

Hi,
It is expected and already optimized, since nvcompositor utilizes hardware acceleration (the VIC engine), as we have commented at
[url]https://devtalk.nvidia.com/default/topic/1055591/jetson-nano/two-nvoverlaysink-problem-on-jetson-nano/post/5351525/#5351525[/url]

If using nv3dsink is good in your usecase, you may use nv3dsink.

Hi DaneLLL

Thank you for your reply.
Why is the CPU percentage so high? Is there something wrong with my settings?
I need nvcompositor to set the picture positions.
I think I cannot set the window position if I use nv3dsink.
Testing nv3dsink was only to compare its CPU percentage with nvcompositor's.

Hi,
We can reveal that nvcompositor is implemented on top of NvBufferComposite() (defined in nvbuf_utils.h). However, the source code is not public.

We will add support for configuring the width, height, x, and y of nv3dsink in the next r32 release.

Hi

Thank you for your support.
When is the next r32 release scheduled?

I tested nvcompositor with an mp4 file, similar to the nvcompositor example. The result was the same: nvcompositor caused high CPU usage.
① Using nvcompositor with only one source, CPU usage was 100%:

gst-launch-1.0 nvcompositor name=comp \
               sink_3::xpos=1920 sink_3::ypos=1080 sink_3::width=1920 sink_3::height=816 ! nvoverlaysink \
               filesrc location=/home/vsdc/TheBourneUltimatumTrailer.mp4 ! qtdemux name=demux0 \
               ! h264parse ! omxh264dec ! comp.sink_3 -e

② Using nvoverlaysink only, CPU usage was 15%:

gst-launch-1.0 filesrc location=/home/vsdc/TheBourneUltimatumTrailer.mp4 ! qtdemux name=demux0 \
               !  h264parse ! omxh264dec ! nvoverlaysink overlay-x=0 overlay-y=0 overlay-w=1920 overlay-h=816 overlay=2

cat /etc/nv_tegra_release

R32 (release), REVISION: 1.0, GCID: 14531094, BOARD: t210ref, EABI: aarch64, DATE: Wed Mar 13 07:46:13 UTC 2019

Hi,
We have checked and found that the CPU usage comes from the GStreamer framework. The nvcompositor plugin is based on GstVideoAggregator:
https://gstreamer.freedesktop.org/documentation/video/gstvideoaggregator.html?gi-language=c

GstVideoAggregator takes some CPU. If you want to eliminate it, we suggest you try tegra_multimedia_api: you can call NvBufferComposite() to achieve the same functionality as the nvcompositor plugin.


Hi DaneLLL

I really appreciate your checking.
I am very interested in using NvBufferComposite().
Is there any document or sample code on using tegra_multimedia_api to create a GStreamer plugin?

Hi,
We don’t have sample code for creating a GStreamer plugin. There is a sample that fetches the NvBuffer in appsink so that you can use the NvBuffer APIs:
[url]https://devtalk.nvidia.com/default/topic/1037450/jetson-tx2/use-gstreamer-or-tegra_multimedia_api-to-decode-video-would-be-more-efficient-and-increase-throughpu-/post/5270860/#5270860[/url]

Hi

Thanks for your advice. I tried the following flow and composited 8 streams, but found that sometimes one stream’s image is a bit fuzzy.

  1. Use GStreamer to decode 8 H.264 streams.
  2. Get the 8 dmabuf_fd values from appsink.
  3. Use NvBufferComposite() to blend them into 1 compositeFrame.
  4. Render the compositeFrame.

Is it a data synchronization problem?
I have the following questions.

  1. How do I synchronize between the dmabuf_fd buffers and the compositeFrame, to avoid compositing a buffer that is still being written?
  2. How many FIFOs (banks) are there in the dmabuf_fd buffers and the compositeFrame when I use omxh264dec, nvvidconv, and NvBufferComposite()?
  3. How do I increase the buffer count (frame FIFO) of the dmabuf_fd buffers and the compositeFrame?

Hi,
For synchronization, you may utilize the pts and dts in GstBuffer. Before executing the composite, please call gst_buffer_ref() to keep the buffers. After the composite is done, call gst_buffer_unref() to return each buffer.

Please also set the below property on nvvidconv to test with more working buffers.

output-buffers      : number of output buffers
                    flags: readable, writable, changeable in NULL, READY, PAUSED or PLAYING state
                    Unsigned Integer. Range: 1 - 4294967295 Default: 4

Hi

Thanks for your reply.

Does GstBuffer work with the hardware buffers?
How do I extract the hardware buffer from a GstBuffer, the way ExtractFdFromNvBuffer does for an NvBuffer?

/**
* This method must be used to extract dmabuf_fd of the hardware buffer.
* @param[in] nvbuf Specifies the `hw_buffer`.
* @param[out] dmabuf_fd Returns DMABUF FD of `hw_buffer`.
*
* @returns 0 for success, -1 for failure.
*/
int ExtractFdFromNvBuffer (void *nvbuf, int *dmabuf_fd);

Hi,
You can get it from GstSample:

buffer = gst_sample_get_buffer (sample);
g_print("PTS= %lu\n", buffer->pts);