Xavier GStreamer decode performance issue

Hi all experts,

I have a serious CPU-bound issue with video streaming. My analysis and test environment are below; please let me know how to resolve it. Our goal is to decode 50 video streams and run some video analysis on them.

But right now:
1. Decoding only 30 video streams already makes the system CPU bound.
2. Starting from the 3rd video stream, a warning says the system is too slow.

Questions
1. Is H.264 decoding with the NVIDIA command below done entirely in hardware, or only partly? Does Xavier have a dedicated hardware decoder? If yes, am I using it correctly with the command below?
2. How can I improve decode performance on the Xavier platform?

environment:

  • Xavier platform with JetPack-L4T-linux-x64_b39 installed.
  • Ethernet connection to the network.

test case:

  • command taken from the NVIDIA release document
  • use this command to start more than 30 video streams from the network
  • video stream format: H.264, 640x360, 29 fps

===============================
H.264 Decode (NVIDIA accelerated decode)
gst-launch-1.0 filesrc location=<filename.mp4> ! qtdemux name=demux
demux.video_0 ! queue ! h264parse ! omxh264dec ! nveglglessink -e

https://developer.download.nvidia.com/embedded/L4T/r28_Release_v2.0/DP/Docs/Jetson_TX1_and_TX2_Accelerated_GStreamer_User_Guide.pdf?yaTAOyLxXuyzU_TjT4jDQKX6hl3x2xDvv9xBZP7gBd8ILuYDcexDR0_wHbTwpeB46WnZy_G_uXKGTiu7U2ixsS07XcFes5sx7RCoNtg4KGGpUfln0xO67oXb80HHgKusiNtHI04YbRdfslFhHacZsVHMEY4P5VNiV5UzQm8g8MA7nRdoVlivHvJbPVwZUPG_vv3CltiqneIUlgIR

test script:

c=1
while [ $c -le 5 ]
do
echo "open video"
free -m
gst-launch-1.0 http://www.html5videoplayer.net/videos/toystory.mp4 ! qtdemux name=demux.video_0 ! queue ! h264parse ! omxh264dec ! nveglglessink -e &
sleep 5s
c=$(( $c + 1 ))  # advance the counter so the loop stops after 5 streams
done

test result

  1. starting with the 3rd video stream, the following warning is shown:
    WARNING: from element /GstPipeline:pipeline0/GstEglGlesSink:eglglessink0: A lot of buffers are being dropped.
    Additional debug info:
    gstbasesink.c(2854): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstEglGlesSink:eglglessink0:
    There may be a timestamping problem, or this computer is too slow.

  2. with over 20 video streams, the CPU and GPU usage is as follows:
    RAM 2444/7846MB (lfb 1194x4MB) CPU [76%@1248,off,off,73%@1257,72%@1263,70%@1255] EMC_FREQ 13%@1600 GR3D_FREQ 77%@140 NVDEC 1164 APE 150 BCPU@37C MCPU@37C GPU@35.5C PLL@37C Tboard@33C Tdiode@34.5C PMIC@100C thermal@36.4C VDD_IN 6011/3914 VDD_CPU 1065/437 VDD_GPU 304/203 VDD_SOC 1673/1106 VDD_WIFI 0/10 VDD_DDR 1459/854
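
For anyone comparing runs, a small helper can reduce a tegrastats line like the one above to a single number. This is only a sketch: cpu_avg is a made-up name, and it assumes the CPU field format shown in the output above (load%@MHz per core, or "off" for an offline core).

```shell
#!/bin/sh
# Average the load of the online cores in one tegrastats line.
# Assumes the CPU field looks like: CPU [76%@1248,off,off,73%@1257,...]
cpu_avg() {
    printf '%s\n' "$1" |
        sed -n 's/.*CPU \[\([^]]*\)].*/\1/p' |
        tr ',' '\n' |
        awk -F'%' '$1 != "off" { sum += $1; n++ }
                   END { if (n) printf "%d cores online, average load %.2f%%\n", n, sum / n }'
}

# Sample taken from the start of the tegrastats line above.
line='RAM 2444/7846MB (lfb 1194x4MB) CPU [76%@1248,off,off,73%@1257,72%@1263,70%@1255] EMC_FREQ 13%@1600'
cpu_avg "$line"
```

For the line above this reports 4 cores online with an average load of 72.75%, which also makes it obvious that two cores are switched off.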

=======================================

thank you

Hi,
Please execute jetson_clocks.sh and run
gst-launch-1.0 filesrc location=<filename.mp4> ! qtdemux name=demux.video_0 ! queue ! h264parse ! omxh264dec ! fakesink -e &

This reads a local video file to eliminate the effect of network bandwidth, uses fakesink to eliminate the effect of rendering, and runs at max performance.
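
A rough way to script this isolation test for several instances is sketched below. The names launch_decoders, GST_LAUNCH and sample.mp4 are illustrative only and not part of any NVIDIA tool; the pipeline is the one from the release document with fakesink as the sink. GST_LAUNCH is just an override so the loop can be dry-run with echo on a machine without the Jetson plugins.

```shell
#!/bin/sh
# Start N decode-only pipelines in parallel and wait for all of them.
# GST_LAUNCH is a hypothetical override; it defaults to the real tool.
GST_LAUNCH="${GST_LAUNCH:-gst-launch-1.0}"

launch_decoders() {
    count="$1"
    file="$2"
    i=1
    while [ "$i" -le "$count" ]; do
        # Local file source plus fakesink: no network, no rendering.
        "$GST_LAUNCH" filesrc location="$file" ! qtdemux name=demux \
            demux.video_0 ! queue ! h264parse ! omxh264dec ! fakesink -e &
        i=$((i + 1))
    done
    wait    # returns once every background pipeline has exited
}

# Dry run: print the three command lines instead of decoding.
GST_LAUNCH=echo launch_decoders 3 sample.mp4
```

On the board itself, leave GST_LAUNCH unset and pass the real file name, for example launch_decoders 30 toystory.mp4, while tegrastats runs in another terminal.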

Hi,

Updated test result:

After opening 3 streams from a local file, the CPUs are already busy.
The result below shows the hardware status with 30 streams opened from a local file.

questions:

  1. After executing jetson_clocks.sh, I don't see all CPU cores come online; only the fan speed increases.
  2. From the test result, the CPU is still very busy, so network and rendering are not the major factors. Any other ideas about this issue?

test result

RAM 3311/15819MB (lfb 2932x4MB) CPU [92%@1190,90%@1190,88%@1190,88%@1190,off,off,off,off] EMC_FREQ 0% GR3D_FREQ 0% AO@27C GPU@27C Tboard@28C Tdiode@31.25C AUX@25.5C CPU@28C thermal@26.7C PMIC@100C GPU 616/624 CPU 1388/784 SOC 3393/1591 CV 0/0 VDDRQ 462/312 SYS5V 2254/2161

test environment

  • Xavier platform with JetPack-L4T-linux-x64_b39 installed.
    ===============================================

test case

  • command provided on devtalk
  • execute jetson_clocks.sh before the test starts
  • use this command to start 30 video streams from a local file
  • video stream format: H.264, 640x360, 29 fps
    ===============================================

test script

x=1
while [ $x -le 30 ]
do
echo "Welcome $x times"
gst-launch-1.0 filesrc location=toystory.mp4 ! qtdemux name=demux.video_0 ! queue ! h264parse ! omxh264dec ! fakesink -e &
x=$(( $x + 1 ))
sleep 1s;
done

Hi,
We have verified simultaneous playback of three 4Kp30 videos. Your case is 30 instances of 640x360p29 video playback, which is different.

Please try to run

sudo nvpmodel -m 0

to get all CPUs online
And check tegrastats

sudo ./tegrastats

If performance is still insufficient, please try flashing the MaxN config:
https://devtalk.nvidia.com/default/topic/1044093
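
To put the two workloads mentioned above side by side, here is a back-of-the-envelope comparison of decoded pixels per second. It is only a rough proxy: it assumes "4K" means 3840x2160, which is not stated in this thread, and it ignores bitrate and per-instance overhead. pixel_rate is an illustrative helper name.

```shell
#!/bin/sh
# Decoded pixels per second for N streams of WxH at FPS frames per second.
# Arguments: streams width height fps
pixel_rate() {
    echo $(( $1 * $2 * $3 * $4 ))
}

# Assumption: "4K" is taken as 3840x2160; the thread does not give the size.
uhd=$(pixel_rate 3 3840 2160 30)
sd=$(pixel_rate 30 640 360 29)
echo "3 x 4Kp30:       $uhd pixels/s"
echo "30 x 640x360p29: $sd pixels/s"
```

By this measure the 30 small streams (200448000 pixels/s) are a lighter load than the three 4K streams (746496000 pixels/s), so the raw pixel rate alone would not explain the busy CPUs; the number of separate pipelines may matter more.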

Updated test result and corrected info

Our goal is 50 video streams with 1080p 60fps input, with some AI processing on them,
not only decoding the video streams. Sorry for the misleading description.

After bringing all CPUs online, performance is better, so we started testing with a 1080p 25fps local video file.

We found that after opening more than 13 1080p 25fps video streams, the screen freezes,
but according to tegrastats, CPU, GPU and memory usage look okay; they don't reach their peak.

questions:

  1. Is there any other tool that could help me find the bottleneck?
  2. Flashing MaxN as described in the article modifies the thermal policy. Will we run into thermal issues if we improve performance this way?

test result

RAM 5769/15819MB (lfb 2108x4MB) CPU [58%@2265,60%@2265,61%@2265,59%@2265,65%@2265,60%@2265,62%@2265,61%@2265] EMC_FREQ 0% GR3D_FREQ 0% AO@29C GPU@29C Tboard@29C Tdiode@33C AUX@28C CPU@33C thermal@29.65C PMIC@100C GPU 461/804 CPU 7382/2871 SOC 3846/3152 CV 0/0 VDDRQ 615/600 SYS5V 2334/2303
RAM 5770/15819MB (lfb 2108x4MB) CPU [62%@2265,62%@2265,60%@2265,60%@2265,57%@2265,57%@2265,59%@2265,62%@2265] EMC_FREQ 0% GR3D_FREQ 0% AO@29C GPU@29C Tboard@29C Tdiode@33C AUX@28C CPU@32.5C thermal@29.85C PMIC@100C GPU 461/801 CPU 7074/2908 SOC 3692/3157 CV 0/0 VDDRQ 615/600 SYS5V 2294/2303

test environment

  • Xavier platform with JetPack-L4T-linux-x64_b39 installed.
    ===============================================

test case

  • command provided on devtalk
  • execute jetson_clocks.sh before the test starts
  • run sudo nvpmodel -m 0 before the test
  • use this command to start more than 13 video streams from a local file
  • video stream format: H.264, 1080p, 25 fps
    ===============================================

test script

x=1
while [ $x -le 12 ]
do
echo "Welcome $x times"
gst-launch-1.0 filesrc location=TextInMotion-VideoSample-1080p.mp4 ! qtdemux name=demux.video_0 ! queue ! h264parse ! omxh264dec ! nveglglessink -e &
x=$(( $x + 1 ))
sleep 1s;
done

Hi,
We will check if Xavier can achieve 50 1080p60 decoding instances.

You should also run tegrastats with sudo so that all information is displayed, including hardware encoder/decoder information. That may make any bottleneck easier to see.
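
As a sketch of how to check for the decoder field in a captured tegrastats line: nvdec_clock below is an illustrative helper, and it assumes the field has the form "NVDEC <number>" as in the output in the first post. Whether the field is present depends on how tegrastats was run and on what the decoder is doing, so treat a missing field only as a hint.

```shell
#!/bin/sh
# Print the NVDEC value from one tegrastats line, or a note if it is absent.
# Assumes the field has the form "NVDEC <number>", as in the first post.
nvdec_clock() {
    value=$(printf '%s\n' "$1" | sed -n 's/.* NVDEC \([0-9][0-9]*\).*/\1/p')
    if [ -n "$value" ]; then
        echo "NVDEC $value"
    else
        echo "NVDEC not reported"
    fi
}

nvdec_clock 'EMC_FREQ 13%@1600 GR3D_FREQ 77%@140 NVDEC 1164 APE 150'
nvdec_clock 'EMC_FREQ 0% GR3D_FREQ 0% AO@27C GPU@27C'
```

The first sample is taken from the 640x360 network test and the second from the later runs, where no NVDEC field appears in the output.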

Hi hobin0920,
Internally, we have verified 16 instances of 1080p30 10 Mbps decoding (with fakesink), and theoretically it can achieve 32 instances.

This does not meet your requirement. If it is acceptable for your use case and you would like to continue, please contact an NVIDIA salesperson for further cooperation with us.
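
For reference, the gap between the stated goal and the figures above can be worked out as 1080p frames per second. This is only a rough sketch: it ignores bitrate and the AI workload, and frames_per_second is an illustrative helper name.

```shell
#!/bin/sh
# 1080p frames per second for N streams at a given frame rate.
# Arguments: streams fps
frames_per_second() {
    echo $(( $1 * $2 ))
}

required=$(frames_per_second 50 60)     # goal from this thread: 50 x 1080p60
verified=$(frames_per_second 16 30)     # verified: 16 x 1080p30
theoretical=$(frames_per_second 32 30)  # theoretical: 32 x 1080p30
echo "required:    $required frames/s"
echo "verified:    $verified frames/s"
echo "theoretical: $theoretical frames/s"
```

That is 3000 frames/s required against 480 verified and 960 theoretical, so the goal is roughly three times the theoretical figure.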