Performance limitation in the backend Multimedia API sample

I’m using the backend sample from the L4T Multimedia API Samples to decode and render four channels of H.264 1080p video on Xavier.

Performance appears to be limited.

According to the backend section of the L4T Multimedia API Reference, “object detection is limited to identifying cars in video streams of 960 x 540 resolution, running up to 14 FPS”.

I want to run each channel at up to 30 fps.
If this is not a hardware limitation of Xavier, is there a way to lift that limit?

Hi,
That figure was measured on TX2. On Xavier, you should get better performance.

Thank you for the answer.
However, that does not match the figures on this page: Xavier & TX2 Comparison - Connect Tech Inc.
It says that Xavier is capable of (6x) 4Kp60 video decode. In the backend sample, however, playback lags at (4x) 1080p.
Is this normal performance for Xavier?
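
To put the two figures side by side, here is a rough comparison of raw pixel throughput. It is only a sketch: it assumes 1080p means 1920 x 1080, 4K means 3840 x 2160, and that pixels per second is a fair proxy for decoder load, which ignores codec and bitrate differences. The function name is my own.

```python
def pixel_rate(streams, width, height, fps):
    """Pixels per second for a number of identical video streams."""
    return streams * width * height * fps

# What I am asking for: 4 streams of 1080p at 30 fps.
needed = pixel_rate(4, 1920, 1080, 30)

# What the comparison page quotes for Xavier: 6 streams of 4K at 60 fps.
quoted = pixel_rate(6, 3840, 2160, 60)

print(needed)           # 248832000
print(quoted)           # 2985984000
print(quoted / needed)  # 12.0
```

By this measure the workload is about one twelfth of the quoted decode capacity, which is why the lag surprises me.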

Hi,
For pure video decoding performance, please run 00_video_decode. The backend sample runs a deep-learning model on the GPU, and its performance is determined by the model.

One last question:

Why does the 4-channel input stutter during playback?

The tegrastats output does not suggest that the system is running out of performance, so I wonder what the cause is.

–command–
I set “ENABLETRT ?= 0”

./backend 4 …/…/…/126/AVB_20190531150055_00001_ch01.h264 …/…/…/126/AVB_20190531150055_00001_ch02.h264 …/…/…/126/AVB_20190531150055_00001_ch03.h264 …/…/…/126/AVB_20190531150055_00001_ch04.h264 H264
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
Starting decoder capture loop thread
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
[INFO] (NvEglRenderer.cpp:110) Setting Screen width 480 height 270
NvMMLiteBlockCreate : Block : BlockType = 261
Starting decoder capture loop thread
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
Starting decoder capture loop thread
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
[INFO] (NvEglRenderer.cpp:110) Setting Screen width 480 height 270
[INFO] (NvEglRenderer.cpp:110) Setting Screen width 480 height 270
Starting decoder capture loop thread
libv4l2_nvvidconv (0):(802) (INFO) : Allocating (12) OUTPUT PLANE BUFFERS Layout=1
libv4l2_nvvidconv (0):(818) (INFO) : Allocating (12) CAPTURE PLANE BUFFERS Layout=0
Query and set capture successful
[INFO] (NvEglRenderer.cpp:110) Setting Screen width 480 height 270
libv4l2_nvvidconv (0):(802) (INFO) : Allocating (12) OUTPUT PLANE BUFFERS Layout=1
libv4l2_nvvidconv (0):(818) (INFO) : Allocating (12) CAPTURE PLANE BUFFERS Layout=0
Query and set capture successful
libv4l2_nvvidconv (0):(802) (INFO) : Allocating (12) OUTPUT PLANE BUFFERS Layout=1
libv4l2_nvvidconv (0):(802) (INFO) : Allocating (12) OUTPUT PLANE BUFFERS Layout=1
libv4l2_nvvidconv (0):(818) (INFO) : Allocating (12) CAPTURE PLANE BUFFERS Layout=0
libv4l2_nvvidconv (0):(818) (INFO) : Allocating (12) CAPTURE PLANE BUFFERS Layout=0
Query and set capture successful
Query and set capture successful

–tegrastats–

RAM 6469/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [19%@2265,23%@2265,23%@2265,19%@2265,13%@2265,18%@2265,20%@2265,19%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 11%@1377 NVDEC 1190 NVDEC1 1190 APE 150 MTS fg 1% bg 13% AO@33.5C GPU@35C Tboard@36C Tdiode@38.75C AUX@34.5C CPU@37C thermal@35.4C PMIC@100C GPU 1698/1364 CPU 2162/1886 SOC 4325/3169 CV 0/0 VDDRQ 463/257 SYS5V 2652/2584
RAM 6469/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [15%@2265,11%@2265,15%@2265,11%@2265,11%@2265,11%@2265,10%@2265,9%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 10%@1377 NVDEC 601 NVDEC1 601 APE 150 MTS fg 1% bg 12% AO@34C GPU@35.5C Tboard@36C Tdiode@39C AUX@34.5C CPU@37.5C thermal@35.55C PMIC@100C GPU 1698/1364 CPU 2316/1887 SOC 4478/3171 CV 0/0 VDDRQ 463/258 SYS5V 2652/2585
RAM 6469/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [12%@2265,11%@2265,12%@2265,10%@2265,8%@2265,12%@2265,8%@2265,11%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 11%@1377 NVDEC 332 NVDEC1 307 APE 150 MTS fg 0% bg 14% AO@34C GPU@35.5C Tboard@36C Tdiode@39C AUX@34.5C CPU@37.5C thermal@35.4C PMIC@100C GPU 1698/1365 CPU 2007/1887 SOC 4325/3172 CV 0/0 VDDRQ 463/258 SYS5V 2652/2585
RAM 6469/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [11%@2265,12%@2265,14%@2265,11%@2265,5%@2265,5%@2265,10%@2265,10%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 10%@1377 NVDEC 192 NVDEC1 192 APE 150 MTS fg 1% bg 13% AO@34C GPU@35C Tboard@36C Tdiode@39C AUX@34.5C CPU@37C thermal@35.55C PMIC@100C GPU 1698/1365 CPU 2007/1887 SOC 4325/3174 CV 0/0 VDDRQ 463/258 SYS5V 2652/2585
RAM 6471/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [12%@2265,8%@2265,13%@2265,11%@2265,10%@2265,7%@2265,6%@2265,7%@2219] EMC_FREQ 4%@2133 GR3D_FREQ 8%@1377 NVDEC 128 NVDEC1 128 APE 150 MTS fg 1% bg 15% AO@33.5C GPU@35.5C Tboard@36C Tdiode@39C AUX@34.5C CPU@37.5C thermal@35.7C PMIC@100C GPU 1698/1366 CPU 2007/1887 SOC 4325/3176 CV 0/0 VDDRQ 463/259 SYS5V 2652/2585
RAM 6471/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [10%@2265,13%@2265,13%@2265,8%@2265,8%@2265,5%@2265,6%@2265,5%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 13%@1377 NVDEC 128 NVDEC1 128 APE 150 MTS fg 1% bg 13% AO@34C GPU@35.5C Tboard@36C Tdiode@39C AUX@34.5C CPU@37.5C thermal@35.55C PMIC@100C GPU 1699/1366 CPU 1853/1887 SOC 4325/3177 CV 0/0 VDDRQ 308/259 SYS5V 2652/2585
RAM 6471/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [9%@2265,9%@2265,9%@2265,9%@2265,7%@2265,9%@2265,8%@2265,11%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 11%@1377 NVDEC 128 NVDEC1 128 APE 150 MTS fg 0% bg 14% AO@34C GPU@35.5C Tboard@36C Tdiode@39.25C AUX@34.5C CPU@37.5C thermal@35.55C PMIC@100C GPU 1699/1367 CPU 1853/1887 SOC 4325/3179 CV 0/0 VDDRQ 308/259 SYS5V 2652/2585
RAM 6472/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [11%@2265,10%@2265,10%@2265,10%@2265,6%@2265,8%@2265,7%@2265,7%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 10%@1377 NVDEC 128 NVDEC1 128 APE 150 MTS fg 0% bg 12% AO@34C GPU@35.5C Tboard@36C Tdiode@39.25C AUX@34.5C CPU@37.5C thermal@35.7C PMIC@100C GPU 1698/1367 CPU 2007/1887 SOC 4325/3181 CV 0/0 VDDRQ 463/259 SYS5V 2652/2585
RAM 6470/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [14%@2265,8%@2265,12%@2265,11%@2265,4%@2265,7%@2265,9%@2265,4%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 12%@1377 NVDEC 128 NVDEC1 128 APE 150 MTS fg 1% bg 13% AO@34C GPU@35.5C Tboard@36C Tdiode@39.25C AUX@34.5C CPU@37C thermal@35.55C PMIC@100C GPU 1699/1368 CPU 1853/1887 SOC 4325/3182 CV 0/0 VDDRQ 308/259 SYS5V 2652/2585
RAM 6470/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [10%@2265,14%@2265,11%@2265,10%@2265,9%@2265,9%@2265,7%@2265,9%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 11%@1377 NVDEC 128 NVDEC1 128 APE 150 MTS fg 1% bg 18% AO@34C GPU@35.5C Tboard@36C Tdiode@39.25C AUX@34.5C CPU@37.5C thermal@35.7C PMIC@100C GPU 1698/1368 CPU 2162/1888 SOC 4325/3184 CV 0/0 VDDRQ 463/259 SYS5V 2652/2585
RAM 6470/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [10%@2265,9%@2265,6%@2265,13%@2265,6%@2265,6%@2265,8%@2265,8%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 7%@1377 NVDE
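
To make the log above easier to read, here is a minimal sketch that pulls the per-core CPU load, the GR3D (GPU) load and the NVDEC values out of one tegrastats line. The function name and the regular expressions are my own, written against the line format shown above. I am assuming the NVDEC numbers are clock frequencies in MHz.

```python
import re

def parse_tegrastats(line):
    """Extract CPU loads, GPU load and NVDEC values from one tegrastats line."""
    stats = {}
    # Only the bracketed "CPU [..]" field holds per-core load; "CPU@37C" and
    # "CPU 2162/1886" later in the line are temperature and power readings.
    cpu = re.search(r"CPU \[([^\]]*)\]", line)
    if cpu:
        stats["cpu_load"] = [int(v) for v in re.findall(r"(\d+)%@\d+", cpu.group(1))]
    gpu = re.search(r"GR3D_FREQ (\d+)%", line)
    if gpu:
        stats["gpu_load"] = int(gpu.group(1))
    # Matches both "NVDEC 1190" and "NVDEC1 1190" (assumed to be MHz).
    stats["nvdec"] = [int(v) for v in re.findall(r"NVDEC\d? (\d+)", line)]
    return stats

# Shortened copy of the first sample in the log above.
sample = ("RAM 6469/15690MB CPU [19%@2265,23%@2265,23%@2265,19%@2265,"
          "13%@2265,18%@2265,20%@2265,19%@2265] EMC_FREQ 4%@2133 "
          "GR3D_FREQ 11%@1377 NVDEC 1190 NVDEC1 1190 APE 150 CPU@37C CPU 2162/1886")
print(parse_tegrastats(sample))
```

Across the samples above, no CPU core goes over 23%, GR3D stays under 15%, and the NVDEC values drop from 1190 to 128.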

Hi,
You may try the following case:
https://devtalk.nvidia.com/default/topic/1014789/jetson-tx1/-the-cpu-usage-cannot-down-use-cuda-decode-/post/5189145/#5189145

It works fine when run as four separate processes.

./backend 1 ./126/AVB_20190531150055_00001_ch04.h264 H264 & ./backend 1 ./126/AVB_20190531150055_00001_ch04.h264 H264 & ./backend 1 ./126/AVB_20190531150055_00001_ch04.h264 H264 & ./backend 1 ./126/AVB_20190531150055_00001_ch04.h264 H264

However, the video is delayed when the four channels run as threads within a single process.
It seems to block in dqBuffer on the decoder’s output plane.
Do you know why?
Please let me know what information you need to investigate the problem.


./backend 4 …/…/…/126/AVB_20190531150055_00001_ch01.h264 …/…/…/126/AVB_20190531150055_00001_ch02.h264 …/…/…/126/AVB_20190531150055_00001_ch03.h264 …/…/…/126/AVB_20190531150055_00001_ch04.h264 H264

Please let me know if you have any questions.

Hi,
We suggest you run a single encoding thread in each process.
https://devtalk.nvidia.com/default/topic/1056389/jetson-agx-xavier/error-encoding-with-gstreamer-and-omxh264enc/post/5362732/#5362732
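
As a rough illustration of the one-stream-per-process approach, a small launcher could start one backend process per input file. This is only a sketch: the helper names are hypothetical, and the argument order `<binary> 1 <file> H264` is copied from the single-channel command earlier in this thread.

```python
import subprocess

def build_commands(stream_paths, binary="./backend", codec="H264"):
    """Return one argv list per stream, each asking backend for a single channel."""
    return [[binary, "1", path, codec] for path in stream_paths]

def run_all(stream_paths):
    """Start one backend process per stream and wait for all of them to exit."""
    procs = [subprocess.Popen(cmd) for cmd in build_commands(stream_paths)]
    # Collect the exit codes so that a failing channel can be spotted.
    return [proc.wait() for proc in procs]
```

Calling `run_all` with the list of four .h264 files would then replace the chained `&` command line. Each process opens its own decoder instance, so the channels do not share a process.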