performance limitation in backend multimedia api sample

famoson4 · October 17, 2019, 3:47pm

I’m using the backend sample from the L4T Multimedia API Samples to decode and render four channels of H.264 1080p video on Xavier.

The performance seems to be limited.

According to the backend section of the L4T Multimedia API Reference, “object detection is limited to identifying cars in video streams of 960 x 540 resolution, running up to 14 FPS”.

I want to run each channel at up to 30 fps.
If this is not a performance limit of the Xavier itself, is there a way to lift it?

DaneLLL · October 18, 2019, 9:59am

Hi,
The data was acquired on TX2. On Xavier, you should get better performance.

famoson4 · October 20, 2019, 6:08pm

Thank you for the answer.
However, that does not match the figures at this link: Xavier & TX2 Comparison - Connect Tech Inc.
It says that Xavier is capable of (6x) 4Kp60 video decode. However, in the backend sample, a delay already occurs at (4x) 1080p.
Is this normal performance on Xavier?

DaneLLL · October 21, 2019, 1:40am

Hi,
For pure video decoding performance, please run 00_video_decode. The backend sample runs a deep-learning model on the GPU, and its performance is determined by the model.
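
To put the two figures from this thread side by side, here is a rough back-of-the-envelope sketch in Python. It uses only numbers quoted above and assumes that "4K" means 3840 x 2160 and that the quoted peak applies to the codec in use, which may not hold for H.264. It compares raw pixel rates only and says nothing about the inference stage of the backend sample.

```python
# Rough pixel-throughput comparison using only figures quoted in this thread.
# Assumption: "4K" is 3840x2160; the quoted peak may refer to a different codec.

def pixel_rate(streams, width, height, fps):
    """Pixels per second for `streams` parallel video streams."""
    return streams * width * height * fps

requested = pixel_rate(4, 1920, 1080, 30)    # 4x 1080p at 30 fps
quoted_peak = pixel_rate(6, 3840, 2160, 60)  # 6x 4Kp60, as quoted above

print(f"requested:   {requested / 1e6:.1f} Mpixel/s")
print(f"quoted peak: {quoted_peak / 1e6:.1f} Mpixel/s")
print(f"fraction of quoted peak: {requested / quoted_peak:.1%}")
```

On these assumptions the requested load is about one twelfth of the quoted decode peak, which is consistent with the hardware decoder not being the limiting stage.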

famoson4 · October 21, 2019, 7:46am

One last question.

Why does the 4-channel input stutter during playback?

According to tegrastats, the system does not seem to be running out of headroom, so I wonder what the cause is.

–command–
I set “ENABLETRT ?= 0”

./backend 4 …/…/…/126/AVB_20190531150055_00001_ch01.h264 …/…/…/126/AVB_20190531150055_00001_ch02.h264 …/…/…/126/AVB_20190531150055_00001_ch03.h264 …/…/…/126/AVB_20190531150055_00001_ch04.h264 H264
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
Starting decoder capture loop thread
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
[INFO] (NvEglRenderer.cpp:110) Setting Screen width 480 height 270
NvMMLiteBlockCreate : Block : BlockType = 261
Starting decoder capture loop thread
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
Starting decoder capture loop thread
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
[INFO] (NvEglRenderer.cpp:110) Setting Screen width 480 height 270
[INFO] (NvEglRenderer.cpp:110) Setting Screen width 480 height 270
Starting decoder capture loop thread
libv4l2_nvvidconv (0):(802) (INFO) : Allocating (12) OUTPUT PLANE BUFFERS Layout=1
libv4l2_nvvidconv (0):(818) (INFO) : Allocating (12) CAPTURE PLANE BUFFERS Layout=0
Query and set capture successful
[INFO] (NvEglRenderer.cpp:110) Setting Screen width 480 height 270
libv4l2_nvvidconv (0):(802) (INFO) : Allocating (12) OUTPUT PLANE BUFFERS Layout=1
libv4l2_nvvidconv (0):(818) (INFO) : Allocating (12) CAPTURE PLANE BUFFERS Layout=0
Query and set capture successful
libv4l2_nvvidconv (0):(802) (INFO) : Allocating (12) OUTPUT PLANE BUFFERS Layout=1
libv4l2_nvvidconv (0):(802) (INFO) : Allocating (12) OUTPUT PLANE BUFFERS Layout=1
libv4l2_nvvidconv (0):(818) (INFO) : Allocating (12) CAPTURE PLANE BUFFERS Layout=0
libv4l2_nvvidconv (0):(818) (INFO) : Allocating (12) CAPTURE PLANE BUFFERS Layout=0
Query and set capture successful
Query and set capture successful

–tegrastats–

RAM 6469/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [19%@2265,23%@2265,23%@2265,19%@2265,13%@2265,18%@2265,20%@2265,19%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 11%@1377 NVDEC 1190 NVDEC1 1190 APE 150 MTS fg 1% bg 13% AO@33.5C GPU@35C Tboard@36C Tdiode@38.75C AUX@34.5C CPU@37C thermal@35.4C PMIC@100C GPU 1698/1364 CPU 2162/1886 SOC 4325/3169 CV 0/0 VDDRQ 463/257 SYS5V 2652/2584
RAM 6469/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [15%@2265,11%@2265,15%@2265,11%@2265,11%@2265,11%@2265,10%@2265,9%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 10%@1377 NVDEC 601 NVDEC1 601 APE 150 MTS fg 1% bg 12% AO@34C GPU@35.5C Tboard@36C Tdiode@39C AUX@34.5C CPU@37.5C thermal@35.55C PMIC@100C GPU 1698/1364 CPU 2316/1887 SOC 4478/3171 CV 0/0 VDDRQ 463/258 SYS5V 2652/2585
RAM 6469/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [12%@2265,11%@2265,12%@2265,10%@2265,8%@2265,12%@2265,8%@2265,11%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 11%@1377 NVDEC 332 NVDEC1 307 APE 150 MTS fg 0% bg 14% AO@34C GPU@35.5C Tboard@36C Tdiode@39C AUX@34.5C CPU@37.5C thermal@35.4C PMIC@100C GPU 1698/1365 CPU 2007/1887 SOC 4325/3172 CV 0/0 VDDRQ 463/258 SYS5V 2652/2585
RAM 6469/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [11%@2265,12%@2265,14%@2265,11%@2265,5%@2265,5%@2265,10%@2265,10%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 10%@1377 NVDEC 192 NVDEC1 192 APE 150 MTS fg 1% bg 13% AO@34C GPU@35C Tboard@36C Tdiode@39C AUX@34.5C CPU@37C thermal@35.55C PMIC@100C GPU 1698/1365 CPU 2007/1887 SOC 4325/3174 CV 0/0 VDDRQ 463/258 SYS5V 2652/2585
RAM 6471/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [12%@2265,8%@2265,13%@2265,11%@2265,10%@2265,7%@2265,6%@2265,7%@2219] EMC_FREQ 4%@2133 GR3D_FREQ 8%@1377 NVDEC 128 NVDEC1 128 APE 150 MTS fg 1% bg 15% AO@33.5C GPU@35.5C Tboard@36C Tdiode@39C AUX@34.5C CPU@37.5C thermal@35.7C PMIC@100C GPU 1698/1366 CPU 2007/1887 SOC 4325/3176 CV 0/0 VDDRQ 463/259 SYS5V 2652/2585
RAM 6471/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [10%@2265,13%@2265,13%@2265,8%@2265,8%@2265,5%@2265,6%@2265,5%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 13%@1377 NVDEC 128 NVDEC1 128 APE 150 MTS fg 1% bg 13% AO@34C GPU@35.5C Tboard@36C Tdiode@39C AUX@34.5C CPU@37.5C thermal@35.55C PMIC@100C GPU 1699/1366 CPU 1853/1887 SOC 4325/3177 CV 0/0 VDDRQ 308/259 SYS5V 2652/2585
RAM 6471/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [9%@2265,9%@2265,9%@2265,9%@2265,7%@2265,9%@2265,8%@2265,11%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 11%@1377 NVDEC 128 NVDEC1 128 APE 150 MTS fg 0% bg 14% AO@34C GPU@35.5C Tboard@36C Tdiode@39.25C AUX@34.5C CPU@37.5C thermal@35.55C PMIC@100C GPU 1699/1367 CPU 1853/1887 SOC 4325/3179 CV 0/0 VDDRQ 308/259 SYS5V 2652/2585
RAM 6472/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [11%@2265,10%@2265,10%@2265,10%@2265,6%@2265,8%@2265,7%@2265,7%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 10%@1377 NVDEC 128 NVDEC1 128 APE 150 MTS fg 0% bg 12% AO@34C GPU@35.5C Tboard@36C Tdiode@39.25C AUX@34.5C CPU@37.5C thermal@35.7C PMIC@100C GPU 1698/1367 CPU 2007/1887 SOC 4325/3181 CV 0/0 VDDRQ 463/259 SYS5V 2652/2585
RAM 6470/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [14%@2265,8%@2265,12%@2265,11%@2265,4%@2265,7%@2265,9%@2265,4%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 12%@1377 NVDEC 128 NVDEC1 128 APE 150 MTS fg 1% bg 13% AO@34C GPU@35.5C Tboard@36C Tdiode@39.25C AUX@34.5C CPU@37C thermal@35.55C PMIC@100C GPU 1699/1368 CPU 1853/1887 SOC 4325/3182 CV 0/0 VDDRQ 308/259 SYS5V 2652/2585
RAM 6470/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [10%@2265,14%@2265,11%@2265,10%@2265,9%@2265,9%@2265,7%@2265,9%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 11%@1377 NVDEC 128 NVDEC1 128 APE 150 MTS fg 1% bg 18% AO@34C GPU@35.5C Tboard@36C Tdiode@39.25C AUX@34.5C CPU@37.5C thermal@35.7C PMIC@100C GPU 1698/1368 CPU 2162/1888 SOC 4325/3184 CV 0/0 VDDRQ 463/259 SYS5V 2652/2585
RAM 6470/15690MB (lfb 243x4MB) SWAP 0/7845MB (cached 0MB) CPU [10%@2265,9%@2265,6%@2265,13%@2265,6%@2265,6%@2265,8%@2265,8%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 7%@1377 NVDE
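
For anyone who wants to compare samples like these numerically, the following is a minimal Python sketch that pulls a few fields out of one tegrastats line. The function name and the returned keys are hypothetical, and treating the number after `NVDEC` as a clock value is an assumption. Fields missing from a line (for example in the truncated last sample) are simply omitted from the result.

```python
import re

def parse_tegrastats(line):
    """Extract per-core CPU load, GPU load and the NVDEC value from one line."""
    stats = {}
    # Per-core loads appear as "CPU [12%@2265,8%@2265,...]".
    cpu = re.search(r"CPU \[([^\]]*)\]", line)
    if cpu:
        stats["cpu_load"] = [int(v) for v in re.findall(r"(\d+)%@\d+", cpu.group(1))]
    # GPU load appears as "GR3D_FREQ 8%@1377".
    gpu = re.search(r"GR3D_FREQ (\d+)%", line)
    if gpu:
        stats["gpu_load"] = int(gpu.group(1))
    # "NVDEC 128" (assumed to be a clock value); "NVDEC1" is not matched here.
    nvdec = re.search(r"NVDEC (\d+)", line)
    if nvdec:
        stats["nvdec_clock"] = int(nvdec.group(1))
    return stats

# Abridged from one of the samples above.
sample = ("CPU [12%@2265,8%@2265,13%@2265,11%@2265,10%@2265,7%@2265,6%@2265,7%@2219] "
          "EMC_FREQ 4%@2133 GR3D_FREQ 8%@1377 NVDEC 128 NVDEC1 128 APE 150")
print(parse_tegrastats(sample))
```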

DaneLLL · October 21, 2019, 9:20am

Hi,
You may try the following case:
https://devtalk.nvidia.com/default/topic/1014789/jetson-tx1/-the-cpu-usage-cannot-down-use-cuda-decode-/post/5189145/#5189145

famoson4 · October 23, 2019, 5:17am

This works fine when running four separate processes:

./backend 1 ./126/AVB_20190531150055_00001_ch04.h264 H264 & ./backend 1 ./126/AVB_20190531150055_00001_ch04.h264 H264 & ./backend 1 ./126/AVB_20190531150055_00001_ch04.h264 H264 & ./backend 1 ./126/AVB_20190531150055_00001_ch04.h264 H264

However, there is a video delay when the four channels run as threads within a single process.
It seems to block when dequeuing a buffer (dqbuffer) on the decoder’s output plane.
Do you know why?
Please let me know what information you need to investigate the problem.

./backend 4 …/…/…/126/AVB_20190531150055_00001_ch01.h264 …/…/…/126/AVB_20190531150055_00001_ch02.h264 …/…/…/126/AVB_20190531150055_00001_ch03.h264 …/…/…/126/AVB_20190531150055_00001_ch04.h264 H264

famoson4 · October 25, 2019, 7:36am

Please let me know if you have any doubts.

DaneLLL · October 25, 2019, 7:44am

Hi,
We suggest you run a single encoding thread in each process.
https://devtalk.nvidia.com/default/topic/1056389/jetson-agx-xavier/error-encoding-with-gstreamer-and-omxh264enc/post/5362732/#5362732
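
As an illustration of the one-channel-per-process layout that worked earlier in this thread, here is a minimal Python sketch that starts one single-channel `./backend` process per input file. The command form `./backend 1 <file> H264` is taken from the posts above; the helper names `build_commands` and `run_all` are hypothetical, and the input paths are whatever is passed on the command line.

```python
import subprocess
import sys

def build_commands(files, binary="./backend", codec="H264"):
    """Build one single-channel command line per input file."""
    return [[binary, "1", path, codec] for path in files]

def run_all(files):
    """Start every process first, then wait for all of them to exit."""
    procs = [subprocess.Popen(cmd) for cmd in build_commands(files)]
    return [proc.wait() for proc in procs]

if __name__ == "__main__":
    # With no arguments nothing is launched and an empty list is printed.
    exit_codes = run_all(sys.argv[1:])
    print(exit_codes)
```

Starting all processes before waiting on any of them keeps the channels running concurrently, which mirrors the `&`-chained shell command used above.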
