[Jetson Xavier] Hardware video decode doesn't work

When I use the tegra_multimedia_api/samples/00_video_decode sample from the L4T R32.1 Multimedia API for Jetson AGX Xavier (JAX) and TX2 to decode an H.264 video stream to raw YUV:

./video_decode H264 -f 2 --input-nalu --disable-rendering --blocking-mode 0 -o ttt.yuv t50.h264

I found that the CPU load is about 100%:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17744 nvidia 20 0 291388 23164 14928 R 102.6 0.1 0:04.88 video_decode

I think hardware decoding isn't working when decoding the H.264 video stream.

Please help me fix this issue. Thanks!

We downloaded the sample code from the Jetson Download Center (NVIDIA Developer).

Hi,
Please run tegrastats to get a precise reading of the CPU load:

sudo tegrastats

Hi,
I used the command sudo tegrastats to check it.
Here is the result:

RAM 3519/15692MB (lfb 2022x4MB) CPU [3%@2265,0%@2265,0%@2265,0%@2265,0%@2265,0%@2265,2%@2265,2%@2265] EMC_FREQ 0%@2133 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 1% AO@42.5C GPU@43.5C Tboard@44C Tdiode@45.75C AUX@42C CPU@43.5C thermal@42.3C PMIC@100C GPU 0/0 CPU 621/621 SOC 932/932 CV 0/0 VDDRQ 155/155 SYS5V 1655/1655
RAM 3500/15692MB (lfb 2021x4MB) CPU [30%@2265,4%@2265,2%@2265,0%@2265,33%@2265,4%@2265,1%@2265,0%@2265] EMC_FREQ 1%@2133 GR3D_FREQ 0%@318 NVDEC 1190 NVDEC1 1190 APE 150 MTS fg 0% bg 3% AO@42.5C GPU@44C Tboard@44C Tdiode@46C AUX@42C CPU@44C thermal@42.6C PMIC@100C GPU 155/77 CPU 2015/1318 SOC 3875/2403 CV 0/0 VDDRQ 465/310 SYS5V 2378/2016
RAM 3507/15692MB (lfb 2021x4MB) CPU [7%@2265,5%@2265,2%@2265,1%@2265,66%@2265,18%@2265,13%@2265,2%@2265] EMC_FREQ 3%@2133 GR3D_FREQ 0%@318 NVDEC 1190 NVDEC1 1190 APE 150 MTS fg 0% bg 3% AO@43C GPU@44.5C Tboard@44C Tdiode@46C AUX@42.5C CPU@44.5C thermal@43.5C PMIC@100C GPU 0/51 CPU 2324/1653 SOC 4646/3151 CV 0/0 VDDRQ 464/361 SYS5V 2459/2164
RAM 3512/15692MB (lfb 2021x4MB) CPU [8%@2265,4%@2265,9%@2265,2%@2265,55%@2265,28%@2265,2%@2265,11%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 0%@318 NVDEC 1190 NVDEC1 1190 APE 150 MTS fg 0% bg 4% AO@42.5C GPU@45C Tboard@44C Tdiode@46C AUX@42.5C CPU@45C thermal@43.7C PMIC@100C GPU 0/38 CPU 2479/1859 SOC 4646/3524 CV 0/0 VDDRQ 464/387 SYS5V 2459/2237
RAM 3519/15692MB (lfb 2021x4MB) CPU [12%@2265,8%@2265,1%@2265,2%@2265,58%@2265,23%@2265,4%@2265,6%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 0%@318 NVDEC 1190 NVDEC1 1190 APE 150 MTS fg 0% bg 3% AO@43C GPU@44.5C Tboard@44C Tdiode@46.25C AUX@42.5C CPU@44.5C thermal@44.2C PMIC@100C GPU 0/31 CPU 2324/1952 SOC 4493/3718 CV 0/0 VDDRQ 464/402 SYS5V 2419/2274
RAM 3526/15692MB (lfb 2021x4MB) CPU [8%@2265,5%@2265,1%@2265,1%@2265,53%@2265,34%@2265,2%@2265,9%@2265] EMC_FREQ 4%@2133 GR3D_FREQ 0%@318 NVDEC 1190 NVDEC1 1190 APE 150 MTS fg 0% bg 3% AO@43C GPU@45C Tboard@44C Tdiode@46.25C AUX@43C CPU@45C thermal@44C PMIC@100C GPU 0/25 CPU 2324/2014 SOC 4493/3847 CV 0/0 VDDRQ 464/412 SYS5V 2419/2298
RAM 3528/15692MB (lfb 2021x4MB) CPU [3%@2265,75%@2265,0%@2265,0%@2265,9%@2265,17%@2265,1%@2265,1%@2265] EMC_FREQ 2%@2133 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 2% AO@43C GPU@45.5C Tboard@44C Tdiode@46.25C AUX@43C CPU@45.5C thermal@44.5C PMIC@100C GPU 0/22 CPU 2479/2080 SOC 3563/3806 CV 0/0 VDDRQ 310/398 SYS5V 3144/2419
RAM 3528/15692MB (lfb 2021x4MB) CPU [0%@2265,100%@2265,0%@2265,0%@2265,0%@2265,2%@2265,0%@2265,0%@2265] EMC_FREQ 1%@2133 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 1% AO@43C GPU@45.5C Tboard@44C Tdiode@46.5C AUX@43C CPU@45.5C thermal@44.5C PMIC@100C GPU 0/19 CPU 2480/2130 SOC 2790/3679 CV 0/0 VDDRQ 155/367 SYS5V 3265/2524
RAM 3528/15692MB (lfb 2021x4MB) CPU [1%@2265,100%@2265,0%@2265,0%@2265,0%@2265,2%@2265,0%@2265,0%@2265] EMC_FREQ 1%@2133 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 0% AO@43.5C GPU@45.5C Tboard@44C Tdiode@46.5C AUX@43C CPU@45.5C thermal@44.5C PMIC@100C GPU 0/17 CPU 2480/2169 SOC 2790/3580 CV 0/0 VDDRQ 310/361 SYS5V 3265/2607
RAM 3528/15692MB (lfb 2021x4MB) CPU [2%@2265,73%@2265,0%@2265,0%@2265,18%@2265,4%@2265,0%@2265,0%@2265] EMC_FREQ 1%@2133 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 1% AO@43.5C GPU@44.5C Tboard@44C Tdiode@46.25C AUX@43C CPU@44.5C thermal@44.5C PMIC@100C GPU 0/15 CPU 2325/2185 SOC 2790/3501 CV 0/0 VDDRQ 310/356 SYS5V 3265/2672
RAM 3528/15692MB (lfb 2021x4MB) CPU [0%@2265,0%@2265,0%@2265,0%@2265,100%@2265,3%@2265,0%@2265,0%@2265] EMC_FREQ 1%@2133 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 0% AO@43.5C GPU@44.5C Tboard@44C Tdiode@46.5C AUX@43C CPU@44.5C thermal@43.9C PMIC@100C GPU 0/14 CPU 2480/2211 SOC 2790/3437 CV 0/0 VDDRQ 155/337 SYS5V 3306/2730
RAM 3528/15692MB (lfb 2021x4MB) CPU [0%@2265,0%@2265,0%@2265,0%@2265,100%@2265,4%@2265,0%@2265,1%@2265] EMC_FREQ 1%@2133 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 0% AO@43.5C GPU@44.5C Tboard@44C Tdiode@46.5C AUX@43C CPU@44.5C thermal@43.9C PMIC@100C GPU 0/12 CPU 2480/2234 SOC 2790/3383 CV 0/0 VDDRQ 155/322 SYS5V 3265/2774
RAM 3528/15692MB (lfb 2021x4MB) CPU [0%@2265,0%@2265,0%@2265,0%@2265,100%@2265,2%@2265,0%@2265,0%@2265] EMC_FREQ 1%@2133 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 0% AO@43C GPU@44.5C Tboard@44C Tdiode@46.5C AUX@43C CPU@44.5C thermal@44.2C PMIC@100C GPU 0/11 CPU 2480/2253 SOC 2790/3337 CV 0/0 VDDRQ 155/309 SYS5V 3225/2809
RAM 3528/15692MB (lfb 2021x4MB) CPU [1%@2265,0%@2265,0%@2265,0%@2265,100%@2265,2%@2265,0%@2265,1%@2265] EMC_FREQ 1%@2133 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 1% AO@43.5C GPU@45C Tboard@44C Tdiode@46.5C AUX@43C CPU@45C thermal@43.9C PMIC@100C GPU 0/11 CPU 2480/2269 SOC 2790/3298 CV 0/0 VDDRQ 310/309 SYS5V 3265/2842
RAM 3528/15692MB (lfb 2021x4MB) CPU [0%@2265,0%@2265,0%@2265,0%@2265,100%@2265,2%@2265,0%@2265,0%@2265] EMC_FREQ 1%@2133 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 1% AO@43.5C GPU@44.5C Tboard@44C Tdiode@46.75C AUX@43C CPU@44.5C thermal@43.9C PMIC@100C GPU 0/10 CPU 2480/2283 SOC 2790/3264 CV 0/0 VDDRQ 155/299 SYS5V 3225/2867
RAM 3521/15692MB (lfb 2020x4MB) CPU [4%@1190,2%@1190,1%@1190,0%@1190,90%@1190,2%@1264,4%@1283,1%@1420] EMC_FREQ 1%@2133 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 1% AO@43.5C GPU@44.5C Tboard@44C Tdiode@46.75C AUX@43C CPU@44.5C thermal@44.2C PMIC@100C GPU 155/19 CPU 2480/2295 SOC 2635/3225 CV 0/0 VDDRQ 310/300 SYS5V 3346/2897
RAM 3521/15692MB (lfb 2020x4MB) CPU [4%@1190,0%@1190,0%@1190,0%@1190,33%@1267,0%@1267,0%@1552,0%@1574] EMC_FREQ 5%@408 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 0% AO@43.5C GPU@44C Tboard@44C Tdiode@46.75C AUX@43C CPU@44C thermal@43.6C PMIC@100C GPU 155/27 CPU 931/2215 SOC 1397/3117 CV 0/0 VDDRQ 155/291 SYS5V 2099/2850
RAM 3521/15692MB (lfb 2020x4MB) CPU [1%@1190,0%@1190,0%@1190,0%@1190,0%@1265,0%@1267,0%@1478,0%@1497] EMC_FREQ 5%@408 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 0% AO@43C GPU@44C Tboard@44C Tdiode@46.5C AUX@43C CPU@44C thermal@43.6C PMIC@100C GPU 0/25 CPU 466/2118 SOC 932/2996 CV 0/0 VDDRQ 155/283 SYS5V 1615/2781
RAM 3522/15692MB (lfb 2020x4MB) CPU [5%@1190,0%@1190,2%@1190,0%@1190,0%@1190,0%@1362,2%@1420,0%@1420] EMC_FREQ 5%@408 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 1% AO@43C GPU@44C Tboard@44C Tdiode@46.5C AUX@43C CPU@44C thermal@43.4C PMIC@100C GPU 0/24 CPU 621/2039 SOC 932/2887 CV 0/0 VDDRQ 155/277 SYS5V 1615/2720
RAM 3522/15692MB (lfb 2020x4MB) CPU [2%@1190,0%@1190,0%@1190,0%@1190,0%@1190,0%@1310,0%@1343,0%@1549] EMC_FREQ 5%@408 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 0% AO@43C GPU@43.5C Tboard@44C Tdiode@46.25C AUX@42.5C CPU@43.5C thermal@43.1C PMIC@100C GPU 0/23 CPU 466/1960 SOC 932/2789 CV 0/0 VDDRQ 155/271 SYS5V 1574/2663

Hi,
The hardware decoding engines, NVDEC and NVDEC1, are running.

You can find more about tegrastats at
https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%2520Linux%2520Driver%2520Package%2520Development%2520Guide%2FAppendixTegraStats.html

Hi,

I checked the log. One of the samples is:

RAM 3528/15692MB (lfb 2021x4MB) CPU [0%@2265,100%@2265,0%@2265,0%@2265,0%@2265,2%@2265,0%@2265,0%@2265] EMC_FREQ 1%@2133 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 1% AO@43C GPU@45.5C Tboard@44C Tdiode@46.5C AUX@43C CPU@45.5C thermal@44.5C PMIC@100C

The CPU load is too high. I expected that, when decoding video in hardware, the CPU load would be around 10%~15%.

Could you give us a benchmark of the CPU load when using the hardware decoding engines, NVDEC and NVDEC1?

Hi,
The high CPU load is due to writing the YUV frames to a file (-o ttt.yuv). You can compare:

$ ./video_decode H264 -o a.yuv --disable-rendering ../../data/Video/sample_outdoor_car_1080p_10fps.h264
$ ./video_decode H264 --stats --disable-rendering ../../data/Video/sample_outdoor_car_1080p_10fps.h264

I ran:

./video_decode H264 --stats --blocking-mode 0 --disable-rendering ../../data/Video/sample_outdoor_car_1080p_10fps.h264
RAM 3809/15692MB (lfb 1638x4MB) CPU [0%@2265,0%@2265,42%@2265,48%@2265,0%@2265,0%@2265,0%@2265,0%@2265] EMC_FREQ 0%@2133 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 0% AO@44C GPU@45.5C Tboard@45C Tdiode@47.5C AUX@43.5C CPU@45.5C thermal@44.9C PMIC@100C GPU 0/31 CPU 2791/1954 SOC 2791/2233 CV 0/0 VDDRQ 155/155 SYS5V 2298/2065
RAM 3809/15692MB (lfb 1638x4MB) CPU [0%@2265,0%@2265,43%@2265,47%@2265,0%@2265,0%@2265,0%@2265,0%@2265] EMC_FREQ 0%@2133 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 0% AO@44.5C GPU@46C Tboard@45C Tdiode@47.5C AUX@44C CPU@46C thermal@45.2C PMIC@100C GPU 0/25 CPU 2791/2093 SOC 2791/2326 CV 0/0 VDDRQ 155/155 SYS5V 2298/2103
RAM 3809/15692MB (lfb 1638x4MB) CPU [0%@2265,0%@2265,41%@2265,46%@2265,0%@2265,0%@2265,0%@2265,0%@2265] EMC_FREQ 0%@2133 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 0% AO@44.5C GPU@46C Tboard@45C Tdiode@47.5C AUX@44C CPU@46C thermal@45.2C PMIC@100C GPU 0/22 CPU 2791/2193 SOC 2791/2392 CV 0/0 VDDRQ 155/155 SYS5V 2298/2131
RAM 3809/15692MB (lfb 1638x4MB) CPU [1%@2265,0%@2265,44%@2265,43%@2265,0%@2265,0%@2265,0%@2265,0%@2265] EMC_FREQ 0%@2133 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 0% AO@44.5C GPU@46C Tboard@45C Tdiode@47.75C AUX@44C CPU@46C thermal@45.2C PMIC@100C GPU 0/19 CPU 2791/2268 SOC 2791/2442 CV 0/0 VDDRQ 155/155 SYS5V 2298/2152
RAM 3809/15692MB (lfb 1638x4MB) CPU [0%@2265,1%@2265,42%@2265,47%@2265,0%@2265,0%@2265,0%@2265,0%@2265] EMC_FREQ 0%@2133 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 1% AO@44.5C GPU@46C Tboard@45C Tdiode@47.75C AUX@44C CPU@46C thermal@45.4C PMIC@100C GPU 0/17 CPU 2791/2326 SOC 2791/2481 CV 0/0 VDDRQ 155/155 SYS5V 2298/2168
RAM 3809/15692MB (lfb 1638x4MB) CPU [2%@2265,0%@2265,39%@2265,51%@2265,0%@2265,0%@2265,0%@2265,0%@2265] EMC_FREQ 0%@2133 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 0% AO@44.5C GPU@46C Tboard@45C Tdiode@47.5C AUX@44C CPU@46C thermal@45.2C PMIC@100C GPU 0/15 CPU 2791/2372 SOC 2791/2512 CV 0/0 VDDRQ 155/155 SYS5V 2298/2181
./video_decode H264 --stats  -s 20  --disable-rendering ../../data/Video/sample_outdoor_car_1080p_10fps.h264
RAM 3867/15692MB (lfb 1624x4MB) CPU [15%@1190,2%@1190,1%@1190,0%@1190,21%@1190,12%@1190,4%@1190,3%@1336] EMC_FREQ 5%@2133 GR3D_FREQ 0%@318 NVDEC 1190 NVDEC1 1190 APE 150 MTS fg 0% bg 0% AO@42C GPU@43.5C Tboard@43C Tdiode@45.5C AUX@42.5C CPU@43.5C thermal@43.1C PMIC@100C GPU 0/0 CPU 620/642 SOC 4960/4937 CV 0/0 VDDRQ 620/620 SYS5V 2499/2493
RAM 3869/15692MB (lfb 1623x4MB) CPU [9%@1190,3%@1190,0%@1190,2%@1190,26%@1190,9%@1267,3%@1267,4%@1267] EMC_FREQ 5%@2133 GR3D_FREQ 0%@318 NVDEC 1190 NVDEC1 1190 APE 150 MTS fg 0% bg 1% AO@42C GPU@44C Tboard@43C Tdiode@45.5C AUX@42.5C CPU@44C thermal@43.4C PMIC@100C GPU 0/0 CPU 775/658 SOC 4960/4940 CV 0/0 VDDRQ 620/620 SYS5V 2499/2494
RAM 3869/15692MB (lfb 1623x4MB) CPU [8%@1267,0%@1267,3%@1267,0%@1267,25%@1267,14%@1267,5%@1413,4%@1458] EMC_FREQ 5%@2133 GR3D_FREQ 0%@318 NVDEC 1190 NVDEC1 1190 APE 150 MTS fg 0% bg 0% AO@42C GPU@44C Tboard@43C Tdiode@45.5C AUX@42C CPU@44C thermal@43.4C PMIC@100C GPU 0/0 CPU 620/654 SOC 4960/4942 CV 0/0 VDDRQ 620/620 SYS5V 2499/2494
RAM 3872/15692MB (lfb 1622x4MB) CPU [7%@1190,4%@1190,2%@1190,3%@1190,24%@1190,13%@1190,7%@1190,1%@1329] EMC_FREQ 5%@2133 GR3D_FREQ 0%@318 NVDEC 1190 NVDEC1 1190 APE 150 MTS fg 0% bg 1% AO@42.5C GPU@44C Tboard@43C Tdiode@45.5C AUX@42.5C CPU@44C thermal@43.4C PMIC@100C GPU 0/0 CPU 620/651 SOC 4960/4944 CV 0/0 VDDRQ 620/620 SYS5V 2499/2495
RAM 3875/15692MB (lfb 1621x4MB) CPU [7%@1190,3%@1190,2%@1190,2%@1190,17%@1190,19%@1190,2%@1201,8%@1267] EMC_FREQ 6%@2133 GR3D_FREQ 0%@318 NVDEC 1190 NVDEC1 1190 APE 150 MTS fg 0% bg 2% AO@42.5C GPU@44C Tboard@43C Tdiode@45.75C AUX@42.5C CPU@44C thermal@43.4C PMIC@100C GPU 310/28 CPU 775/662 SOC 4803/4931 CV 0/0 VDDRQ 620/620 SYS5V 2580/2502

It seems the CPU load is about 28% in blocking mode but about 48% in non-blocking mode. The CPU load is still too high even when not writing YUVs to a file. Any suggestions for decreasing the CPU load?

Hi,
There is an issue when running with the options '--stats --blocking-mode 0 --disable-rendering'. You may apply the patch below and try again.

@@ -1423,17 +1423,17 @@ check_capture_buffers:
                 {
                     ctx.renderer->render(ctx.dst_dma_fd);
                 }
-                // Queue the buffer back once it has been used.
-                // If we are not rendering, queue the buffer back here immediately.
-                if(ctx.capture_plane_mem_type == V4L2_MEMORY_DMABUF)
-                    v4l2_capture_buf.m.planes[0].m.fd = ctx.dmabuff_fd[v4l2_capture_buf.index];
-                if (ctx.dec->capture_plane.qBuffer(v4l2_capture_buf, NULL) < 0)
-                {
-                    abort(&ctx);
-                    cerr << "Error while queueing buffer at decoder capture plane"
-                            << endl;
-                    break;
-                }
+            }
+            // Queue the buffer back once it has been used.
+            // If we are not rendering, queue the buffer back here immediately.
+            if(ctx.capture_plane_mem_type == V4L2_MEMORY_DMABUF)
+                v4l2_capture_buf.m.planes[0].m.fd = ctx.dmabuff_fd[v4l2_capture_buf.index];
+            if (ctx.dec->capture_plane.qBuffer(v4l2_capture_buf, NULL) < 0)
+            {
+                abort(&ctx);
+                cerr << "Error while queueing buffer at decoder capture plane"
+                        << endl;
+                break;
             }
         }
     }

Also, '--disable-rendering' decodes frames continuously, without pacing them to a frame rate. To compare at a steady 1080p30 rate, you may run the following once with --blocking-mode 0 and once with --blocking-mode 1:

$ ./video_decode H264 --stats --blocking-mode 0 ../../data/Video/sample_outdoor_car_1080p_10fps.h264

Hi,
I applied the patch.
I opened 6 terminals and ran the following command in all of them at the same time (with --blocking-mode 0 for the non-blocking comparison):

./video_decode H264 --stats  --blocking-mode 1 --disable-rendering  00[1-6]

Here 00[1-6] stands for six different 4K H.264 video streams (001 to 006); each stream is 180 seconds long.

I found that the CPU load is about 20% in blocking mode but about 80% in non-blocking mode.

1. Blocking mode

RAM 5722/15692MB (lfb 1287x4MB) CPU [4%@1190,5%@1190,3%@1190,2%@1190,7%@1190,22%@1190,4%@1406,1%@1287] EMC_FREQ 16%@2133 GR3D_FREQ 0%@318 NVDEC 1190 NVDEC1 1190 APE 150 MTS fg 0% bg 1% AO@38.5C GPU@40.5C Tboard@40C Tdiode@41.75C AUX@38.5C CPU@40.5C thermal@39.55C PMIC@100C GPU 309/366 CPU 619/641 SOC 6035/5810 CV 0/0 VDDRQ 1238/1169 SYS5V 2741/2670
RAM 5722/15692MB (lfb 1287x4MB) CPU [5%@1190,1%@1190,0%@1190,1%@1190,3%@1190,17%@1190,0%@1190,2%@1190] EMC_FREQ 16%@2133 GR3D_FREQ 0%@318 NVDEC 1190 NVDEC1 1190 APE 150 MTS fg 0% bg 1% AO@38.5C GPU@39.5C Tboard@40C Tdiode@41.75C AUX@38.5C CPU@40.5C thermal@39.55C PMIC@100C GPU 464/368 CPU 619/641 SOC 6035/5813 CV 0/0 VDDRQ 1238/1170 SYS5V 2741/2671
$ ./video_decode H264 --stats  --blocking-mode 1 --disable-rendering  002 
Set governor to performance before enabling profiler
Creating decoder in blocking mode 
Failed to query video capabilities: Inappropriate ioctl for device
Opening in BLOCKING MODE 
Set governor to performance before enabling profiler
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading sys.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 
Setting frame input mode to 1 
Starting decoder capture loop thread
Video Resolution: 3840x2160
Decoder colorspace ITU-R BT.709 with extended range luma (0-255)
Query and set capture successful
Input file read complete
----------- Element = dec0 -----------
Total Profiling time = 87.0246
Average FPS = 52.1462
Total units processed = 4539
-------------------------------------
************************************
Total Profiling Time = 0 sec
************************************
Exiting decoder capture loop thread
App run was successful

2. Non-blocking mode

RAM 5736/15692MB (lfb 1280x4MB) CPU [47%@2265,46%@2265,83%@2265,84%@2265,56%@2265,57%@2265,90%@2265,91%@2265] EMC_FREQ 17%@2133 GR3D_FREQ 0%@318 NVDEC 1190 NVDEC1 1190 APE 150 MTS fg 0% bg 1% AO@47C GPU@48C Tboard@46C Tdiode@50C AUX@47.5C CPU@53C thermal@50.95C PMIC@100C GPU 462/377 CPU 10167/9531 SOC 6315/5929 CV 0/0 VDDRQ 1232/1149 SYS5V 2822/2706
RAM 5737/15692MB (lfb 1280x4MB) CPU [43%@2265,46%@2265,62%@2265,75%@2265,86%@2265,76%@2265,75%@2265,74%@2265] EMC_FREQ 17%@2133 GR3D_FREQ 0%@318 NVDEC 1190 NVDEC1 1190 APE 150 MTS fg 0% bg 2% AO@46.5C GPU@48C Tboard@46C Tdiode@50C AUX@47.5C CPU@53C thermal@49.3C PMIC@100C GPU 462/378 CPU 10171/9539 SOC 6315/5934 CV 0/0 VDDRQ 1232/1150 SYS5V 2822/2708
$ ./video_decode H264 --stats  --blocking-mode 0 --disable-rendering  002 
Set governor to performance before enabling profiler
Creating decoder in non-blocking mode 
Failed to query video capabilities: Inappropriate ioctl for device
Opening in O_NONBLOCKING MODE 
Set governor to performance before enabling profiler
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading sys.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 
Setting frame input mode to 1 
Created the PollThread and Decoder Thread 
Starting Device Poll Thread 
Got V4L2_EVENT_RESOLUTION_CHANGE EVENT 
Video Resolution: 3840x2160
Decoder colorspace ITU-R BT.709 with extended range luma (0-255)
Query and set capture successful
Input file read complete
Done processing all the buffers returning 
----------- Element = dec0 -----------
Total Profiling time = 87.0611
Average FPS = 52.1243
Total units processed = 4539
-------------------------------------
************************************
Total Profiling Time = 0 sec
************************************
Decoder got eos, exiting poll thread 
App run was successful

Hi,
In non-blocking mode, the application polls the decoder, which costs CPU time. The overhead is acceptable when decoding a single video but becomes significant when decoding multiple videos. For multiple-video decoding, we suggest running in blocking mode.

Hi DaneLLL,

OK, I got it. Thanks a lot!