Performance issue with 14_multivideo_decode when the output is changed to RGB

Hi Sir:

I followed this thread https://forums.developer.nvidia.com/t/tx2-decide-h264-with-tegra-multimedia-api/121421/32
to get RGB frames from the hardware decoder, but when I run with 2 input files, the frame rate drops below 24 fps. Can we make it faster?

Thank you

Hi,
Are you saving the YUV frames to a file? Performance should be sufficient if you convert the video frames from NV12 to RGBA with NvBufferTransform(). However, a 4K RGBA frame is 3840x2160x4 bytes (about 33 MB), so saving it to disk takes considerable time.

Hi Dane:

Saving the YUV frames is fine, but when I use NvBufferTransform() to convert to RGBA at 4K resolution and save the result, the performance does not reach 24 fps. Do you have any ideas for speeding it up?

Thank you

Hi,
It looks like you are running this pipeline:

4K stream -> decode to YUV -> convert to RGBA via NvBufferTransform() -> save to a video file

Please remove the step that saves to a video file and check again.

Hi Dane:

I already removed it, but the result is the same when I decode more than one file and convert to RGBA: each stream runs below 24 fps. Do you have any ideas for improving it?

Thank you

Hi,
We don’t observe the issue when running 14_multivideo_decode. The log is:

14_multivideo_decode$ ./multivideo_decode num_files 2 /home/nvidia/4k.h264 H264 -o ~/a1.yuv /home/nvidia/4k.h264 H264 -o ~/a2.yuv --disable-rendering --stats
Set governor to performance before enabling profiler
Creating decoder in blocking mode
Creating decoder in blocking mode
Opening in BLOCKING MODE
Set governor to performance before enabling profiler
NvMMLiteOpen : Block : BlockType = 261
Opening in BLOCKING MODE
Set governor to performance before enabling profiler
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
NvMMLiteBlockCreate : Block : BlockType = 261
Setting frame input mode to 1
Setting frame input mode to 1
Starting decoder capture loop thread
Starting decoder capture loop thread
Video Resolution: 3840x2160
Decoder colorspace ITU-R BT.709 with standard range luma (16-235)
Video Resolution: 3840x2160
Decoder colorspace ITU-R BT.709 with standard range luma (16-235)
Query and set capture successful
Query and set capture successful
Input file read complete
Input file read complete
Exiting decoder capture loop thread
Instance 0 executed sucessfully.
Exiting decoder capture loop thread
Instance 1 executed sucessfully.
*****************************************
Stream = /home/nvidia/4k.h264
Total Profiling time = 16.4949
Average FPS = 54.5017
Average latency(usec) = 0
Minimum latency(usec) = 18446744073709551615
Maximum latency(usec) = 0
*****************************************
*****************************************
Stream = /home/nvidia/4k.h264
Total Profiling time = 16.4675
Average FPS = 54.5923
Average latency(usec) = 0
Minimum latency(usec) = 18446744073709551615
Maximum latency(usec) = 0
*****************************************
App run was successful

The patch for converting the decoded NV12 frames to RGBA:

diff --git a/multimedia_api/ll_samples/samples/14_multivideo_decode/multivideo_decode_main.cpp b/multimedia_api/ll_samples/samples/14_multivideo_decode/multivideo_decode_main.cpp
index ebc4095..24f15ba 100644
--- a/multimedia_api/ll_samples/samples/14_multivideo_decode/multivideo_decode_main.cpp
+++ b/multimedia_api/ll_samples/samples/14_multivideo_decode/multivideo_decode_main.cpp
@@ -600,8 +600,7 @@ query_and_set_capture(context_t * ctx)
     input_params.width = crop.c.width;
     input_params.height = crop.c.height;
     input_params.layout = NvBufferLayout_Pitch;
-    input_params.colorFormat = ctx->out_pixfmt == 1 ? NvBufferColorFormat_NV12 :
-                                            NvBufferColorFormat_YUV420;
+    input_params.colorFormat = NvBufferColorFormat_ABGR32;
     input_params.nvbuf_tag = NvBufferTag_VIDEO_DEC;
 
     ret = NvBufferCreateEx (&ctx->dst_dma_fd, &input_params);
@@ -1069,7 +1068,7 @@ dec_capture_loop_fcn(void *arg)
              /* If we need to write to file or display the buffer, give
                the buffer to video converter output plane instead of
                returning the buffer back to decoder capture plane. */
-            if (ctx->out_file || (!ctx->disable_rendering && !ctx->stats))
+            if (1)
             {
 #ifndef USE_NVBUF_TRANSFORM_API
                 NvBuffer *conv_buffer;

Hi Dane:
I get the same result with 2x 4K streams, but with 4x 4K streams each channel runs slower than 24 fps. So the differences between my setup and yours are 2 vs. 4 channels, and also blocking vs. non-blocking mode. Do you have any ideas for improving performance with 4x 4K channels, or is this a limitation of the platform?

Hi,
This is a hardware limitation of TX2: decoding 4x 4Kp30 is not supported.

Hi Dane:

How about Xavier NX? Is its decoder more powerful than the TX2's?

Thank you

Hi,
For Xavier NX, please check the data sheet:
https://developer.nvidia.com/jetson-xavier-nx-data-sheet
It can decode 4x 4Kp30 HEVC streams.