nvmimg_jpgenc supported YUV formats and performance issue

Software Version
[×] DRIVE OS Linux 5.2.0

Target Operating System
[×] Linux

Hardware Platform
[×] NVIDIA DRIVE™ AGX Xavier DevKit (E3550)

SDK Manager Version
1.5.0.7774
other

Host Machine Version
[×] native Ubuntu 18.04

Referring to DRIVE_OS_5.2.0_SDK_Linux_OS_DDPX, we can see that nvmimg_jpgenc takes a *.yuv file as input.
We have some questions:

  1. What format is this *.yuv file expected to be? We assume YUV420. If our original input is YUV422, what approach do you recommend?
  2. We used ./nvmimg_jpgenc to encode a 1920x1080 YUV420 frame. The performance result is below.
$./nvmimg_jpgenc -f Camera_1920x1080_YUV420.yuv -fr 1920x1080 -of Camera_1920x1080_YUV420.jpg -q 75

Total Encoding time for 1 frames: 4.478 ms
Encoding time per frame 4.1430 ms 
Get bits time per frame 0.3350 ms 

Can you do similar testing on your side?

Hi @Peter_Pertrili ,

YUV422 is also supported, but the current ParseArgs() in ~/nvidia/nvidia_sdk/DRIVE_OS_5.2.0_SDK_Linux_OS_DDPX/DRIVEOS/drive-t186ref-linux/samples/nvmedia/img_jpgenc/cmdline.c is hard-coded to 420. You may need to modify the code accordingly.

I ran a quick test with a 1280x1080 YUV file, as shown below.

nvidia@tegra-ubuntu:~$ ./nvmimg_jpgenc -f fisheye_1280_1080_yuv420.yuv -fr 1280x1080 -of 1280-1080-420.jpg -q 75

Total Encoding time for 5 frames: 13.354 ms
Encoding time per frame 2.5264 ms
Get bits time per frame 0.1444 ms

Total encoded frames = 5, avg. bitrate=19082544

ENCODING PROCESS ENDED SUCCESSFULY

@VickNV

Can you share the list of input formats that jpgenc supports? Which document covers this information?

Following your suggestion, I updated cmdline.c to try to support YUV422. The relevant part of the patch is shown below.

diff --git a/nvmedia/img_jpgenc/cmdline.c b/nvmedia/img_jpgenc/cmdline.c
index 3de3ddc..871ac3f 100644
--- a/nvmedia/img_jpgenc/cmdline.c
+++ b/nvmedia/img_jpgenc/cmdline.c
@@ -20,7 +20,7 @@ void PrintUsage()
     LOG_MSG("Options:\n");
     LOG_MSG("-h                         Prints usage\n");
     LOG_MSG("-f [file]                  Input image file. \n");
-    LOG_MSG("                           Input file should be in YUV420 with UV order\n");
+    LOG_MSG("                           Input file should be in YUV422 with UV order\n");
     LOG_MSG("-fr [WxH]                  Input file resolution\n");
     LOG_MSG("-q  [value]                JPEG Encoder quality (1..100)\n");
     LOG_MSG("-of [file]                 Output JPEG file. \n");
@@ -40,7 +40,7 @@ int ParseArgs(int argc, char **argv, TestArgs *args)
     // Defaults
     args->maxOutputBuffering = 5;
     args->quality = 50;
-    NVM_SURF_FMT_SET_ATTR_YUV(surfFormatAttrs,YUV,420,SEMI_PLANAR,UINT,8,PL);
+    NVM_SURF_FMT_SET_ATTR_YUV(surfFormatAttrs,YUV,422,SEMI_PLANAR,UINT,8,PL);
     args->inputSurfType = NvMediaSurfaceFormatGetType(surfFormatAttrs, NVM_SURF_FMT_ATTR_MAX);
     //init crcoption
     args->crcoption.crcGenMode = NVMEDIA_FALSE;

After building the new version of nvmimg_jpgenc and feeding it sample_1920x1080_yuv422p.yuv, we got the error message below. It does not seem to work.

There may be three possible reasons:

  1. Our code modification is incomplete. Did we miss something?
  2. YUV422 to JPEG is not currently supported in DRIVE OS 5.2.0. Will it be supported in a later DRIVE OS version?
  3. A limitation of the current JPEG hardware engine?
~$ ./nvmimg_jpgenc -f sample_1920x1080_yuv422p.yuv -fr 1920x1080 -of sample_1920x1080_2.jpg -q 75 -v 3
nvmedia: main: NvMediaDeviceCreate
nvmedia: main: Encode start from frame 0, imageSize=3110400
nvmedia: main: NvMediaIJPECreate, 0x420070
nvmedia: main: Reading YUV frame 0 from file sample_1920x1080_yuv422p.yuv to image surface location: 0x41f8c0. (W:1920, H:1080)
nvmedia: main: ReadYUVFrame 0/0 done
nvmedia: main: Encoding frame #0
nvmedia: ERROR: main: NvMediaIJPEFeedFrameQuality failed: 7
nvmedia: main: Destroying device

sample_1920x1080_yuv420p.yuv (3.0 MB) : original YUV420P 1920x1080 input file.

sample_1920x1080_yuv422p.yuv (4.0 MB) : original YUV422P 1920x1080 input file.

: original JPEG 1920x1080 input file.
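
The attachment sizes above line up with the usual 8-bit planar layouts, which can be checked with a small sketch. The function names below are hypothetical and are not part of the NvMedia sample. For 1920x1080, 4:2:0 gives 3110400 bytes (about 3.0 MB) and 4:2:2 gives 4147200 bytes (about 4.0 MB). The `imageSize=3110400` in the log above equals the 4:2:0 size even though a 4:2:2 file was fed, which may indicate that the sample's frame-size calculation was not affected by the patch; that reading is an assumption, not something confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in one 8-bit YUV 4:2:0 frame: a full-size Y plane plus two
 * chroma planes subsampled by 2 in both directions (1.5 bytes/pixel). */
size_t yuv420_frame_bytes(size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    return width * height + 2 * ((width / 2) * (height / 2));
}

/* Bytes in one 8-bit YUV 4:2:2 frame: chroma is subsampled
 * horizontally only (2 bytes/pixel). */
size_t yuv422_frame_bytes(size_t width, size_t height)
{
    assert(width % 2 == 0);
    return width * height + 2 * ((width / 2) * height);
}
```

Calling `yuv420_frame_bytes(1920, 1080)` and `yuv422_frame_bytes(1920, 1080)` gives the two sizes quoted above.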

As in your jpgenc test, the per-frame encoding time seems too high. We assumed it should be less than 1 ms, or even faster. It doesn’t make sense for per-frame encoding to cost more than 2 ms, right?

Is it possible to profile the hot spots in the JPEG encoding process? Then we could locate the bottleneck.

After checking internally, only 420 is supported. We will improve the documentation on this.

Let me check and update you. Thanks.

@VickNV

Thanks for your reply.
Is the YUV420-only support a hardware limitation or a software limitation? Is there a plan to support YUV422 in the future?

If we want to support YUV422 to JPEG now, do you have any suggestions?

It’s a hardware limitation, so you have to use the 2D API to convert the format. Thanks.
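
For readers unfamiliar with what such a conversion involves, here is a minimal CPU-side sketch of planar 4:2:2 to planar 4:2:0. It is not the NvMedia 2D API and is not hardware accelerated. It assumes 8-bit planar input in Y, U, V plane order with even dimensions, and it averages each vertical pair of chroma rows, which is one common choice among several. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Halve the height of one chroma plane by averaging each vertical
 * pair of rows (with rounding). cw is the chroma width in samples. */
static void halve_chroma_rows(const uint8_t *src, uint8_t *dst,
                              size_t cw, size_t height)
{
    for (size_t y = 0; y < height / 2; ++y) {
        const uint8_t *r0 = src + (2 * y) * cw;
        const uint8_t *r1 = r0 + cw;
        for (size_t x = 0; x < cw; ++x)
            dst[y * cw + x] = (uint8_t)((r0[x] + r1[x] + 1) / 2);
    }
}

/* src holds width*height*2 bytes (Y, U, V planes of a 4:2:2 frame);
 * dst must hold width*height*3/2 bytes (Y, U, V planes of 4:2:0). */
void yuv422p_to_yuv420p(const uint8_t *src, uint8_t *dst,
                        size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    size_t y_size = width * height;
    size_t cw = width / 2;
    size_t c422 = cw * height;        /* one 4:2:2 chroma plane */
    size_t c420 = cw * (height / 2);  /* one 4:2:0 chroma plane */

    memcpy(dst, src, y_size);         /* luma is unchanged */
    halve_chroma_rows(src + y_size, dst + y_size, cw, height);
    halve_chroma_rows(src + y_size + c422, dst + y_size + c420, cw, height);
}
```

A CPU pass like this adds its own per-frame cost, so it is mainly useful for preparing test files or checking results, not as a replacement for the 2D engine.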

@VickNV
OK. Got it.

Regarding the jpgenc performance issue, what did your investigation find?

@VickNV

NvSciSync extends the NvMedia imaging components to support synchronization among the imaging components, NVIDIA® CUDA® components, and NvSciSync-based applications

There is a table showing that NvSciSync can be used with components such as 2D/LDC/ICP/ISP/IEP/OFST/VPI/DLA.
There is no JPEG entry in it. But the IEP section also mentions that the IEP component supports H.264 and JPEG:
The NvMedia Image Encode (IEP) component supports encoding the incoming NvMedia Image (YUV) inputs to H.264 or H.265 or JPEG formats.
Now I am a little confused.

  1. Does the IEP component include JPEG or not?
  2. Is there no NvSciSync method for JPEG?

As we can see, the IEP component has two header files: nvmedia_iep.h for normal usage and nvmedia_iep_nvscisync.h for the NvSciSync version. But for the JPEG part there is only the nvmedia_ijpe.h header file, without nvmedia_ijpe_nvscisync.h or similar files.

My concern is how we can provide true hardware acceleration between the 2D and JPEG components (with zero copy).
As you know, in order to support YUV422p to JPEG, we need to chain the 2D and JPEG hardware engines together using NvSci. But it seems there is no NvSci feature on the JPEG side?

Another question: for each hardware engine, how many cameras or input files can be fed simultaneously? In other words, what is the maximum processing capability of each hardware engine, and does NvMedia already support multiple cameras and input files? Is there an example project we can use as a reference?

Please help to create another topic for this. If necessary, you can link it to this topic by mentioning this topic there. Thanks.

About the performance, could you specify your use-case scenario, specific content, resolution, quality factor?

Our use-case command line is shown below.

~$ ./nvmimg_jpgenc -f sample_1920x1080_yuv422p.yuv -fr 1920x1080 -of sample_1920x1080_2.jpg -q 75 -v 3

If you have any questions, please let me know.

I think you meant the 420 one.

$./nvmimg_jpgenc -f Camera_1920x1080_YUV420.yuv -fr 1920x1080 -of Camera_1920x1080_YUV420.jpg -q 75

Yes you are right.

~$ ./nvmimg_jpgenc -f sample_1920x1080_yuv420p.yuv -fr 1920x1080 -of sample_1920x1080.jpg -q 75 

You can feed it with my previous attachment sample_1920x1080_yuv420p.yuv

@VickNV What was your test result for the jpgenc performance issue?

We have checked with our teams. The performance shown in post #2 of this topic (by VickNV) is expected.

Encoding one 1920x1080 yuv420p picture costs 3.6 ms, which is more than 1 ms. We still suspect there is some issue. With hardware engine support, we expect its performance to be comparable to an FPGA approach. Is that correct?

~$ ./nvmimg_jpgenc -f sample_1920x1080_yuv420p.yuv -fr 1920x1080 -of sample_1920x1080.jpg -q 75 

Total Encoding time for 1 frames: 4.236 ms
Encoding time per frame 3.6580 ms 
Get bits time per frame 0.5780 ms 

Total encoded frames = 1, avg. bitrate=61435920

***ENCODING PROCESS ENDED SUCCESSFULY***
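
To compare the three measurements in this thread on equal footing, the per-frame encoding times can be converted to pixel throughput. The sketch below uses hypothetical helper names and only the numbers already reported above: 1920x1080 at 4.1430 ms, 1280x1080 at 2.5264 ms, and 1920x1080 at 3.6580 ms. These work out to roughly 500, 547, and 567 megapixels per second, so the runs are broadly consistent with each other once resolution is taken into account.

```c
#include <assert.h>

/* Pixel throughput in megapixels per second for one frame of
 * width x height pixels that took ms_per_frame milliseconds. */
double megapixels_per_second(unsigned width, unsigned height,
                             double ms_per_frame)
{
    assert(ms_per_frame > 0.0);
    double pixels = (double)width * (double)height;
    return pixels / (ms_per_frame * 1e-3) / 1e6;
}

/* Frames per second that a given per-frame time would sustain,
 * assuming frames are encoded back to back with no other overhead. */
double frames_per_second(double ms_per_frame)
{
    assert(ms_per_frame > 0.0);
    return 1000.0 / ms_per_frame;
}
```

For example, `frames_per_second(3.6580)` is about 273, which is the rate a single stream could sustain if nothing else limited it. Single-frame runs may also include one-time setup cost, so a multi-frame run is likely a better basis for comparison.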

Can you please clarify this FPGA solution? Which chip/platform is being referred to here? Thanks.

That depends on the FPGA chip's operating frequency × bandwidth and the size of the area occupied by the soft core.
What is the JPEG encoder input bandwidth on AGX (1.2 G × 1 byte)?
If fed with 1920x1080 yuv422p, we assume the cost will be (1920x1080x2)/1.2 = 3456000 ns = 3.456 ms. Is this the theoretical processing cost?
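
The estimate above can be reproduced with a short sketch. The 1.2 GB/s input bandwidth is the poster's assumption and has not been confirmed in this thread, and the function name is hypothetical. For comparison, the same assumption applied to the 4:2:0 input actually used in the measurements (3110400 bytes) gives about 2.592 ms.

```c
#include <assert.h>

/* Time in milliseconds to move `bytes` through a link that carries
 * `gbytes_per_s` gigabytes (1e9 bytes) per second. */
double transfer_time_ms(double bytes, double gbytes_per_s)
{
    assert(gbytes_per_s > 0.0);
    return bytes / (gbytes_per_s * 1e9) * 1e3;
}
```

`transfer_time_ms(1920.0 * 1080.0 * 2.0, 1.2)` gives the 3.456 ms figure for a 4:2:2 frame.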

I need to revise my previous statement about the FPGA timing cost.
On the FPGA side, there is an internal MIPI core that receives the YUV stream data in real time and, at the same time, feeds this stream into the next encoder stage. Therefore, when the YUV input stream is complete, the FPGA can output the encoded JPEG stream at the same moment. From this perspective, the FPGA's encoding cost matches only the MIPI signal transmission cost.

In other words, if the measurement starts once a full picture has been received, AGX takes 3.9 ms to do the JPEG encoding, while the FPGA takes close to zero, since the FPGA encoder output completes as the MIPI input transfer completes.

  1. AGX: MIPI transfer latency + 3.9 ms processing time.
  2. FPGA: approximately the MIPI transfer latency.

Anyway, the above is based on our own hypothesis about AGX. Can you share some details about how AGX performs JPEG encoding internally, such as a block diagram or signal flow? I believe that would help us understand the AGX encoding time cost.