The nvjpeg encoder on the Jetson Nano Developer Kit is very slow: encoding a 7264x4112 YUV420 picture 100 times takes 20 s.

I use CUDA 10.2 and the 05_jpeg_encode sample from the L4T Multimedia API Reference, and the measured time does not include the time spent copying the buffer.

Hi,
We have tried to encode 7264x4112 on Jetson Nano. Here is the test result:

05_jpeg_encode$ gst-launch-1.0 videotestsrc num-buffers=1 ! video/x-raw,width=7264,height=4112 ! filesink location= /home/nvidia/a.yuv
05_jpeg_encode$ time sudo ./jpeg_encode /home/nvidia/a.yuv 7264 4112 /home/nvidia/a.jpg
libv4l2_nvvidconv (0):(802) (INFO) : Allocating (1) OUTPUT PLANE BUFFERS Layout=0
libv4l2_nvvidconv (0):(818) (INFO) : Allocating (1) CAPTURE PLANE BUFFERS Layout=1
App run was successful

real    0m0.304s
user    0m0.032s
sys     0m0.204s

FYR.

So 0.3 s x 100 = 30 s.

Then why does the datasheet PDF state this encode parameter?

Hi,
We verify it with gstreamer commands:

$ gst-launch-1.0 videotestsrc num-buffers=30 ! 'video/x-raw, width=(int)1280, height=(int)720, format=(string)I420, framerate=(fraction)30/1' ! filesink location=720p.yuv
$ gst-launch-1.0 filesrc location=720p.yuv ! videoparse width=1280 height=720 format=2 framerate=30 ! nvvidconv ! 'video/x-raw(memory:NVMM), width=(int)5952, height=(int)3348, format=(string)I420, framerate=(fraction)30/1' ! nvjpegenc ! filesink location=enc.jpg
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:00.973442088
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

It takes around 1 second (0:00:00.973442088) to encode the 30 frames, so width x height x frames / time = 5952 x 3348 x 30 / ~1 s = ~600 MP/s.
05_jpeg_encode encodes only a single frame per run. You may modify it to encode continuously and try again.
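
A minimal sketch of how a continuous-encoding measurement could be structured, assuming the encoder and buffers are set up once and only the repeated encode call is timed. `time_repeated` and `TimingResult` are hypothetical names that are not part of the sample; in 05_jpeg_encode the callable would wrap the sample's existing encode call.

```cpp
#include <chrono>
#include <cstddef>

// Result of timing a repeated operation.
struct TimingResult {
    std::size_t iterations;   // number of calls made
    double total_seconds;     // wall-clock time for all calls
    double seconds_per_call;  // average time per call
};

// Call `op` `iterations` times and measure only the time spent in the loop.
// Setup such as reading the input file or allocating buffers should happen
// before calling this function so that it is not counted.
template <typename Op>
TimingResult time_repeated(std::size_t iterations, Op op)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        op();
    const std::chrono::duration<double> elapsed = clock::now() - start;
    const double total = elapsed.count();
    return {iterations, total, iterations ? total / iterations : 0.0};
}
```

Timing a loop this way keeps process start-up and file reading out of the result, which the `time` command on a single run cannot do.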

Hi,
Why don't you use a 5952x3348 YUV file as the source?

Hi,
The partial pipeline is

... ! video/x-raw,width=1280,height=720 ! nvvidconv ! video/x-raw(memory:NVMM),width=5952,height=3348 ! ...

This copies a 1280x720 CPU buffer to an NVMM buffer and then upscales it to 5952x3348. We think the memory copy is negligible here and has very little impact on throughput.

If we run

... ! video/x-raw,width=5952,height=3348 ! nvvidconv ! video/x-raw(memory:NVMM),width=5952,height=3348 ! ...

This copies a 5952x3348 CPU buffer to an NVMM buffer, which has a significant impact on throughput. The result is dominated by this copy, so the real JPEG encoding throughput cannot be seen.
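
To put rough numbers on the difference between the two copies, here is a small sketch that computes the size of one I420 frame and the bytes per second that have to be copied at a given frame rate. It assumes tightly packed planes with no row padding and uses the 5952x3348 output size from the gst-launch command earlier in the thread; the function names are made up for illustration.

```cpp
#include <cstdint>

// Bytes in one I420 (YUV 4:2:0 planar) frame: a full-resolution Y plane
// plus U and V planes subsampled by 2 in each dimension.
// Assumes even width and height and tightly packed planes (no row padding).
constexpr std::uint64_t i420_frame_bytes(std::uint64_t width, std::uint64_t height)
{
    return width * height + 2 * ((width / 2) * (height / 2));
}

// Bytes per second that must be copied out of CPU memory when feeding
// frames of the given size at the given frame rate.
constexpr std::uint64_t copy_bytes_per_second(std::uint64_t width,
                                              std::uint64_t height,
                                              std::uint64_t fps)
{
    return i420_frame_bytes(width, height) * fps;
}
```

Under these assumptions `i420_frame_bytes(1280, 720)` is 1,382,400 bytes and `i420_frame_bytes(5952, 3348)` is 29,890,944 bytes, so the large source needs roughly 21.6 times more data copied per frame.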

In other words, the real JPEG encoding performance of the Jetson Nano in an application is only about 100 MP/s.
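
The figures quoted in this thread can be checked with a short calculation. This is only a sketch: `megapixels_per_second` is a hypothetical helper, and the 0.304 s used for the single-frame case is the whole-process `real` time from the `time` output above, so it includes more than the encode itself.

```cpp
// Pixels processed per second, in megapixels (1 MP = 1e6 pixels).
constexpr double megapixels_per_second(double width, double height,
                                       double frames, double seconds)
{
    return width * height * frames / seconds / 1e6;
}
```

With the numbers above, `megapixels_per_second(7264, 4112, 1, 0.304)` gives about 98 MP/s for the single-frame 05_jpeg_encode run, and `megapixels_per_second(5952, 3348, 30, 0.973442088)` gives about 614 MP/s for the gst-launch pipeline.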

Hi,

That is correct if your video frames are in CPU buffers. File I/O dominates the throughput in this case. If your source is nvarguscamerasrc or nvv4l2camerasrc, the video frames are already in NVMM buffers and you can get better performance.

Hi,
I run 09_camera_jpeg_capture, and in the capture thread the acquireFrame() call takes a lot of time. How can I optimize it?

    IFrame *iFrame = interface_cast<IFrame>(frame);
    if (!iFrame)
        break;

    gettimeofday(&t2, NULL);

    // Get the IImageNativeBuffer extension interface.
    NV::IImageNativeBuffer *iNativeBuffer =
        interface_cast<NV::IImageNativeBuffer>(iFrame->getImage());
    if (!iNativeBuffer)
        ORIGINATE_ERROR("IImageNativeBuffer not supported by Image.");

    // If we don't already have a buffer, create one from this image.
    // Otherwise, just blit to our buffer.
    if (m_dmabuf == -1)
    {
        m_dmabuf = iNativeBuffer->createNvBuffer(iEglOutputStream->getResolution(),
                                                 NvBufferColorFormat_YUV420,
                                                 NvBufferLayout_BlockLinear);
        if (m_dmabuf == -1)
            CONSUMER_PRINT("\tFailed to create NvBuffer\n");
    }
    else if (iNativeBuffer->copyToNvBuffer(m_dmabuf) != STATUS_OK)
    {
        ORIGINATE_ERROR("Failed to copy frame to NvBuffer.");
    }

    gettimeofday(&t3, NULL);

    // Process frame.
    processV4L2Fd(m_dmabuf, iFrame->getNumber());

    gettimeofday(&t4, NULL);

I printed some debug messages. The V4L2 output format is V4L2_PIX_FMT_SRGGB10. Which API can convert it to YUV420? I also tested YUV420 as the V4L2 output format, but it is not supported: the output buffer data is all '\0'.
I use JetPack 4.4.1 and the sensor is imx219.

Hi,
If acquireFrame() is slow, the issue seems to be in the source, not in JPEG encoding. Is your camera a Raspberry Pi camera V2? The Pi camera V2 uses the imx219 sensor.

imx219.c is in the kernel source path ~\Linux_for_Tegra\source\public\kernel\nvidia\drivers\media\i2c, and I have not edited the device tree.

So now I know that most of the time is not spent in JPEG encoding. But how can I get source buffers from the camera faster?

Hi,
You may check whether you get the desired fps by running this command:

$ gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),width=_CAMERA_WIDTH_,height=_CAMERA_HEIGHT_' ! fpsdisplaysink text-overlay=0 video-sink=nvoverlaysink -v

If the source is 30fps, acquireFrame() should return frames every 33ms.
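
The relation between frame rate and the expected wait in acquireFrame() is simple arithmetic. The sketch below uses two hypothetical helpers to convert in both directions, working in microseconds because that is the unit of the timing output posted later in this thread.

```cpp
// Expected time between frames, in microseconds, for a given frame rate.
constexpr double frame_interval_us(double fps)
{
    return 1e6 / fps;
}

// Frame rate implied by a measured interval between frames, in microseconds.
constexpr double implied_fps(double interval_us)
{
    return 1e6 / interval_us;
}
```

For example, `frame_interval_us(30.0)` is about 33,333 us, while a measured wait of 470,405 us (one of the t1 values reported later in this thread) corresponds to `implied_fps(470405.0)`, roughly 2.1 fps.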

GST_ARGUS: Running with following settings:
Camera index = 0
Camera mode = 0
Output Stream W = 3264 H = 2464
seconds to Run = 0
Frame Rate = 21.000000
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstNvOverlaySink-nvoverlaysink:nvoverlaysink-nvoverlaysink0: sync = true
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 12, dropped: 0, current: 23.09, average: 23.09
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 22, dropped: 0, current: 19.92, average: 21.54
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 32, dropped: 0, current: 19.95, average: 21.01
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 43, dropped: 0, current: 19.97, average: 20.74
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 54, dropped: 0, current: 20.08, average: 20.60
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 65, dropped: 0, current: 19.96, average: 20.49
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 76, dropped: 0, current: 19.95, average: 20.41
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 86, dropped: 0, current: 19.92, average: 20.35
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 97, dropped: 0, current: 20.15, average: 20.33
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 107, dropped: 0, current: 19.97, average: 20.29
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 117, dropped: 0, current: 19.59, average: 20.23
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 128, dropped: 0, current: 20.34, average: 20.24
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 139, dropped: 0, current: 20.09, average: 20.23
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 150, dropped: 0, current: 20.00, average: 20.21
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 161, dropped: 0, current: 19.97, average: 20.20
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 172, dropped: 0, current: 20.04, average: 20.19
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 182, dropped: 0, current: 19.96, average: 20.17
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 193, dropped: 0, current: 19.99, average: 20.16

bool CaptureConsumerThread::processV4L2Fd(int32_t fd, uint64_t frameNumber)
{
    //char filename[FILENAME_MAX];
    //sprintf(filename, "output%03u.jpg", (unsigned) frameNumber);

    //std::ofstream *outputFile = new std::ofstream(filename);
    //if (outputFile)
    {
        unsigned long size = JPEG_BUFFER_SIZE;
        unsigned char *buffer = m_OutputBuffer;
        m_JpegEncoder->encodeFromFd(fd, JCS_YCbCr, &buffer, size);
        //outputFile->write((char *)buffer, size);
        //delete outputFile;
    }

    return true;
}

bool ConsumerThread::threadExecute()
{
    IEGLOutputStream *iEglOutputStream = interface_cast<IEGLOutputStream>(m_stream);
    IFrameConsumer *iFrameConsumer = interface_cast<IFrameConsumer>(m_consumer);
    long long int spendTime1, spendTime2, spendTime3;

    // Wait until the producer has connected to the stream.
    CONSUMER_PRINT("Waiting until producer is connected...\n");
    if (iEglOutputStream->waitUntilConnected() != STATUS_OK)
        ORIGINATE_ERROR("Stream failed to connect.");
    CONSUMER_PRINT("Producer has connected; continuing.\n");
    struct timeval t1, t2, t3, t4;

    while (true)
    {
        gettimeofday(&t1, NULL);

        // Acquire a frame.
        UniqueObj<Frame> frame(iFrameConsumer->acquireFrame());
        IFrame *iFrame = interface_cast<IFrame>(frame);
        if (!iFrame)
            break;

        gettimeofday(&t2, NULL);

        // Get the IImageNativeBuffer extension interface.
        NV::IImageNativeBuffer *iNativeBuffer =
            interface_cast<NV::IImageNativeBuffer>(iFrame->getImage());
        if (!iNativeBuffer)
            ORIGINATE_ERROR("IImageNativeBuffer not supported by Image.");

        // If we don't already have a buffer, create one from this image.
        // Otherwise, just blit to our buffer.
        if (m_dmabuf == -1)
        {
            m_dmabuf = iNativeBuffer->createNvBuffer(iEglOutputStream->getResolution(),
                                                     NvBufferColorFormat_YUV420,
                                                     NvBufferLayout_BlockLinear);
            if (m_dmabuf == -1)
                CONSUMER_PRINT("\tFailed to create NvBuffer\n");
        }
        else if (iNativeBuffer->copyToNvBuffer(m_dmabuf) != STATUS_OK)
        {
            ORIGINATE_ERROR("Failed to copy frame to NvBuffer.");
        }

        gettimeofday(&t3, NULL);

        // Process frame.
        processV4L2Fd(m_dmabuf, iFrame->getNumber());

        gettimeofday(&t4, NULL);

        // Elapsed microseconds: acquireFrame (1), buffer create/copy (2), JPEG encode (3).
        spendTime1 = t2.tv_usec + t2.tv_sec * 1000000LL - t1.tv_usec - t1.tv_sec * 1000000LL;
        spendTime2 = t3.tv_usec + t3.tv_sec * 1000000LL - t2.tv_usec - t2.tv_sec * 1000000LL;
        spendTime3 = t4.tv_usec + t4.tv_sec * 1000000LL - t3.tv_usec - t3.tv_sec * 1000000LL;

        printf("num:%lu t1:%lld t2:%lld t3:%lld\n", iFrame->getNumber(), spendTime1, spendTime2, spendTime3);
    }

    CONSUMER_PRINT("Done.\n");

    requestShutdown();

    return true;
}
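
The three subtractions in the loop above repeat the same arithmetic. As a sketch, they could be factored into one helper; `elapsed_us` is a hypothetical name and is not part of the sample.

```cpp
#include <sys/time.h>
#include <cstdint>

// Microseconds elapsed from `start` to `end`, both taken with gettimeofday().
// The subtraction is done per field in 64-bit arithmetic so that a smaller
// tv_usec in `end` than in `start` is handled correctly.
inline std::int64_t elapsed_us(const timeval &start, const timeval &end)
{
    return (static_cast<std::int64_t>(end.tv_sec) - start.tv_sec) * 1000000
         + (static_cast<std::int64_t>(end.tv_usec) - start.tv_usec);
}
```

With this helper, the first measurement in the loop would read `spendTime1 = elapsed_us(t1, t2);`, and likewise for the other two.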

[INFO] (NvEglRenderer.cpp:110) Setting Screen width 640 height 480
PRODUCER: Creating output stream
PRODUCER: Launching consumer thread
CONSUMER: Waiting until producer is connected...
CONSUMER: Waiting until producer is connected...
PRODUCER: Available Sensor modes :
PRODUCER: [0] W=3264 H=2464
PRODUCER: [1] W=3264 H=1848
PRODUCER: [2] W=1920 H=1080
PRODUCER: [3] W=1280 H=720
PRODUCER: [4] W=1280 H=720
PRODUCER: Requested FPS out of range. Fall back to 30
PRODUCER: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
CONSUMER: Producer has connected; continuing.
num:1 t1:1202117 t2:5479 t3:17097
num:1 t1:1197969 t2:6806 t3:20621
num:2 t1:466622 t2:1468 t3:3108
num:2 t1:471189 t2:1243 t3:10100
num:3 t1:505322 t2:4040 t3:5790
num:3 t1:505014 t2:3230 t3:8196
num:4 t1:470405 t2:1241 t3:2960
CONSUMER: Done.
num:4 t1:470096 t2:1653 t3:12000
CONSUMER: Done.
PRODUCER: Done -- exiting.

Hello, do you have any suggestions for me? I am waiting for your reply.

Hi,
We tried with the Pi camera V2 and the result looks fine:

nvidia@nvidia-desktop:/usr/src/jetson_multimedia_api/samples/09_camera_jpeg_capture$ ./camera_jpeg_capture
[INFO] (NvEglRenderer.cpp:110) <renderer0> Setting Screen width 640 height 480
PRODUCER: Creating output stream
PRODUCER: Launching consumer thread
CONSUMER: Waiting until producer is connected...
CONSUMER: Waiting until producer is connected...
PRODUCER: Available Sensor modes :
PRODUCER: [0] W=3264 H=2464
PRODUCER: [1] W=3264 H=1848
PRODUCER: [2] W=1920 H=1080
PRODUCER: [3] W=1280 H=720
PRODUCER: [4] W=1280 H=720
PRODUCER: Requested FPS out of range. Fall back to 30
PRODUCER: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
CONSUMER: Producer has connected; continuing.
num:1 t1:284263 t2:4950 t3:9474
num:1 t1:284587 t2:2526 t3:14821
num:2 t1:28821 t2:1029 t3:3156
num:2 t1:29480 t2:1037 t3:2963
num:3 t1:43245 t2:1050 t3:2986
num:3 t1:43113 t2:1074 t3:5807
num:4 t1:38923 t2:1178 t3:2936
num:4 t1:38823 t2:1285 t3:9885
num:5 t1:36483 t2:1246 t3:2929
num:5 t1:36377 t2:763 t3:12823
num:6 t1:34391 t2:998 t3:2949
num:6 t1:33853 t2:819 t3:15322
num:7 t1:31578 t2:1247 t3:2951
num:7 t1:31158 t2:971 t3:17763
num:8 t1:26271 t2:1997 t3:2937
num:8 t1:25868 t2:1883 t3:5544
num:9 t1:40454 t2:2045 t3:2976
num:9 t1:40128 t2:1884 t3:8010
num:10 t1:38087 t2:2017 t3:3001
num:10 t1:37693 t2:2295 t3:9931
num:11 t1:35880 t2:2034 t3:2958
num:11 t1:35575 t2:1771 t3:12654
num:12 t1:33185 t2:2142 t3:2960
num:12 t1:32873 t2:1942 t3:15154
num:13 t1:31200 t2:1868 t3:2986
num:13 t1:30759 t2:1713 t3:17538
num:14 t1:28569 t2:1885 t3:2747
num:14 t1:28519 t2:2029 t3:3099
num:15 t1:42706 t2:1695 t3:2952
num:15 t1:42424 t2:1829 t3:5753
num:16 t1:41186 t2:1245 t3:2959
num:16 t1:40115 t2:1702 t3:8254
num:17 t1:37750 t2:2131 t3:2980
num:17 t1:37727 t2:2141 t3:10002
num:18 t1:35480 t2:2111 t3:2963
num:18 t1:35352 t2:2292 t3:12337
num:19 t1:32982 t2:2325 t3:2956
num:19 t1:32868 t2:1772 t3:15311
num:20 t1:30661 t2:2334 t3:2962
num:20 t1:30618 t2:1755 t3:17631
num:21 t1:27969 t2:2377 t3:2890
num:21 t1:28151 t2:2251 t3:2980
CONSUMER: Done.
CONSUMER: Done.
PRODUCER: Done -- exiting.

Which release version do you use? r32.3.1 or r32.4.4?

Tegra_Multimedia_API_R32.2.3_aarch64.tbz2