Jetson Nano Developer Kit: nvjpeg encoder is very slow. Encoding a YUV420 7264x4112 picture 100 times takes 20 s

y2zwei · November 10, 2020, 2:33am

I use CUDA 10.2 and the L4T Multimedia API Reference sample 05_jpeg_encode, and I do not include the buffer copy time in the measurement.

DaneLLL · November 10, 2020, 5:57am

Hi,
We have tried to encode 7264x4112 on Jetson Nano. Here is the test result:

05_jpeg_encode$ gst-launch-1.0 videotestsrc num-buffers=1 ! video/x-raw,width=7264,height=4112 ! filesink location= /home/nvidia/a.yuv
05_jpeg_encode$ time sudo ./jpeg_encode /home/nvidia/a.yuv 7264 4112 /home/nvidia/a.jpg
libv4l2_nvvidconv (0):(802) (INFO) : Allocating (1) OUTPUT PLANE BUFFERS Layout=0
libv4l2_nvvidconv (0):(818) (INFO) : Allocating (1) CAPTURE PLANE BUFFERS Layout=1
App run was successful

real    0m0.304s
user    0m0.032s
sys     0m0.204s

FYR.

y2zwei · November 10, 2020, 6:55am

So 0.3 s x 100 = 30 s.

y2zwei · November 10, 2020, 7:10am

Why does the datasheet PDF state this encode parameter?

DaneLLL · November 10, 2020, 7:46am

Hi,
We verified it with GStreamer commands:

$ gst-launch-1.0 videotestsrc num-buffers=30 ! 'video/x-raw, width=(int)1280, height=(int)720, format=(string)I420, framerate=(fraction)30/1' ! filesink location=720p.yuv
$ gst-launch-1.0 filesrc location=720p.yuv ! videoparse width=1280 height=720 format=2 framerate=30 ! nvvidconv ! 'video/x-raw(memory:NVMM), width=(int)5952, height=(int)3348, format=(string)I420, framerate=(fraction)30/1' ! nvjpegenc ! filesink location=enc.jpg
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:00.973442088
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

It takes around 1 second (0:00:00.973442088). (width x height x FPS) = 5952x3448x30 = ~600 MP/sec.
Running 05_jpeg_encode encodes only a single frame. You may modify it to encode continuously and try again.

y2zwei · November 10, 2020, 8:22am

Hi,
Why don't you use a 5952x3448 YUV file as the source?

DaneLLL · November 10, 2020, 9:07am

Hi,
The partial pipeline is

... ! video/x-raw,width=1280,height=720 ! nvvidconv ! video/x-raw(memory:NVMM),width=5952,height=3448 ! ...

This copies a 1280x720 CPU buffer to an NVMM buffer and then upscales it to 5952x3448. We think this memory copy is negligible and has very little impact on throughput.

If we run

... ! video/x-raw,width=5952,height=3448 ! nvvidconv ! video/x-raw(memory:NVMM),width=5952,height=3448 ! ...

This copies a 5952x3448 CPU buffer to an NVMM buffer, which has a significant impact on throughput. The result is dominated by this copy, so the real JPEG encoding throughput cannot be seen.

hteng_2015 · November 11, 2020, 1:27am

In other words, the real JPEG performance of the NVIDIA Jetson Nano in an application is only about 100 Mpixel/s.

DaneLLL · November 11, 2020, 2:19am

Hi,

That is correct if your video frames are in a CPU buffer. File I/O dominates the throughput in this case. If your source is nvarguscamerasrc or nvv4l2camerasrc, video frames are already in NVMM buffers and you can get better performance.

y2zwei · November 17, 2020, 1:08am

Hi,
I ran 09_camera_jpeg_capture, and in the capture thread the acquireFrame API takes a lot of time. How can I optimize it?

    IFrame *iFrame = interface_cast<IFrame>(frame);
    if (!iFrame)
        break;

    gettimeofday(&t2, NULL);

    // Get the IImageNativeBuffer extension interface.
    NV::IImageNativeBuffer *iNativeBuffer =
        interface_cast<NV::IImageNativeBuffer>(iFrame->getImage());
    if (!iNativeBuffer)
        ORIGINATE_ERROR("IImageNativeBuffer not supported by Image.");

    // If we don't already have a buffer, create one from this image.
    // Otherwise, just blit to our buffer.
    if (m_dmabuf == -1)
    {
        m_dmabuf = iNativeBuffer->createNvBuffer(iEglOutputStream->getResolution(),
                                                 NvBufferColorFormat_YUV420,
                                                 NvBufferLayout_BlockLinear);
        if (m_dmabuf == -1)
            CONSUMER_PRINT("\tFailed to create NvBuffer\n");
    }
    else if (iNativeBuffer->copyToNvBuffer(m_dmabuf) != STATUS_OK)
    {
        ORIGINATE_ERROR("Failed to copy frame to NvBuffer.");
    }

    gettimeofday(&t3, NULL);

    // Process frame.
    processV4L2Fd(m_dmabuf, iFrame->getNumber());

    gettimeofday(&t4, NULL);

y2zwei · November 17, 2020, 1:26am

I printed some debug messages. The V4L2 output format is V4L2_PIX_FMT_SRGGB10. Which API can convert it to YUV420? In my test the V4L2 output does not support YUV420; the output buffer data is all '\0'.
I use JetPack 4.4.1 and the sensor is imx219.

DaneLLL · November 17, 2020, 2:00am

Hi,
If acquireFrame() is slow, the issue seems to be in the source, not in JPEG encoding. Is your camera a Raspberry Pi camera V2? The Pi camera V2 uses imx219.

y2zwei · November 17, 2020, 2:11am

imx219.c is in the kernel source at ~\Linux_for_Tegra\source\public\kernel\nvidia\drivers\media\i2c, and I did not edit the device tree.

y2zwei · November 17, 2020, 2:21am

So now I know that most of the time is not spent in JPEG encoding. But how can I get source buffers from the camera faster?

DaneLLL · November 17, 2020, 3:14am

Hi,
You can check whether you get the desired fps by running this command:

$ gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),width=_CAMERA_WIDTH_,height=_CAMERA_HEIGHT_' ! fpsdisplaysink text-overlay=0 video-sink=nvoverlaysink -v

If the source is 30fps, acquireFrame() should return frames every 33ms.
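
The 33 ms figure is simply 1 second divided by the frame rate. As a rough sketch, the expected interval can be compared against a measured acquireFrame() wait; the helper names and the tolerance parameter below are assumptions for illustration only.

```cpp
#include <cmath>

// Expected time between frames, in microseconds, for a source running at `fps`.
double expectedIntervalUs(double fps)
{
    return 1e6 / fps;
}

// Effective frame rate implied by a measured per-frame wait in microseconds.
double effectiveFps(double intervalUs)
{
    return 1e6 / intervalUs;
}

// True if the measured wait is within `tolerance` (a fraction, e.g. 0.25 for
// 25%) of the interval expected at `fps`. The tolerance is an arbitrary choice.
bool isOnSchedule(double measuredUs, double fps, double tolerance)
{
    const double expected = expectedIntervalUs(fps);
    return std::fabs(measuredUs - expected) <= tolerance * expected;
}
```

For a 30 fps source `expectedIntervalUs(30.0)` is about 33333 microseconds, so a wait many times longer than that points at the capture side rather than at the encoder.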

y2zwei · November 17, 2020, 3:43am

GST_ARGUS: Running with following settings:
Camera index = 0
Camera mode = 0
Output Stream W = 3264 H = 2464
seconds to Run = 0
Frame Rate = 21.000000
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstNvOverlaySink-nvoverlaysink:nvoverlaysink-nvoverlaysink0: sync = true
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 12, dropped: 0, current: 23.09, average: 23.09
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 22, dropped: 0, current: 19.92, average: 21.54
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 32, dropped: 0, current: 19.95, average: 21.01
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 43, dropped: 0, current: 19.97, average: 20.74
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 54, dropped: 0, current: 20.08, average: 20.60
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 65, dropped: 0, current: 19.96, average: 20.49
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 76, dropped: 0, current: 19.95, average: 20.41
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 86, dropped: 0, current: 19.92, average: 20.35
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 97, dropped: 0, current: 20.15, average: 20.33
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 107, dropped: 0, current: 19.97, average: 20.29
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 117, dropped: 0, current: 19.59, average: 20.23
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 128, dropped: 0, current: 20.34, average: 20.24
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 139, dropped: 0, current: 20.09, average: 20.23
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 150, dropped: 0, current: 20.00, average: 20.21
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 161, dropped: 0, current: 19.97, average: 20.20
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 172, dropped: 0, current: 20.04, average: 20.19
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 182, dropped: 0, current: 19.96, average: 20.17
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 193, dropped: 0, current: 19.99, average: 20.16

y2zwei · November 17, 2020, 3:51am

bool CaptureConsumerThread::processV4L2Fd(int32_t fd, uint64_t frameNumber)
{
    //char filename[FILENAME_MAX];
    //sprintf(filename, "output%03u.jpg", (unsigned) frameNumber);

    //std::ofstream *outputFile = new std::ofstream(filename);
    //if (outputFile)
    {
        unsigned long size = JPEG_BUFFER_SIZE;
        unsigned char *buffer = m_OutputBuffer;
        m_JpegEncoder->encodeFromFd(fd, JCS_YCbCr, &buffer, size);
        //outputFile->write((char *)buffer, size);
        //delete outputFile;
    }

    return true;
}

bool ConsumerThread::threadExecute()
{
    IEGLOutputStream *iEglOutputStream = interface_cast<IEGLOutputStream>(m_stream);
    IFrameConsumer *iFrameConsumer = interface_cast<IFrameConsumer>(m_consumer);
    long long int spendTime1, spendTime2, spendTime3;

    // Wait until the producer has connected to the stream.
    CONSUMER_PRINT("Waiting until producer is connected...\n");
    if (iEglOutputStream->waitUntilConnected() != STATUS_OK)
        ORIGINATE_ERROR("Stream failed to connect.");
    CONSUMER_PRINT("Producer has connected; continuing.\n");
    struct timeval t1, t2, t3, t4;

    while (true)
    {
        gettimeofday(&t1, NULL);
        // Acquire a frame.
        UniqueObj<Frame> frame(iFrameConsumer->acquireFrame());
        IFrame *iFrame = interface_cast<IFrame>(frame);
        if (!iFrame)
            break;

        gettimeofday(&t2, NULL);

        // Get the IImageNativeBuffer extension interface.
        NV::IImageNativeBuffer *iNativeBuffer =
            interface_cast<NV::IImageNativeBuffer>(iFrame->getImage());
        if (!iNativeBuffer)
            ORIGINATE_ERROR("IImageNativeBuffer not supported by Image.");

        // If we don't already have a buffer, create one from this image.
        // Otherwise, just blit to our buffer.
        if (m_dmabuf == -1)
        {
            m_dmabuf = iNativeBuffer->createNvBuffer(iEglOutputStream->getResolution(),
                                                     NvBufferColorFormat_YUV420,
                                                     NvBufferLayout_BlockLinear);
            if (m_dmabuf == -1)
                CONSUMER_PRINT("\tFailed to create NvBuffer\n");
        }
        else if (iNativeBuffer->copyToNvBuffer(m_dmabuf) != STATUS_OK)
        {
            ORIGINATE_ERROR("Failed to copy frame to NvBuffer.");
        }

        gettimeofday(&t3, NULL);

        // Process frame.
        processV4L2Fd(m_dmabuf, iFrame->getNumber());

        gettimeofday(&t4, NULL);

        // Elapsed microseconds per stage:
        // spendTime1 = acquireFrame(), spendTime2 = copy to NvBuffer,
        // spendTime3 = processV4L2Fd() (JPEG encode).
        spendTime1 = t2.tv_usec + t2.tv_sec*1000000 - t1.tv_usec - t1.tv_sec*1000000;
        spendTime2 = t3.tv_usec + t3.tv_sec*1000000 - t2.tv_usec - t2.tv_sec*1000000;
        spendTime3 = t4.tv_usec + t4.tv_sec*1000000 - t3.tv_usec - t3.tv_sec*1000000;

        printf("num:%lu t1:%lld t2:%lld t3:%lld\n", iFrame->getNumber(), spendTime1, spendTime2, spendTime3);
    }

    CONSUMER_PRINT("Done.\n");

    requestShutdown();

    return true;
}
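
The three interval expressions in the loop follow the same pattern and could be factored into a small helper. This is a minimal sketch; `elapsedUs` is a hypothetical name and is not part of the sample.

```cpp
#include <sys/time.h>
#include <cstdint>

// Microseconds elapsed from `start` to `end`, both filled in by gettimeofday().
int64_t elapsedUs(const timeval &start, const timeval &end)
{
    return (static_cast<int64_t>(end.tv_sec) - start.tv_sec) * 1000000LL
         + (end.tv_usec - start.tv_usec);
}
```

With this helper the stage times would read `elapsedUs(t1, t2)`, `elapsedUs(t2, t3)` and `elapsedUs(t3, t4)`. Since gettimeofday() reads the wall clock, which can be adjusted while the program runs, a monotonic source such as `clock_gettime(CLOCK_MONOTONIC, ...)` or `std::chrono::steady_clock` may be a safer choice for interval measurement.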

[INFO] (NvEglRenderer.cpp:110) <renderer0> Setting Screen width 640 height 480
PRODUCER: Creating output stream
PRODUCER: Launching consumer thread
CONSUMER: Waiting until producer is connected...
CONSUMER: Waiting until producer is connected...
PRODUCER: Available Sensor modes :
PRODUCER: [0] W=3264 H=2464
PRODUCER: [1] W=3264 H=1848
PRODUCER: [2] W=1920 H=1080
PRODUCER: [3] W=1280 H=720
PRODUCER: [4] W=1280 H=720
PRODUCER: Requested FPS out of range. Fall back to 30
PRODUCER: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
CONSUMER: Producer has connected; continuing.
num:1 t1:1202117 t2:5479 t3:17097
num:1 t1:1197969 t2:6806 t3:20621
num:2 t1:466622 t2:1468 t3:3108
num:2 t1:471189 t2:1243 t3:10100
num:3 t1:505322 t2:4040 t3:5790
num:3 t1:505014 t2:3230 t3:8196
num:4 t1:470405 t2:1241 t3:2960
CONSUMER: Done.
num:4 t1:470096 t2:1653 t3:12000
CONSUMER: Done.
PRODUCER: Done -- exiting.
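
To compare runs like this one, the `num:... t1:... t2:... t3:...` lines can be averaged with a few lines of code. The sketch below assumes the exact line format printed by the modified sample above; `StageTimes`, `parseTimingLine` and `averageAcquireUs` are hypothetical names, and skipping frame 1 (which includes startup time) is an arbitrary choice.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// One parsed log line: frame number and the three stage times in microseconds.
struct StageTimes
{
    unsigned long num;
    long long t1, t2, t3;
};

// Parses a "num:N t1:A t2:B t3:C" line; returns false for any other line.
bool parseTimingLine(const std::string &line, StageTimes &out)
{
    return std::sscanf(line.c_str(), "num:%lu t1:%lld t2:%lld t3:%lld",
                       &out.num, &out.t1, &out.t2, &out.t3) == 4;
}

// Average acquireFrame() wait (t1) in microseconds over all timing lines,
// ignoring frame 1. Returns 0 if no usable line is found.
double averageAcquireUs(const std::vector<std::string> &lines)
{
    long long sum = 0;
    int count = 0;
    for (const std::string &line : lines)
    {
        StageTimes s{};
        if (parseTimingLine(line, s) && s.num > 1)
        {
            sum += s.t1;
            ++count;
        }
    }
    return count ? static_cast<double>(sum) / count : 0.0;
}
```

For the log above, the t1 values of frames 2 to 4 average roughly 480 ms, which corresponds to about 2 frames per second per consumer, far below what the sensor modes listed would suggest.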

y2zwei · November 18, 2020, 1:44am

Hello, do you have any suggestions for me? I am waiting for your reply.

DaneLLL · November 18, 2020, 3:15am

Hi,
We tried with the Pi camera V2 and the result looks fine:

nvidia@nvidia-desktop:/usr/src/jetson_multimedia_api/samples/09_camera_jpeg_capture$ ./camera_jpeg_capture
[INFO] (NvEglRenderer.cpp:110) <renderer0> Setting Screen width 640 height 480
PRODUCER: Creating output stream
PRODUCER: Launching consumer thread
CONSUMER: Waiting until producer is connected...
CONSUMER: Waiting until producer is connected...
PRODUCER: Available Sensor modes :
PRODUCER: [0] W=3264 H=2464
PRODUCER: [1] W=3264 H=1848
PRODUCER: [2] W=1920 H=1080
PRODUCER: [3] W=1280 H=720
PRODUCER: [4] W=1280 H=720
PRODUCER: Requested FPS out of range. Fall back to 30
PRODUCER: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
CONSUMER: Producer has connected; continuing.
num:1 t1:284263 t2:4950 t3:9474
num:1 t1:284587 t2:2526 t3:14821
num:2 t1:28821 t2:1029 t3:3156
num:2 t1:29480 t2:1037 t3:2963
num:3 t1:43245 t2:1050 t3:2986
num:3 t1:43113 t2:1074 t3:5807
num:4 t1:38923 t2:1178 t3:2936
num:4 t1:38823 t2:1285 t3:9885
num:5 t1:36483 t2:1246 t3:2929
num:5 t1:36377 t2:763 t3:12823
num:6 t1:34391 t2:998 t3:2949
num:6 t1:33853 t2:819 t3:15322
num:7 t1:31578 t2:1247 t3:2951
num:7 t1:31158 t2:971 t3:17763
num:8 t1:26271 t2:1997 t3:2937
num:8 t1:25868 t2:1883 t3:5544
num:9 t1:40454 t2:2045 t3:2976
num:9 t1:40128 t2:1884 t3:8010
num:10 t1:38087 t2:2017 t3:3001
num:10 t1:37693 t2:2295 t3:9931
num:11 t1:35880 t2:2034 t3:2958
num:11 t1:35575 t2:1771 t3:12654
num:12 t1:33185 t2:2142 t3:2960
num:12 t1:32873 t2:1942 t3:15154
num:13 t1:31200 t2:1868 t3:2986
num:13 t1:30759 t2:1713 t3:17538
num:14 t1:28569 t2:1885 t3:2747
num:14 t1:28519 t2:2029 t3:3099
num:15 t1:42706 t2:1695 t3:2952
num:15 t1:42424 t2:1829 t3:5753
num:16 t1:41186 t2:1245 t3:2959
num:16 t1:40115 t2:1702 t3:8254
num:17 t1:37750 t2:2131 t3:2980
num:17 t1:37727 t2:2141 t3:10002
num:18 t1:35480 t2:2111 t3:2963
num:18 t1:35352 t2:2292 t3:12337
num:19 t1:32982 t2:2325 t3:2956
num:19 t1:32868 t2:1772 t3:15311
num:20 t1:30661 t2:2334 t3:2962
num:20 t1:30618 t2:1755 t3:17631
num:21 t1:27969 t2:2377 t3:2890
num:21 t1:28151 t2:2251 t3:2980
CONSUMER: Done.
CONSUMER: Done.
PRODUCER: Done -- exiting.

Which release version do you use? r32.3.1 or r32.4.4?

y2zwei · November 18, 2020, 3:26am

Tegra_Multimedia_API_R32.2.3_aarch64.tbz2
