JPEG decoding/encoding performance regression in JP5.1.1

As commented in this topic, in JP 5.x, we should destroy and re-create the JPEG encoder at application level when resolution changes, otherwise the output encoded image might be incorrect.

This rule also applies to JPEG decoder.

However, we found that with decoder/encoder re-creation, processing is about 17x slower than on JP4.5 on Jetson Xavier NX.

We decoded and then encoded 160+ JPEGs of varying resolution. Here is the detailed performance data:

# JP 4.5 (w/o re-creation, correct)
real    0m3.699s
user    0m0.304s
sys     0m1.032s

# JP 5.1.1 (w/o re-creation, incorrect, 1.7x slower)
real    0m6.253s
user    0m0.316s
sys     0m3.156s

# JP 5.1.1 (w/ re-creation, correct, 17.5x slower)
real    1m4.912s
user    0m0.707s
sys     1m0.282s

Any idea what causes this performance regression?

Hi,
Please share a test sample so that we can replicate the issue on a developer kit and check further.

@DaneLLL
test sample:

#include "NvJpegDecoder.h"
#include "NvJpegEncoder.h"

#include <string>
#include <vector>
#include <iostream>
#include <fstream>
#include <memory>
#include <cassert>
#include <cstdint>  // uint32_t
#include <cstring>  // memset

// #define JETPACK_5
// #define RECREATE

#ifdef JETPACK_5
#include <nvbufsurface.h>
#include <nvbufsurftransform.h>
#else
#include <nvbuf_utils.h>
#endif

std::string read(const std::string &path) {
    // Open in binary mode so the JPEG bytes are read unmodified.
    std::ifstream in(path, std::ios::binary);
    assert(in.is_open());
    in.seekg(0, std::ios::end);
    const std::streamsize size = in.tellg();
    std::string data;
    data.resize(size);
    in.seekg(0, std::ios::beg);
    in.read(&data[0], size);
    return data;
}

void write(const std::string &path, const char *data, int len) {
    std::ofstream out(path, std::ios::binary);
    out.write(data, len);
    out.flush();
}

int decode(NvJPEGDecoder *dec, const std::string &data) {
    int decoded_fd = -1;
    uint32_t pixfmt, width, height;
    int ret = dec->decodeToFd(decoded_fd, (unsigned char *) data.data(), data.size(), pixfmt, width, height);
    assert(ret == 0);

#ifdef JETPACK_5
    NvBufSurface *decoded_surf = nullptr;
    ret = NvBufSurfaceFromFd(decoded_fd, (void **) &decoded_surf);
    assert(ret == 0 && decoded_surf != nullptr);
    // allocate
    NvBufSurfaceAllocateParams create_params;
    memset(&create_params, 0, sizeof(create_params));
    create_params.params.width = width;
    create_params.params.height = height;
    create_params.params.memType = NVBUF_MEM_SURFACE_ARRAY;
    create_params.params.layout = NVBUF_LAYOUT_PITCH;
    create_params.params.colorFormat = NVBUF_COLOR_FORMAT_YUV420;
    create_params.memtag = NvBufSurfaceTag_VIDEO_CONVERT;
    NvBufSurface *cropped_surf;
    ret = NvBufSurfaceAllocate(&cropped_surf, 1, &create_params);
    assert(ret == 0);
    // crop
    NvBufSurfTransformParams transform_params;
    memset(&transform_params, 0, sizeof(transform_params));
    NvBufSurfTransformRect src_rect;
    src_rect.left = 0;
    src_rect.top = 0;
    src_rect.width = width;
    src_rect.height = height;
    transform_params.transform_flag = NVBUFSURF_TRANSFORM_CROP_SRC;
    transform_params.src_rect = &src_rect;
    ret = NvBufSurfTransform(decoded_surf, cropped_surf, &transform_params);
    assert(ret == 0);
    return cropped_surf->surfaceList[0].bufferDesc;
#else
    // allocate
    // Zero-initialize so that fields not set below (e.g. memsize) are not left indeterminate.
    NvBufferCreateParams create_params = {0};
    create_params.width = width;
    create_params.height = height;
    create_params.payloadType = NvBufferPayload_SurfArray;
    create_params.layout = NvBufferLayout_Pitch;
    create_params.colorFormat = NvBufferColorFormat_YUV420;
    create_params.nvbuf_tag = NvBufferTag_VIDEO_CONVERT;
    int cropped_fd = -1;
    ret = NvBufferCreateEx(&cropped_fd, &create_params);
    assert(ret == 0);
    // crop
    NvBufferTransformParams transform_params;
    memset(&transform_params, 0, sizeof(transform_params));
    transform_params.transform_flag = NVBUFFER_TRANSFORM_CROP_SRC;
    transform_params.src_rect.left = 0;
    transform_params.src_rect.top = 0;
    transform_params.src_rect.width = width;
    transform_params.src_rect.height = height;
    ret = NvBufferTransform(decoded_fd, cropped_fd, &transform_params);
    assert(ret == 0);
    return cropped_fd;
#endif
}

void release(int fd) {
#ifdef JETPACK_5
    NvBufSurface *surf = nullptr;
    int ret = NvBufSurfaceFromFd(fd, (void **) &surf);
    assert(ret == 0 && surf != nullptr);
    ret = NvBufSurfaceDestroy(surf);
    assert(ret == 0);
#else
    int ret = NvBufferDestroy(fd);
    assert(ret == 0);
#endif
}

int main(int argc, char *argv[]) {
    std::vector<std::string> files = {
        read("test1.jpg"),
        read("test2.jpg"),
        read("test3.jpg"),
        read("test4.jpg")
    };
    int buf_size = 1920 * 1080 * 3;
    std::unique_ptr<char[]> buf = std::unique_ptr<char[]>(new char[buf_size]);
#ifndef RECREATE
    auto dec = std::shared_ptr<NvJPEGDecoder>(NvJPEGDecoder::createJPEGDecoder("dec"));
    auto enc = std::shared_ptr<NvJPEGEncoder>(NvJPEGEncoder::createJPEGEncoder("enc"));
#endif
    for (int i = 0; i < 200; ++i) {
#ifdef RECREATE
        auto dec = std::shared_ptr<NvJPEGDecoder>(NvJPEGDecoder::createJPEGDecoder("dec"));
        auto enc = std::shared_ptr<NvJPEGEncoder>(NvJPEGEncoder::createJPEGEncoder("enc"));
#endif
        std::cout << "Round " << i << std::endl;
        auto &data = files[i % files.size()];
        int fd = decode(dec.get(), data);
        unsigned char *out_buf = (unsigned char *) buf.get();
        unsigned long out_buf_size = buf_size;
        int ret = enc->encodeFromFd(fd, JCS_YCbCr, &out_buf, out_buf_size);
        assert(ret == 0);
        release(fd);
        write("output" + std::to_string(i) + ".jpg", (char *) out_buf, out_buf_size);
    }
    return 0;
}

Note: uncomment the following line(s) to switch to JP5.x and enable recreation:

// #define JETPACK_5
// #define RECREATE

to compile on JP4.x:

g++ -std=c++11 -I/usr/src/jetson_multimedia_api/include \
    -I/usr/src/jetson_multimedia_api/include/libjpeg-8b \
    main.cc \
    /usr/src/jetson_multimedia_api/samples/common/classes/NvBuffer.cpp \
    /usr/src/jetson_multimedia_api/samples/common/classes/NvElement.cpp \
    /usr/src/jetson_multimedia_api/samples/common/classes/NvElementProfiler.cpp \
    /usr/src/jetson_multimedia_api/samples/common/classes/NvLogging.cpp \
    /usr/src/jetson_multimedia_api/samples/common/classes/NvJpegEncoder.cpp \
    /usr/src/jetson_multimedia_api/samples/common/classes/NvJpegDecoder.cpp \
    -L/usr/lib/aarch64-linux-gnu/tegra \
    -lnvjpeg -lnvbuf_utils -o main

to compile on JP5.x:

g++ -std=c++11 -I/usr/src/jetson_multimedia_api/include \
    -I/usr/src/jetson_multimedia_api/include/libjpeg-8b \
    main.cc \
    /usr/src/jetson_multimedia_api/samples/common/classes/NvBuffer.cpp \
    /usr/src/jetson_multimedia_api/samples/common/classes/NvElement.cpp \
    /usr/src/jetson_multimedia_api/samples/common/classes/NvElementProfiler.cpp \
    /usr/src/jetson_multimedia_api/samples/common/classes/NvLogging.cpp \
    /usr/src/jetson_multimedia_api/samples/common/classes/NvJpegEncoder.cpp \
    /usr/src/jetson_multimedia_api/samples/common/classes/NvJpegDecoder.cpp \
    -L/usr/lib/aarch64-linux-gnu/tegra \
    -lnvjpeg -lnvbufsurface -lnvbufsurftransform -o main

then generate test images:

gst-launch-1.0 videotestsrc num-buffers=1 ! video/x-raw,width=480,height=240 ! jpegenc ! filesink location=test1.jpg
gst-launch-1.0 videotestsrc num-buffers=1 ! video/x-raw,width=640,height=480 ! jpegenc ! filesink location=test2.jpg
gst-launch-1.0 videotestsrc num-buffers=1 ! video/x-raw,width=1280,height=720 ! jpegenc ! filesink location=test3.jpg
gst-launch-1.0 videotestsrc num-buffers=1 ! video/x-raw,width=1920,height=1080 ! jpegenc ! filesink location=test4.jpg

run the test:

time ./main

our test output:

# JP 4.5 (w/o re-creation, correct)
real    0m1.475s
user    0m0.128s
sys     0m0.352s

# JP 5.1.1 (w/o re-creation, incorrect, 2.3x slower)
real    0m3.456s
user    0m0.165s
sys     0m2.139s

# JP 5.1.1 (w/ re-creation, correct, 51x slower)
real    1m16.267s
user    0m0.585s
sys     1m13.150s

Hi,

Has this problem been confirmed, or is it improper usage on my side?

Hi,
We will set up and check this. On the current release, you may create multiple NvJPEGEncoder and NvJPEGDecoder instances, one for each resolution.
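
The suggestion above, keeping one instance per resolution instead of destroying and re-creating on every change, could be sketched as a small cache keyed by (width, height). This is only a sketch under assumptions: `PerResolutionCache` is a hypothetical helper, not part of the Jetson Multimedia API, and it is written generically so that it compiles without the NVIDIA headers. For the decoder, the resolution would have to be known before decoding (for example by parsing the JPEG header), which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Hypothetical helper: caches one codec instance per (width, height) so that
// an instance is created only the first time a resolution is seen.
template <typename Codec>
class PerResolutionCache {
public:
    using Factory = std::function<Codec *()>;

    explicit PerResolutionCache(Factory factory) : factory_(std::move(factory)) {
        assert(factory_ && "factory must be callable");
    }

    // Returns the instance dedicated to this resolution, creating it on first use.
    Codec *get(uint32_t width, uint32_t height) {
        const auto key = std::make_pair(width, height);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            it = cache_.emplace(key, std::unique_ptr<Codec>(factory_())).first;
        }
        return it->second.get();
    }

    // Number of distinct resolutions seen so far.
    std::size_t size() const { return cache_.size(); }

private:
    Factory factory_;
    std::map<std::pair<uint32_t, uint32_t>, std::unique_ptr<Codec>> cache_;
};
```

In the test sample, this would presumably be instantiated as `PerResolutionCache<NvJPEGEncoder>` with a factory that returns `NvJPEGEncoder::createJPEGEncoder("enc")`, and `get(width, height)` would be called with the dimensions reported by `decodeToFd()`. With four input resolutions, only four encoders would be created over the 200 rounds.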

Hi,
It looks like the created classes are not deleted in the RECREATE case. Could you add the following lines and give it a try?

delete enc;
delete dec;

Hi,
The smart pointers will take care of object deletion.

Hi,
Thanks for the information. Sorry we don’t have much experience in using auto.

Hi,
We have tried the test code and have a finding. On Jetpack 5.1.1, encodeFromFd() is slow and performance is impacted. The issue is not seen on an internal build of Jetpack 5.1.2, so we expect it to be resolved in the next release.

Please wait for the upcoming Jetpack 5.1.2 release and give it a try then.
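
To check on a given release which call dominates, the `time` totals are not enough; each call can be timed separately. A minimal sketch, assuming a hypothetical `PhaseTimer` helper (not part of the Jetson Multimedia API) that accumulates wall-clock time per named phase:

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: accumulates wall-clock time and call counts per phase.
class PhaseTimer {
public:
    using Clock = std::chrono::steady_clock;

    // RAII guard: measures from construction to destruction and adds the
    // elapsed time to the named phase of the owning PhaseTimer.
    class Scope {
    public:
        Scope(PhaseTimer &owner, std::string phase)
            : owner_(owner), phase_(std::move(phase)), start_(Clock::now()) {
            assert(!phase_.empty());
        }
        ~Scope() {
            Entry &e = owner_.entries_[phase_];
            e.total += Clock::now() - start_;
            ++e.calls;
        }
        Scope(const Scope &) = delete;
        Scope &operator=(const Scope &) = delete;

    private:
        PhaseTimer &owner_;
        std::string phase_;
        Clock::time_point start_;
    };

    // Number of completed scopes for this phase (0 if never measured).
    long calls(const std::string &phase) const {
        auto it = entries_.find(phase);
        return it == entries_.end() ? 0 : it->second.calls;
    }

    // Accumulated seconds for this phase (0.0 if never measured).
    double seconds(const std::string &phase) const {
        auto it = entries_.find(phase);
        if (it == entries_.end()) return 0.0;
        return std::chrono::duration<double>(it->second.total).count();
    }

    // Prints one line per phase: name, call count, total seconds.
    void report() const {
        for (const auto &kv : entries_) {
            std::printf("%-10s %6ld calls %12.6f s\n", kv.first.c_str(),
                        kv.second.calls, seconds(kv.first));
        }
    }

private:
    struct Entry {
        Entry() : total(Clock::duration::zero()), calls(0) {}
        Clock::duration total;
        long calls;
    };
    std::map<std::string, Entry> entries_;
};
```

In the test loop, each call would be wrapped in its own block, for example `{ PhaseTimer::Scope s(timer, "encode"); ret = enc->encodeFromFd(...); }`, with similar blocks for creation and decoding, followed by `timer.report()` after the loop.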

Hi,
Thanks for the information. When would 5.1.2 be released?

Hi,
Ideally it will be released in two weeks.

Hi, almost 2 weeks have passed. Will 5.1.2 be released as expected?

Hi,
We are actively working on the delivery. Sorry for the delay.

So, when will 5.1.2 be released?

Hi @DaneLLL

Another 2 weeks have passed :)
Is there an official release time for JP5.1.2 now?

Hi,
Jetpack 5.1.2 is released. Please upgrade and give it a try.

Hi,

After upgrading to 5.1.2 (according to this documentation), we encountered this error when launching a process. Any idea what causes it?

NvRmMemInitNvmap failed with No such file or directory
549: Memory Manager Not supported



****NvRmMemInit failed**** error type: 196626


*** NvRmMemInit failed NvRmMemConstructor
Segmentation fault (core dumped)

It seems the performance issue has been resolved in JP5.1.2, thanks.