Slow encodeFromBuffer on AGX

Hello, I'm having trouble using the JPEG encoder on the Xavier AGX. This code works fine on the NX, but on the AGX each frame takes over 10 seconds to encode even though the code is identical.

Total units processed = 9
Average latency(usec) = 10222106
Minimum latency(usec) = 10210707
Maximum latency(usec) = 10239063

int read_video_frame(const char* inpBuf, unsigned inpBufLen, NvBuffer& buffer){
    uint32_t i, j;
    char *data;

    for (i = 0; i < buffer.n_planes; i++) {
        NvBuffer::NvBufferPlane &plane = buffer.planes[i];
        std::streamsize bytes_to_read = plane.fmt.bytesperpixel * plane.fmt.width;
        data = (char *);
        plane.bytesused = 0;
        for (j = 0; j < plane.fmt.height; j++) {
            unsigned numRead = std::min((unsigned)bytes_to_read, (unsigned)inpBufLen);

            memcpy(data, inpBuf, numRead);

            if (numRead < bytes_to_read) {
                return -1;
            }

            inpBuf    += numRead;
            inpBufLen -= numRead;

            data += plane.fmt.stride;
        }
        plane.bytesused = plane.fmt.stride * plane.fmt.height;
    }
    return 0;
}

std::vector<uchar> Video::compressJpeg(const cv::Mat &image){
    cv::Mat yuv;
    cv::cvtColor(image, yuv, cv::COLOR_BGR2YUV_I420);
    unsigned long out_buf_size = image.rows * image.cols * 3 / 2;
    std::vector<uchar> out_buf(out_buf_size);

    NvBuffer buffer(V4L2_PIX_FMT_YUV420M, image.cols, image.rows, 0);

    auto ret = read_video_frame((const char*),*yuv.elemSize(), buffer);
    if(ret < 0) {
        LOG(ERROR) << "read_video_frame error";
    }

    //set in buffer
    uchar *obuf =;
    LOG(INFO) << "Start encode";
    ret = jpegenc_->encodeFromBuffer(buffer, JCS_YCbCr, &obuf , out_buf_size, jpeg_compression_);

    if(ret < 0) {
        LOG(ERROR) << "encodeFromBuffer error";
    }
    LOG(INFO) << "End encode";

    return out_buf;
}

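For anyone reading along, here is a minimal, self-contained sketch of the row-by-row copy that read_video_frame performs: a tightly packed I420 frame (width * height * 3 / 2 bytes) is copied into three planes whose rows are `stride` bytes apart. The `Plane` struct, `make_i420_planes`, `copy_packed_i420` and the `align` value are hypothetical stand-ins for illustration only, not part of the Jetson Multimedia API. Unlike the code above, this version checks the remaining source length before each memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for one plane of a hardware buffer: rows start
// `stride` bytes apart, but only `width` bytes per row carry pixels.
struct Plane {
    uint32_t width;             // bytes of pixel data per row (1 byte per pixel)
    uint32_t height;            // number of rows
    uint32_t stride;            // bytes between row starts, >= width
    std::vector<uint8_t> data;  // stride * height bytes
};

// Build Y, U and V planes for an I420 frame. `align` is an assumed row
// alignment chosen for illustration, not a value taken from any real driver.
std::vector<Plane> make_i420_planes(uint32_t width, uint32_t height, uint32_t align) {
    assert(width % 2 == 0 && height % 2 == 0 && align > 0);
    auto round_up = [align](uint32_t v) { return (v + align - 1) / align * align; };
    const uint32_t w[3] = {width, width / 2, width / 2};
    const uint32_t h[3] = {height, height / 2, height / 2};
    std::vector<Plane> planes;
    for (int i = 0; i < 3; ++i) {
        Plane p{w[i], h[i], round_up(w[i]), {}};
        p.data.assign(static_cast<std::size_t>(p.stride) * p.height, 0);
        planes.push_back(std::move(p));
    }
    return planes;
}

// Copy a packed I420 frame into the strided planes, one row at a time.
// Returns 0 on success, -1 if the source is shorter than the planes require.
int copy_packed_i420(const uint8_t* src, std::size_t src_len, std::vector<Plane>& planes) {
    for (Plane& p : planes) {
        uint8_t* dst = p.data.data();
        for (uint32_t row = 0; row < p.height; ++row) {
            if (src_len < p.width) {
                return -1;  // refuse to read past the end of the source
            }
            std::memcpy(dst, src, p.width);
            src += p.width;      // source rows are tightly packed
            src_len -= p.width;
            dst += p.stride;     // destination rows are stride-aligned
        }
    }
    return 0;
}
```

The point of the sketch is only that source rows advance by `width` while destination rows advance by `stride`; it says nothing about why the encode call itself is slow.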
There is a question here with a similar issue: NvJPEGDecoder ultra slow (Xavier & Nano)
The response there doesn't help me, as I don't know how to use decodeToFd() with my data. I tried to set it up like this, and I get an error on the memcpy to the image plane after the call to fails:

int fd;

NvBuffer::NvBufferPlaneFormat buf_format[image.channels()];

for (int i = 0; i < image.channels(); i++) {
    buf_format[i].height = image.rows;
    buf_format[i].width = image.cols;
    buf_format[i].bytesperpixel = 1;
    buf_format[i].stride = image.cols;
    buf_format[i].sizeimage = image.rows * image.cols;
}
NvBuffer buffer(V4L2_BUF_TYPE_VIDEO_CAPTURE, v4l2_memory::V4L2_MEMORY_MMAP, image.channels(), buf_format, fd);
auto ret =;
ret = jpegenc_->encodeFromFd(fd, JCS_YCbCr, &obuf , out_buf_size, jpeg_compression_);

Any help appreciated.
Thanks in advance.

Please share the resolution used in the test. Taking 10+ seconds to encode a JPEG file is extremely slow. Can the issue be reproduced by running 05_jpeg_encode?

Please also share the release version ( $ head -1 /etc/nv_tegra_release ).

1920 x 1080

$ head -1 /etc/nv_tegra_release
R32 (release), REVISION: 2.3, GCID: 17644089, BOARD: t186ref, EABI: aarch64, DATE: Tue Nov 5 21:48:17 UTC 2019
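As a side note, the release names used in this thread (R32.2.3, r32.4.3) combine the `R32` major number with the `REVISION` field of that line. Below is a small sketch of that mapping; `l4t_version` is a hypothetical helper, and it assumes the line has exactly the shape shown above, optionally preceded by other characters such as a leading `# `.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: turns "R32 (release), REVISION: 2.3, ..." into "32.2.3".
// Returns an empty string if the line does not have that shape.
std::string l4t_version(const std::string& line) {
    const std::size_t start = line.find('R');  // skip anything before "R32"
    if (start == std::string::npos) {
        return "";
    }
    int major = 0, minor = 0, patch = 0;
    const int matched = std::sscanf(line.c_str() + start,
                                    "R%d (release), REVISION: %d.%d",
                                    &major, &minor, &patch);
    if (matched != 3) {
        return "";
    }
    return std::to_string(major) + "." + std::to_string(minor) + "." +
           std::to_string(patch);
}
```

Applied to the line above, this gives the "32.2.3" that the rest of the thread refers to as R32.2.3.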

I’ll look into running the example asap.

Tested a simple gstreamer pipeline and it crawls along. Very slow.

Seems like something is broken beyond my code.

    pipeline = "appsrc ! autovideoconvert ! omxh264enc ! filesink location=" + video_file.path;

The gstreamer pipeline uses omxh264enc to do H.264 encoding, while encodeFromBuffer() does JPEG encoding. The two cases are different. Please check whether you can reproduce the issue with 05_jpeg_encode. If yes, please share the command so that we can give it a try.

Also, the latest release is r32.4.3; you may want to upgrade and try that release.

Will do. My point was both are very slow.

The same code works at full speed on R32.4.3. It seems there is some general bug in the hardware encoder drivers or software on R32.2.3; I tested it on two AGX devices.

Code above can be used to reproduce it.

The result we see on Xavier is ~5 ms on average:

05_jpeg_encode$ gst-launch-1.0 videotestsrc num-buffers=1 ! video/x-raw,width=1920,height=1080 ! filesink location=/home/nvidia/a.yuv
05_jpeg_encode$ ./jpeg_encode /home/nvidia/a.yuv 1920 1080 /home/nvidia/a.jpg --encode-buffer --perf
----------- Element = jpenenc -----------
Total units processed = 300
Average latency(usec) = 5754
Minimum latency(usec) = 4999
Maximum latency(usec) = 10734
App run was successful

Could you give it a try and see if you get a similar result? We would like to align on this result.

My jpeg_encode didn't have a --perf argument, so I used time to get this result (this machine is still on JetPack 4.2.3 with the equivalent Multimedia API):

time ./jpeg_encode a.yuv 1920 1080 a.jpg --encode-buffer
App run was successful
real 0m0.099s
user 0m0.020s
sys 0m0.040s
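Since `time` measures the whole process (startup, file I/O and the encode together), the number is not directly comparable to the per-call latencies quoted earlier. A minimal sketch of measuring each call in-process with std::chrono is below. `LatencyStats` and `time_call` are hypothetical names, and the printed labels only imitate the format seen in this thread; this is not how the sample's --perf option is implemented.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <limits>

// Hypothetical accumulator for per-call latencies in microseconds.
struct LatencyStats {
    uint64_t count = 0;
    uint64_t total_usec = 0;
    uint64_t min_usec = std::numeric_limits<uint64_t>::max();
    uint64_t max_usec = 0;

    void add(uint64_t usec) {
        ++count;
        total_usec += usec;
        min_usec = std::min(min_usec, usec);
        max_usec = std::max(max_usec, usec);
    }

    uint64_t average_usec() const { return count ? total_usec / count : 0; }

    void print() const {
        std::printf("Total units processed = %llu\n",
                    static_cast<unsigned long long>(count));
        std::printf("Average latency(usec) = %llu\n",
                    static_cast<unsigned long long>(average_usec()));
        std::printf("Minimum latency(usec) = %llu\n",
                    static_cast<unsigned long long>(count ? min_usec : 0));
        std::printf("Maximum latency(usec) = %llu\n",
                    static_cast<unsigned long long>(max_usec));
    }
};

// Run fn once and record how long it took on a monotonic clock.
template <typename Fn>
void time_call(LatencyStats& stats, Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto end = std::chrono::steady_clock::now();
    const auto usec =
        std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    stats.add(static_cast<uint64_t>(usec));
}
```

Wrapping only the encode call in `time_call` (for example with a lambda around the call under test) would separate the encoder's latency from the buffer copy and from process startup.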

Even though this is much faster than my original code, I would still like to know why the original code is so slow: it performs fine on the NX and on the later JetPack, but not on JetPack 4.2.3 on either of the machines I have tried it on.

See the first code block with two functions in my first post for details.