Hardware Accelerated JPEG encode/decode on Jetson Xavier JP 5.1.3

Hi,
This is likely an issue in the code. Please check
Hardware Accelerated JPEG encode/decode on Jetson Xavier JP 5.1.3 - #36 by DaneLLL

You may dump out YUV to check further.

The attached sample application was updated to more closely resemble the 06_jpeg_decode sample from the SDK. Using the same input JPEG for both the SDK sample and the attached source code, the YUV image buffer is written to file using dump_dmabuf.

jpeg-hwaccel.zip (22.3 KB)

The SDK sample results in a valid YUV image with dimensions that match those of the input JPEG, 640x480.

The attached source code results in a valid YUV image, but its dimensions differ from those of the input JPEG. The output image has dimensions 512x240 and appears as follows:

Hi,
The pitch of the NvBufSurface is not taken into account, so the data is not read correctly. Please check the pitch, width, and height of the NvBufSurface and place each line accordingly. You may need to copy the data line by line.

NvBufSurface is the layout of a hardware DMA buffer, so the pitch is not equal to the width for certain resolutions, and the data has to be copied line by line.
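The line-by-line copy can be sketched in plain C++. No NVIDIA APIs are used here; the pitched source buffer stands in for a mapped NvBufSurface plane, and the function name is illustrative:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Copy a pitch-linear plane into a tightly packed buffer, one row at a time.
// 'pitch' is the hardware row stride in bytes and may exceed
// width * bytesPerPix, so a single memcpy of the whole plane would
// interleave padding bytes into the output.
std::vector<uint8_t> copy_plane(const uint8_t *src, unsigned width,
                                unsigned height, unsigned pitch,
                                unsigned bytesPerPix)
{
    std::vector<uint8_t> dst(static_cast<size_t>(width) * height * bytesPerPix);
    for (unsigned row = 0; row < height; ++row)
        std::memcpy(dst.data() + static_cast<size_t>(row) * width * bytesPerPix,
                    src + static_cast<size_t>(row) * pitch,
                    static_cast<size_t>(width) * bytesPerPix);
    return dst;
}
```

With an NvBufSurface, `src`, `pitch`, `width`, and `height` would come from `mappedAddr.addr[plane]` and `planeParams` after mapping, as in the SDK's dump_dmabuf.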

The following code from the attached sample project mimics the 06_jpeg_decode sample from the SDK.

  NvJPEGDecoder *jpegdec = NvJPEGDecoder::createJPEGDecoder("jpegdec");
  if (!jpegdec)
  {
    std::cerr << "NvJPEGDecoder::createJPEGDecoder() FAILED" << std::endl;
    return cv::cuda::GpuMat();
  }

  jpegdec->disableMjpegDecode();

  uint32_t width, height, pixfmt;
  ret = jpegdec->decodeToFd(fd, buffer, size, pixfmt, width, height);
  if (ret == -1 || fd == 0)
  {
    std::cerr << "jpegdec->decodeToFd() FAILED" << std::endl;
    return cv::cuda::GpuMat();
  }

  NvBufSurf::NvCommonAllocateParams params;
  /* Create PitchLinear output buffer for transform. */
  params.memType = NVBUF_MEM_SURFACE_ARRAY;
  params.width = width;
  params.height = height;
  params.layout = NVBUF_LAYOUT_PITCH;
  if (out_pixfmt == 1)
    params.colorFormat = NVBUF_COLOR_FORMAT_NV12;
  else if (out_pixfmt == 2)
    params.colorFormat = NVBUF_COLOR_FORMAT_YUV420;
  else if (out_pixfmt == 3)
    params.colorFormat = NVBUF_COLOR_FORMAT_NV16;
  else if (out_pixfmt == 4)
    params.colorFormat = NVBUF_COLOR_FORMAT_NV24;

  params.memtag = NvBufSurfaceTag_VIDEO_CONVERT;

  ret = NvBufSurf::NvAllocate(&params, 1, &dst_dma_fd);
  if (ret == -1)
  {
    std::cerr << "NvBufSurf::NvAllocate() FAILED, create dmabuf failed" << std::endl;
    return cv::cuda::GpuMat();
  }

  /* Clip & Stitch can be done by adjusting rectangle. */
  NvBufSurf::NvCommonTransformParams transform_params;
  transform_params.src_top = 0;
  transform_params.src_left = 0;
  transform_params.src_width = width;
  transform_params.src_height = height;
  transform_params.dst_top = 0;
  transform_params.dst_left = 0;
  transform_params.dst_width = width;
  transform_params.dst_height = height;
  transform_params.flag = NVBUFSURF_TRANSFORM_FILTER;
  transform_params.flip = NvBufSurfTransform_None;
  transform_params.filter = NvBufSurfTransformInter_Nearest;

  ret = NvBufSurf::NvTransform(&transform_params, fd, dst_dma_fd);
  if (ret == -1 || dst_dma_fd == -1)
  {
    std::cerr << "NvBufSurf::NvTransform() FAILED" << std::endl;
    return cv::cuda::GpuMat();
  }

#if 1
  // Write raw video frame to file.
  // output is 512x240 corrupt image, but valid YUV
  std::ofstream *out_file = new std::ofstream("/test/nvidia/dumped-output.yuv");
  if (out_file)
  {
    /* Dumping two planes for NV12, NV16, NV24 and three for I420 */
    dump_dmabuf(dst_dma_fd, 0, out_file);
    dump_dmabuf(dst_dma_fd, 1, out_file);
    if (out_pixfmt == 2)
    {
      dump_dmabuf(dst_dma_fd, 2, out_file);
    }
  }
#endif

The code from the SDK is identical, and its calls to dump_dmabuf generate a valid YUV file at the correct resolution. The code from the attached sample project generates a YUV image at an incorrect resolution of 512x240, as shown in the previous post. Both applications were tested with the same input JPEG.

How is the pitch not taken into account and what needs to be done to resolve the issue?

jpeg_hwaccel.zip (22.3 KB)

Hi,
Are you able to allocate dst_dma_fd in RGBA so that it can be mapped to a GpuMat directly? When mapping NVBUF_COLOR_FORMAT_YUV420 to cv::Mat yPlane, uPlane, and vPlane, the output looks corrupted if the pitch is not handled correctly. It would be more straightforward to map a single-plane RGBA buffer.

The changes result in a valid dumped RGBA image:

But the encoded JPEG now appears corrupt:

image_00000_out

Please see the updated code in the attached sample project.

jpeg_hwaccel.zip (21.5 KB)

Hi,
Please try this and see if it works:

  // Get the pointer to the data
  data_ptr = surface->surfaceList[0].dataPtr;

  // Create cv::Mat from the mapped data pointer
  // Assuming RGBA format
  cv::Mat imgRGBA(height, width, CV_8UC4, data_ptr);

  // Unmap the surface after accessing the data
-  NvBufSurfaceUnMap(surface, 0, 0);

  cv::cuda::GpuMat gpuImgRGBA;
  gpuImgRGBA.upload(imgRGBA);

  // Convert RGBA to BGR
  cv::cuda::GpuMat gpuImgBGR;
  cv::cuda::cvtColor(gpuImgRGBA, gpuImgBGR, cv::COLOR_RGBA2BGR);

+  NvBufSurfaceUnMap(surface, 0, 0);

It looks like the surface is unmapped too early. Please try unmapping it after cvtColor().

Unfortunately this gives the same results. The dumped RGBA image is valid but the encoded JPEG is invalid.

Does the sample code produce a valid output JPEG in your tests?

jpeg_hwaccel.zip (21.5 KB)

Hi,
We have not tried it yet, since it looks to be an issue in the application, and we would like to encourage the community to debug the open-source code. Since the dumped YUV is good, the NvBufSurface APIs should be working correctly, and the problem is likely in the integration with OpenCV.

The issue appears to be in this code. The dumped_decoded.rgba image is correct but the dumped_decoded_cpu.rgba image is not.

#if 1
  // Write raw RGBA frame from DMA buffer to file.
  std::ofstream *out_file = new std::ofstream("/test/nvidia/dumped-decoded.rgba");
  if (out_file)
  {
    dump_dmabuf(dst_dma_fd, 0, out_file);
  }
#endif

  NvBufSurface *surface = NULL;
  void *data_ptr = NULL;

  // Map the NvBufSurface from fd
  ret = NvBufSurfaceFromFd(dst_dma_fd, (void **)(&surface));
  if (ret != 0)
  {
    std::cerr << "Failed to get NvBufSurface from fd" << std::endl;
    return cv::cuda::GpuMat();
  }

  // Map the surface to a CPU-accessible memory
  ret = NvBufSurfaceMap(surface, 0, 0, NVBUF_MAP_READ);
  if (ret != 0)
  {
    std::cerr << "Failed to map NvBufSurface" << std::endl;
    return cv::cuda::GpuMat();
  }

  // Synchronize the surface for CPU access
  ret = NvBufSurfaceSyncForCpu(surface, 0, 0);
  if (ret != 0)
  {
    std::cerr << "Failed to sync NvBufSurface for CPU" << std::endl;
    NvBufSurfaceUnMap(surface, 0, 0);
    return cv::cuda::GpuMat();
  }

  // Get the pointer to the data
  data_ptr = surface->surfaceList[0].dataPtr;

  // Create cv::Mat from the mapped data pointer
  // Assuming RGBA format
  cv::Mat imgRGBA(height, width, CV_8UC4, data_ptr);

#if 1
  // Write raw RGBA frame from CPU to file.
  std::ofstream *out_file2 = new std::ofstream("/test/nvidia/dumped_decoded_cpu.rgba");
  if (out_file2)
  {
    out_file2->write(reinterpret_cast<const char *>(imgRGBA.data), imgRGBA.total() * imgRGBA.elemSize());
    out_file2->close();
  }
#endif

jpeg_hwaccel.zip (21.9 KB)

Hi,
The dump_dmabuf() in NvUtils.cpp is

int
dump_dmabuf(int dmabuf_fd,
                unsigned int plane,
                std::ofstream * stream)
{
    if (dmabuf_fd <= 0)
        return -1;

    int ret = -1;

    NvBufSurface *nvbuf_surf = 0;
    ret = NvBufSurfaceFromFd(dmabuf_fd, (void**)(&nvbuf_surf));
    if (ret != 0)
    {
        return -1;
    }
    ret = NvBufSurfaceMap(nvbuf_surf, 0, plane, NVBUF_MAP_READ_WRITE);
    if (ret < 0)
    {
        printf("NvBufSurfaceMap failed\n");
        return ret;
    }
    NvBufSurfaceSyncForCpu (nvbuf_surf, 0, plane);
    for (uint i = 0; i < nvbuf_surf->surfaceList->planeParams.height[plane]; ++i)
    {
        stream->write((char *)nvbuf_surf->surfaceList->mappedAddr.addr[plane] + i * nvbuf_surf->surfaceList->planeParams.pitch[plane],
                        nvbuf_surf->surfaceList->planeParams.width[plane] * nvbuf_surf->surfaceList->planeParams.bytesPerPix[plane]);
        if (!stream->good())
            return -1;
    }
    ret = NvBufSurfaceUnMap(nvbuf_surf, 0, plane);
    if (ret < 0)
    {
        printf("NvBufSurfaceUnMap failed\n");
        return ret;
    }
    return 0;
}

Calling this function dumps correct data to a file, so please try the following:

  1. Use the data pointer nvbuf_surf->surfaceList->mappedAddr.addr[plane]

  2. Check the values of nvbuf_surf->surfaceList->planeParams.pitch[plane] and nvbuf_surf->surfaceList->planeParams.width[plane]. If the pitch and width are not identical, do you know whether the pitch can be set on the cv::Mat?
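For reference, cv::Mat does accept a row stride: the constructor cv::Mat(rows, cols, type, data, step) takes the pitch in bytes as its step argument, e.g. `cv::Mat img(height, width, CV_8UC4, data_ptr, pitch);`. The indexing it then performs is equivalent to this plain-C++ sketch (no OpenCV needed; the function name is illustrative):

```cpp
#include <cstddef>
#include <cstdint>

// Read one channel of pixel (row, col) from a pitched buffer. This is the
// same arithmetic cv::Mat uses internally when constructed with an explicit
// 'step' (row stride in bytes): rows advance by 'pitch', not by
// width * bytesPerPix.
uint8_t pixel_at(const uint8_t *base, size_t pitch, unsigned row,
                 unsigned col, unsigned bytesPerPix, unsigned channel)
{
    return base[row * pitch + static_cast<size_t>(col) * bytesPerPix + channel];
}
```

Omitting the step argument makes cv::Mat assume tightly packed rows, which is exactly when a pitched DMA buffer appears skewed or corrupted.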

Hi,
After modifying this line in NvSurfEncoder::decode()

data_ptr =  surface->surfaceList->mappedAddr.addr[0];//surface->surfaceList[0].dataPtr;

the dumped_decoded_cpu.rgba image and the encoded JPEG look good. Please give it a try. Thanks.

Thanks greatly for all your help