NVJPEG cannot be used in JetPack 6.2

We want to use NVJPEG to encode JPEG images; the code is shown below. The call returns 6, which means NVJPEG_STATUS_EXECUTION_FAILED. How can we fix this, or is there another way to encode JPEG images? Thank you.

auto nvjpg_status = nvjpegEncodeYUV(
    nv_handle_, nv_enc_state_, nv_enc_params_, &nvjpeg_input_image_,
    NVJPEG_CSS_422, storage_width, storage_height, jpg_stream_);
if (nvjpg_status != NVJPEG_STATUS_SUCCESS) {
  LOG(ERROR) << device << " " << nvjpg_status;
  return -2;
}

typedef enum {
  NVJPEG_STATUS_SUCCESS = 0,
  NVJPEG_STATUS_NOT_INITIALIZED = 1,
  NVJPEG_STATUS_INVALID_PARAMETER = 2,
  NVJPEG_STATUS_BAD_JPEG = 3,
  NVJPEG_STATUS_JPEG_NOT_SUPPORTED = 4,
  NVJPEG_STATUS_ALLOCATOR_FAILURE = 5,
  NVJPEG_STATUS_EXECUTION_FAILED = 6,
  NVJPEG_STATUS_ARCH_MISMATCH = 7,
  NVJPEG_STATUS_INTERNAL_ERROR = 8,
  NVJPEG_STATUS_IMPLEMENTATION_NOT_SUPPORTED = 9
} nvjpegStatus_t;

Hi,
We use jetson_multimedia_api on Jetson platforms. Please install the package and try the sample:

Jetson Linux API Reference: 05_jpeg_encode (JPEG encode) | NVIDIA Docs
/usr/src/jetson_multimedia_api/samples/05_jpeg_encode/

/usr/src/jetson_multimedia_api/samples/06_jpeg_decode# ./jpeg_decode num_files 1 /cargo/frame3616.jpg /cargo/nvidia-logo.yuv
./jpeg_decode: symbol lookup error: ./jpeg_decode: undefined symbol: jpeg_read_raw_data

/usr/src/jetson_multimedia_api/samples/05_jpeg_encode# ./jpeg_encode
./jpeg_encode: symbol lookup error: ./jpeg_encode: undefined symbol: jpeg_read_raw_data

It seems to be a library problem.

/usr/src/jetson_multimedia_api/samples/06_jpeg_decode# ldd jpeg_decode
linux-vdso.so.1 (0x0000ffffb69c3000)
libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000ffffb68a4000)
libv4l2.so.0 => /lib/aarch64-linux-gnu/libv4l2.so.0 (0x0000ffffb678e000)
libEGL.so.1 => /lib/aarch64-linux-gnu/libEGL.so.1 (0x0000ffffb676a000)
libGLESv2.so.2 => /lib/aarch64-linux-gnu/libGLESv2.so.2 (0x0000ffffb6735000)
libX11.so.6 => /lib/aarch64-linux-gnu/libX11.so.6 (0x0000ffffb65f0000)
libnvbufsurface.so.1.0.0 => /usr/lib/aarch64-linux-gnu/tegra/libnvbufsurface.so.1.0.0 (0x0000ffffb6535000)
libnvbufsurftransform.so.1.0.0 => /usr/lib/aarch64-linux-gnu/tegra/libnvbufsurftransform.so.1.0.0 (0x0000ffffb539a000)
libnvjpeg.so => /usr/local/lib/libnvjpeg.so (0x0000ffffb4f10000)
libdrm.so.2 => /lib/aarch64-linux-gnu/libdrm.so.2 (0x0000ffffb4eec000)
libvulkan.so.1 => /lib/aarch64-linux-gnu/libvulkan.so.1 (0x0000ffffb4e4f000)
libstdc++.so.6 => /lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000ffffb4c6a000)
libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000ffffb4c46000)
libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffffb4ad3000)
/lib/ld-linux-aarch64.so.1 (0x0000ffffb6993000)
libv4lconvert.so.0 => /lib/aarch64-linux-gnu/libv4lconvert.so.0 (0x0000ffffb4a4d000)
libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000ffffb4a39000)
libGLdispatch.so.0 => /lib/aarch64-linux-gnu/libGLdispatch.so.0 (0x0000ffffb48ae000)
libxcb.so.1 => /lib/aarch64-linux-gnu/libxcb.so.1 (0x0000ffffb4877000)
libnvrm_mem.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_mem.so (0x0000ffffb485f000)
libnvrm_surface.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_surface.so (0x0000ffffb4829000)
libnvrm_chip.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_chip.so (0x0000ffffb4815000)
libnvos.so => /usr/lib/aarch64-linux-gnu/tegra/libnvos.so (0x0000ffffb47f5000)
libnvbuf_fdmap.so.1.0.0 => /usr/lib/aarch64-linux-gnu/tegra/libnvbuf_fdmap.so.1.0.0 (0x0000ffffb47e2000)
librt.so.1 => /lib/aarch64-linux-gnu/librt.so.1 (0x0000ffffb47ca000)
libnvrm_host1x.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_host1x.so (0x0000ffffb47a9000)
libnvvic.so => /usr/lib/aarch64-linux-gnu/tegra/libnvvic.so (0x0000ffffb4780000)
libcuda.so.1 => /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 (0x0000ffffb3129000)
libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffffb307e000)
libXau.so.6 => /lib/aarch64-linux-gnu/libXau.so.6 (0x0000ffffb306a000)
libXdmcp.so.6 => /lib/aarch64-linux-gnu/libXdmcp.so.6 (0x0000ffffb3054000)
libnvsciipc.so => /usr/lib/aarch64-linux-gnu/tegra/libnvsciipc.so (0x0000ffffb302f000)
libnvsocsys.so => /usr/lib/aarch64-linux-gnu/tegra/libnvsocsys.so (0x0000ffffb301b000)
libnvrm_sync.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_sync.so (0x0000ffffb3004000)
libnvrm_stream.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_stream.so (0x0000ffffb2fe9000)
libnvcolorutil.so => /usr/lib/aarch64-linux-gnu/tegra/libnvcolorutil.so (0x0000ffffb2fc2000)
libnvrm_gpu.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so (0x0000ffffb2f55000)
libbsd.so.0 => /lib/aarch64-linux-gnu/libbsd.so.0 (0x0000ffffb2f2e000)

Hi,
It seems your system is not a clean JetPack 6.2. We would suggest re-flashing the developer kit, installing the SDK components through SDKManager, and trying again. The reference samples are expected to work on a default JetPack release.


Yes, you are right. The picture shows the correct results. Thank you.

Hi,
We copy YUV data from CUDA memory to the NvBuffer, which increases CPU usage by 18%; the code is shown below.
How can we optimise CPU resource usage? We need help. Thank you.

cudaError_t cpy_y_status = cudaMemcpy(
    nvbuffer_.planes[0].data,
    d_yuv420_planar_,
    y_size_,
    cudaMemcpyDeviceToHost);

cudaError_t cpy_u_status = cudaMemcpy(
    nvbuffer_.planes[1].data,
    (d_yuv420_planar_ + storage_width * storage_height),
    uv_size_,
    cudaMemcpyDeviceToHost);

cudaError_t cpy_v_status = cudaMemcpy(
    nvbuffer_.planes[2].data,
    (d_yuv420_planar_ + storage_width * storage_height * 5 / 4),
    uv_size_,
    cudaMemcpyDeviceToHost);

Hi,
You can create an NvBufSurface, map it to an EGLImage, and get a CUDA pointer. That way you can copy the frame data through the GPU and encode the NvBufSurface to JPEG.

Hi,
We created an NvBuffer and got a CUDA pointer. This saves 8% of CPU usage (for six camera inputs).
The code is shown below.
cudaMemcpyAsync(nvY, d_yuv420_planar_ , y_size_, cudaMemcpyDeviceToDevice, rgb_stream_);
cudaMemcpyAsync(nvU, (d_yuv420_planar_ + storage_width * storage_height) , uv_size_, cudaMemcpyDeviceToDevice, rgb_stream_);
cudaMemcpyAsync(nvV, (d_yuv420_planar_ + storage_width * storage_height * 5 / 4) , uv_size_, cudaMemcpyDeviceToDevice, rgb_stream_);

The remaining problem is that this function still consumes 8 percentage points of CPU usage. We suspect that the "NVJPEG result -> host" copy is not DMA-based.
“ctx_.jpegenc->encodeFromBuffer(nvbuffer_, JCS_YCbCr, &out_jpg_, jpeg_size, ctx_.quality);”

From the data, we suspect that encodeFromBuffer, after the hardware encode completes, copies the data to the host without using DMA.

Reference test results:

With the CPU involved, copying the data from the GPU to the dedicated hardware measurably costs 8 percentage points of CPU usage (six cameras).

After switching to zero-copy, CPU usage drops by those 8 percentage points.

Actual testing shows that running the dedicated hardware encode function itself adds 8 percentage points of CPU usage. (A reasonable explanation is that once the hardware encode completes, the CPU is involved in copying the data to the host.)

Hi,
You are right. The compressed data is copied through the CPU, so this loading is expected.


Can the encoding result of encodeFromBuffer be written to GPU memory, so that the data can then be copied from the GPU to the host via GPU DMA?

Hi,
No, it is not supported. The compressed data will be copied from the hardware DMA buffer to a CPU buffer.

