Bugs encountered when running nvds_obj_enc_process() on Jetson AGX Thor

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson) Jetson AGX Thor
• DeepStream Version 8.0
• JetPack Version (valid for Jetson only) JetPack 7.0 L4T R38.2.2
• TensorRT Version 10.13.2
• NVIDIA GPU Driver Version (valid for GPU only) Driver Version: 580.00 CUDA Version: 13.0
• Issue Type( questions, new requirements, bugs) bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Background:

Problem 1: When I set frameData.saveImg = TRUE, the saved images sometimes contain black lines.

        NvDsObjEncUsrArgs frameData = {0};
        /* Preset */
        frameData.isFrame = 1;
        /* To be set by user */
        frameData.saveImg = TRUE;
        frameData.attachUsrMeta = TRUE;
        /* Set if Image scaling Required */
        frameData.scaleImg = FALSE;
        frameData.scaledWidth = 0;
        frameData.scaledHeight = 0;
        /* Quality */
        frameData.quality = 80;
        /* Set to calculate time taken to encode JPG image. */
        if (calc_enc)
        {
            frameData.calcEncodeTime = 1;
        }
        /* Main Function Call */
        nvds_obj_enc_process(ctx, &frameData, ip_surf, NULL, frame_meta);

Problem 2: When I set frameData.scaleImg = TRUE, a segmentation fault always occurs.

        NvDsObjEncUsrArgs frameData = {0};
        /* Preset */
        frameData.isFrame = 1;
        /* To be set by user */
        frameData.saveImg = TRUE;
        frameData.attachUsrMeta = TRUE;
        /* Set if Image scaling Required */
        frameData.scaleImg = TRUE;
        frameData.scaledWidth = 720;
        frameData.scaledHeight = 406;
        /* Quality */
        frameData.quality = 80;
        /* Set to calculate time taken to encode JPG image. */
        if (calc_enc)
        {
            frameData.calcEncodeTime = 1;
        }
        /* Main Function Call */
        nvds_obj_enc_process(ctx, &frameData, ip_surf, NULL, frame_meta);

Segmentation fault info:

        Thread 7 "deepstream-imag" received signal SIGSEGV, Segmentation fault.
        [Switching to Thread 0xfffd83a298c0 (LWP 327235)]
        0x0000fffff7aac954 in nvds_obj_enc_process () from /opt/nvidia/deepstream/deepstream-8.0/lib/libnvds_batch_jpegenc.so
        (gdb) bt full
        #0  0x0000fffff7aac954 in nvds_obj_enc_process () at /opt/nvidia/deepstream/deepstream-8.0/lib/libnvds_batch_jpegenc.so
        #1  0x0000aaaaaaaa27bc [PAC] in pgie_src_pad_buffer_probe (pad=0xaaaaab648190 [GstPad|src], info=0xfffd83a28c00, ctx=0xaaaaab784010)
            at deepstream_image_meta_test.c:270
                frameData = {saveImg = false, attachUsrMeta = true, scaleImg = true, scaledWidth = 720, scaledHeight = 406, fileNameImg = '\000' <repeats 1023 times>, objNum = 0, quality = 80, isFrame = true, calcEncodeTime = false}
                frame_meta = 0xfffd20001090
                num_rects = 0
                buf = 0xfffd2d68e940
                inmap = {memory = 0xfffd2c00aa50, flags = GST_MAP_READ, data = 0xfffd2c00aac0 "", size = 72, maxsize = 72, user_data = {0x0, 0x0, 0x0, 0x0}, _gst_reserved = {0x0, 0x0, 0x0, 0x0}}
                __func__ = "pgie_src_pad_buffer_probe"
                ip_surf = 0xfffd2c00aac0
                obj_meta = 0x0
                vehicle_count = 0
                person_count = 0
                l_frame = 0xfffd240773b0
                l_obj = 0x0
                batch_meta = 0xfffd20000f40
                calc_enc_str = 0x0
                calc_enc = 0
        #2  0x0000fffff7eaa1a0 in ??? () at /usr/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0
        #3  0x0000fffff7c4dca0 [PAC] in g_hook_list_marshal () at /usr/lib/aarch64-linux-gnu/libglib-2.0.so.0
        #4  0x0000fffff7eaa764 [PAC] in ??? () at /usr/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0
        #5  0x0000fffff7eafc14 [PAC] in ??? () at /usr/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0
        #6  0x0000fffff7eb02b0 [PAC] in gst_pad_push () at /usr/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0
        #7  0x0000fffde38e627c [PAC] in ??? () at /usr/lib/aarch64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_infer.so
        #8  0x0000fffff7c91388 in ??? () at /usr/lib/aarch64-linux-gnu/libglib-2.0.so.0
        #9  0x0000fffff77f595c [PAC] in ??? () at /usr/lib/aarch64-linux-gnu/libc.so.6
        #10 0x0000fffff785ba4c in ??? () at /usr/lib/aarch64-linux-gnu/libc.so.6

By the way, the same code runs normally on my RTX 3090. Could you please take a look and confirm whether this is a bug?

I haven’t been able to reproduce issue 1, but issue 2 is a bug that I have reproduced. It is a code quality issue.

The relevant code is not open source; as a workaround, please place the attached libnvds_batch_jpegenc.so in /opt/nvidia/deepstream/deepstream/lib/.

libnvds_batch_jpegenc.so (990.3 KB)
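For anyone else applying the workaround, here is a minimal sketch of swapping in the patched library while keeping a backup of the stock one (the helper name is mine, and the paths are assumptions — adjust to your install):

```shell
#!/usr/bin/env bash
# install_patched_lib: overwrite the stock libnvds_batch_jpegenc.so with the
# patched one, keeping a .bak copy of the original so it can be restored.
#   $1 = DeepStream lib directory, $2 = path to the downloaded patched .so
install_patched_lib() {
  local libdir="$1" patched="$2"
  cp "${libdir}/libnvds_batch_jpegenc.so" "${libdir}/libnvds_batch_jpegenc.so.bak"
  cp "${patched}" "${libdir}/libnvds_batch_jpegenc.so"
}
```

Against a real install this needs root, e.g. calling it via sudo with the lib directory /opt/nvidia/deepstream/deepstream/lib and the downloaded file as arguments.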

Thanks for your help. I can solve issue 2 with your libnvds_batch_jpegenc.so file.

But regarding issue 1, I have some new additions today.

When I run only the deepstream-image-meta-test sample of DeepStream on the device (Jetson AGX Thor), the probability of hitting issue 1 is very low (I suspect this is why you could not reproduce issue 1).

However, when my colleague ran 51 hardware video transcodes in container 1 while I ran the deepstream-image-meta-test sample in container 2, the probability of hitting issue 1 was extremely high.

He used the following command in container 1 to transcode each of the 51 RTSP streams:

ffmpeg -hwaccel cuda -hwaccel_device 0 -hwaccel_output_format cuda -i rtsp://localhost:8554/mystream -c:v h264_nvenc -rtsp_transport tcp -r 15 -g 30 -b:v 500k -an -threads 1 -flags low_delay -fflags nobuffer+flush_packets -avioflags direct -f rtsp rtsp://localhost:8554/rtsp1
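For reference, the 51 instances can be launched from one loop. A minimal sketch, assuming the output streams are named rtsp1…rtsp51 on the same server as in the command above (the helper name is mine):

```shell
#!/usr/bin/env bash
# build_transcode_cmd: print the ffmpeg command for transcode instance $1.
# The flags mirror the command above; the stream URLs are assumptions.
build_transcode_cmd() {
  local i="$1"
  echo "ffmpeg -hwaccel cuda -hwaccel_device 0 -hwaccel_output_format cuda \
-i rtsp://localhost:8554/mystream -c:v h264_nvenc -rtsp_transport tcp \
-r 15 -g 30 -b:v 500k -an -threads 1 -flags low_delay \
-fflags nobuffer+flush_packets -avioflags direct \
-f rtsp rtsp://localhost:8554/rtsp${i}"
}

# Launch all 51 instances in the background and wait:
#   for i in $(seq 1 51); do $(build_transcode_cmd "$i") & done; wait
```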

While he was transcoding, I checked with nvidia-smi dmon:

Then I ran the deepstream-image-meta-test sample of DeepStream in container 2 and checked nvidia-smi dmon again:
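To capture the dmon output for a whole run instead of watching it live, a small wrapper can log it in the background while the workload runs. A sketch (the helper is mine; only the nvidia-smi dmon invocation and the sample binary come from the setup above, and the stream URI is a placeholder):

```shell
#!/usr/bin/env bash
# run_with_monitor: start a monitor command logging to a file, run the
# workload in the foreground, then stop the monitor.
#   $1 = log file, $2 = monitor command (word-split), $3.. = workload
run_with_monitor() {
  local logfile="$1" monitor="$2"
  shift 2
  ${monitor} > "${logfile}" &    # unquoted on purpose: split into words
  local mon_pid=$!
  "$@"
  kill "${mon_pid}" 2>/dev/null || true
  wait "${mon_pid}" 2>/dev/null || true
}

# Example (arguments assumed):
#   run_with_monitor dmon.log "nvidia-smi dmon -s u -d 1" \
#     ./deepstream-image-meta-test <stream-uri>
```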

Going through the thousands of saved images, I found that some of them had the black lines described in issue 1.

The utilization figures suggest that the hardware resources are sufficient.
So could it be that concurrent access to the hardware from different processes causes a conflict that leads to issue 1? I hope you can reproduce the problem based on this description.

I have successfully reproduced the issue using the method you provided, but the conditions are very demanding.

Even when running only the encoder at 98% utilization, I still can’t reproduce the problem; it requires simultaneous stress testing of both the encoder and the decoder.

Save this script as hw.sh and run INSTANCES=55 ./hw.sh mp; this makes the problem much more likely to occur.

Thank you for your feedback; we are currently discussing this issue internally.

#!/usr/bin/env bash
set -euo pipefail

mode_arg=${1:-}
src_arg=${2:-}
file_arg=${3:-}

MODE=${MODE:-${mode_arg:-split}}
SRC=${SRC:-${src_arg:-file}}
INPUT_FILE=${INPUT_FILE:-${file_arg:-/opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4}}

if [[ "${MODE}" == "-h" || "${MODE}" == "--help" || "${MODE}" == "help" ]]; then
  cat <<'USAGE'
Usage:
  hw.sh [split|mp] [file|testsrc] [input_file]

Modes:
  split  Single ffmpeg process, one source, split to N encoders (default)
  mp     Multi-process, one ffmpeg process per encoder instance

Env vars:
  SRC         Input source: file|testsrc (default: file)
  INPUT_FILE  When SRC=file, path to input file
  HW_OUTPUT_CUDA  When SRC=file and MODE=mp, set 1 to force decoded frames to stay on GPU (default: 0)
  INSTANCES   Number of encoder instances (default: 40)
  DEVICE      CUDA device index (default: 0)
  SIZE        testsrc size (default: 1920x1080)
  SRC_RATE    testsrc rate (default: 30)
  OUT_RATE    encoder output fps (default: 30)
  GOP         gop (default: 30)
  BITRATE     video bitrate (default: 5000k)
USAGE
  exit 0
fi

INSTANCES=${INSTANCES:-40}
DEVICE=${DEVICE:-0}
SIZE=${SIZE:-1920x1080}
SRC_RATE=${SRC_RATE:-30}
OUT_RATE=${OUT_RATE:-30}
GOP=${GOP:-30}
BITRATE=${BITRATE:-5000k}
HW_OUTPUT_CUDA=${HW_OUTPUT_CUDA:-0}

input_args=()
input_spec=()

compute_input() {
  input_args=()
  input_spec=()

  if [[ "${SRC}" == "testsrc" ]]; then
    input_spec+=( -f lavfi -i "testsrc=size=${SIZE}:rate=${SRC_RATE}" )
  else
    input_spec+=( -stream_loop -1 -i "${INPUT_FILE}" )

    # Enable CUDA-accelerated decoding for file input.
    # Note: forcing decoded frames to be CUDA frames (AV_PIX_FMT_CUDA) may trigger
    # implicit format conversion filters (auto_scale) that do not support CUDA frames
    # on some builds/setups, resulting in:
    #   "Impossible to convert between the formats... Parsed_null_0 ... auto_scale_0"
    # So by default we do NOT set -hwaccel_output_format cuda.
    input_args+=( -hwaccel cuda -hwaccel_device "${DEVICE}" )
    if [[ "${MODE}" == "mp" && "${HW_OUTPUT_CUDA}" == "1" ]]; then
      input_args+=( -hwaccel_output_format cuda )
    fi
  fi
}

out_args=(
  -c:v h264_nvenc
  -r "${OUT_RATE}" -g "${GOP}" -b:v "${BITRATE}"
  -an -threads 1 -flags low_delay -fflags nobuffer+flush_packets -avioflags direct
  -f null -
)

run_mp() {
  MODE=mp
  compute_input

  for i in $(seq 1 "${INSTANCES}"); do
    ffmpeg \
      "${input_args[@]}" \
      "${input_spec[@]}" \
      "${out_args[@]}" &
  done
  wait
}

run_split() {
  MODE=split
  compute_input

  split_labels=""
  for i in $(seq 0 $((INSTANCES - 1))); do
    split_labels+="[v${i}]"
  done

  cmd=(
    ffmpeg
    "${input_args[@]}"
    "${input_spec[@]}"
    -filter_complex "split=${INSTANCES}${split_labels}"
  )

  for i in $(seq 0 $((INSTANCES - 1))); do
    cmd+=(
      -map "[v${i}]"
      "${out_args[@]}"
    )
  done

  echo "${cmd[@]}"
  "${cmd[@]}"
}

case "${MODE}" in
  mp) run_mp ;;
  split|"") run_split ;;
  *)
    echo "Unknown MODE: ${MODE} (use split|mp)" >&2
    exit 2
    ;;
esac

Hello, following up on the internal discussion: are there any results or solutions for this issue?