Segmentation Fault in DeepStream Pipeline with Custom TRT Engine

Hello,
I am encountering a segmentation fault while running a DeepStream pipeline that uses a custom TensorRT engine (YOLOv8). Below are the details of the setup and the error:

Pipeline Description

rtspsrc location=rtsp://xxx.xxx.x.xx:xxx/stream1 latency=50 buffer-mode=auto drop-on-latency=true \
  do-retransmission=false udp-buffer-size=212000 ! rtph265depay ! h265parse config-interval=-1 ! \
  nvv4l2decoder ! metaInsert.sink_0 \
nvstreammux name=metaInsert width=1280 height=1280 batch-size=1 live-source=1 ! nvinfer name=inf ! \
  nvdsosd ! nvegltransform ! nveglglessink name=nvGleSink sync=true force-aspect-ratio=false qos=0

Environment Details

Model: NVIDIA Orin NX Developer Kit - Jetpack 5.1.2
Hardware:

  • Module: NVIDIA Jetson Orin NX (16 GB RAM)

Platform:

  • Distribution: Ubuntu 20.04 (Focal)

Libraries:

  • CUDA: 11.4.315
  • cuDNN: 8.6.0.166
  • TensorRT: 8.5.2.2
  • VPI: 2.3.9
  • Vulkan: 1.3.204
  • OpenCV: 4.5.5 (with CUDA: YES)

Error Output

  1. The pipeline initializes successfully, and the TensorRT engine (i2.engine) is deserialized without issues.

  2. During pipeline execution, the following warning appears:

    WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.

  3. Shortly after, the pipeline crashes with a segmentation fault:

    Segmentation fault (core dumped)


Steps Taken So Far

  1. Verified that the TensorRT engine (i2.engine) is compatible with the platform’s TensorRT version.
  2. Confirmed that the config_infer_primary_bins.txt file is correctly configured and loads successfully.
  3. Ensured that the RTSP source is working and provides a valid stream.

Questions

  1. What could be the possible reasons for the segmentation fault in this context?
  2. Is the warning related to the kEXPLICIT_BATCH flag a potential cause for this issue? If so, how can I address it?
  3. Are there any additional debug steps I can take to identify the root cause of the segmentation fault?

Any help or suggestions to resolve this issue would be greatly appreciated.

Thank you!

What’s the version of DeepStream you are using?

No, that warning is informational and is not the cause of the crash.

You can use the gdb tool to check the crash stack:

$ gdb --args <your_command>
(gdb) run
# ... after the crash ...
(gdb) bt

Thank you for your response.
DeepStream 6.3.
From the stack trace, I can see that execution reaches the NvDsInferParseCustomYoloV8 function in my custom YOLOv8 parsing logic before the crash. The tensor in question is named output0, and its symbolic shape (from the ONNX export) is quite complex:

tensor: float32[batch, 10, (floor(floor(floor(height/2 - 1/2)/2)/2) + 1)*(floor(floor(floor(width/2 - 1/2)/2)/2) + 1) + (floor(floor(floor(floor(height/2 - 1/2)/2)/2)/2) + 1)*(floor(floor(floor(floor(width/2 - 1/2)/2)/2)/2) + 1) + (floor(floor(floor(floor(floor(height/2 - 1/2)/2)/2)/2)/2) + 1)*(floor(floor(floor(floor(floor(width/2 - 1/2)/2)/2)/2)/2) + 1)]

INFO: [Implicit Engine Info]: layers num: 2
0   INPUT   kFLOAT  images   3x544x960
1   OUTPUT  kFLOAT  output0  10x10710

Given this tensor shape, the issue may lie in how output0 is processed in the parsing function. Could you confirm whether there is a dedicated parsing function for custom YOLO models that handles a single output0 tensor? That would help ensure the parsing logic is aligned with the tensor format and might resolve the parsing issues.
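For what it's worth, my current understanding is that output0 is the raw YOLOv8 prediction head: the 10 rows are 4 box values (cx, cy, w, h) plus 6 class scores, over 10710 anchors, stored channel-major. A minimal decode sketch under that assumption, using plain structs instead of the NvDsInfer types so it stands alone (the names Det and decodeRawYoloV8 are mine):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Simplified stand-in for NvDsInferParseObjectInfo so the sketch is
// self-contained (left/top are the top-left corner in pixels).
struct Det {
    float left, top, width, height;
    float confidence;
    int classId;
};

// Decode a raw YOLOv8 head of shape [4 + numClasses, numAnchors],
// stored channel-major: data[row * numAnchors + anchor].
// Rows 0-3 hold cx, cy, w, h; the remaining rows hold class scores.
std::vector<Det> decodeRawYoloV8(const float* data, int numClasses,
                                 int numAnchors, float confThresh)
{
    std::vector<Det> dets;
    for (int a = 0; a < numAnchors; ++a) {
        // Pick the best-scoring class for this anchor.
        int bestClass = 0;
        float bestScore = 0.f;
        for (int c = 0; c < numClasses; ++c) {
            float s = data[(4 + c) * numAnchors + a];
            if (s > bestScore) { bestScore = s; bestClass = c; }
        }
        if (bestScore < confThresh) continue;

        // Convert the center-format box to a top-left corner box.
        float cx = data[0 * numAnchors + a];
        float cy = data[1 * numAnchors + a];
        float w  = data[2 * numAnchors + a];
        float h  = data[3 * numAnchors + a];
        dets.push_back({cx - w / 2, cy - h / 2, w, h, bestScore, bestClass});
    }
    return dets; // candidates are pre-NMS
}
```

Unlike the end-to-end (num_dets/bboxes/scores/labels) layout, these candidates have not been through NMS, so either NMS has to be applied in the parser or nvinfer's clustering (e.g. cluster-mode=2) has to be enabled.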

Thank you again for your help, and I look forward to your advice on how to proceed.

Where did you get this API? We currently have no post-processing for the YOLOv8 model in DeepStream. If you have the source code of this API, it is recommended that you first debug it yourself against the model you are using.

Thank you for your response. I implemented the solution using the following repository: YOLOv8-TensorRT by triple-Mu.

Could you please confirm which versions of YOLO are officially supported by DeepStream? Additionally, do you have any suggestions or recommendations for adapting YOLOv8 for DeepStream if there is no official post-processing support?

YOLOv7. You can refer to our deepstream_yolo sample.

About YoloV8, you can refer to this DeepStream-Yolo project.
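For reference, that project plugs its parser into DeepStream through the nvinfer configuration file; from memory the relevant entries look roughly like this (confirm the exact function name and library path against that project's README):

```ini
[property]
# Custom bbox parser exported by the DeepStream-Yolo library
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
# cluster-mode=2 selects NMS clustering inside nvinfer
cluster-mode=2
```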


Thank you for your reply.

Below is my current implementation of nvdsparsebbox_yolov8.cpp. However, I am running into issues.

#include "nvdsinfer_custom_impl.h"

#include <algorithm>
#include <cassert>
#include <iostream>
#include <vector>

// Custom parsing function declaration
extern "C" bool NvDsInferParseCustomYoloV8(std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
                                           NvDsInferNetworkInfo const& networkInfo,
                                           NvDsInferParseDetectionParams const& detectionParams,
                                           std::vector<NvDsInferParseObjectInfo>& objectList);

// Utility function to clip bounding box coordinates
static __inline__ float bbox_clip(const float& val, const float& minVal = 0.f, const float& maxVal = 1280.f)
{
    assert(minVal <= maxVal);
    return std::max(std::min(val, (maxVal - 1)), minVal);
}

// Decoding YOLOv8 tensor output
static std::vector<NvDsInferParseObjectInfo> decodeYoloV8Tensor(const int* num_dets,
                                                                const float* bboxes,
                                                                const float* scores,
                                                                const int* labels,
                                                                const unsigned int& img_w,
                                                                const unsigned int& img_h)
{
    std::vector<NvDsInferParseObjectInfo> bboxInfo;
    size_t nums = num_dets[0]; // Number of detections

    for (size_t i = 0; i < nums; i++) {
        float x0 = bboxes[i * 4 + 0];  // x1
        float y0 = bboxes[i * 4 + 1];  // y1
        float x1 = bboxes[i * 4 + 2];  // x2
        float y1 = bboxes[i * 4 + 3];  // y2

        // The end-to-end engine applies NMS on the GPU, so each detection
        // already carries a single confidence score and class label.
        float score = scores[i];
        int label = labels[i];

        x0 = bbox_clip(x0, 0.f, img_w);
        y0 = bbox_clip(y0, 0.f, img_h);
        x1 = bbox_clip(x1, 0.f, img_w);
        y1 = bbox_clip(y1, 0.f, img_h);

        NvDsInferParseObjectInfo obj;
        obj.left = x0;
        obj.top = y0;
        obj.width = x1 - x0;
        obj.height = y1 - y0;
        obj.detectionConfidence = score;
        obj.classId = label;

        bboxInfo.push_back(obj);
    }

    return bboxInfo;
}

// Custom parsing function definition
extern "C" bool NvDsInferParseCustomYoloV8(std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
                                           NvDsInferNetworkInfo const& networkInfo,
                                           NvDsInferParseDetectionParams const& detectionParams,
                                           std::vector<NvDsInferParseObjectInfo>& objectList)
{
    // The end-to-end YOLOv8-TensorRT engine exports four output layers,
    // bound in this order: num_dets, bboxes, scores, labels.
    if (outputLayersInfo.size() != 4) {
        std::cerr << "Expected 4 output layers (num_dets, bboxes, scores, labels), got "
                  << outputLayersInfo.size() << std::endl;
        return false;
    }

    const NvDsInferLayerInfo& num_dets = outputLayersInfo[0];
    const NvDsInferLayerInfo& bboxes   = outputLayersInfo[1];
    const NvDsInferLayerInfo& scores   = outputLayersInfo[2];
    const NvDsInferLayerInfo& labels   = outputLayersInfo[3];

    assert(num_dets.dims.numDims == 2);
    assert(bboxes.dims.numDims == 3);
    assert(scores.dims.numDims == 2);
    assert(labels.dims.numDims == 2);

    std::vector<NvDsInferParseObjectInfo> objects = decodeYoloV8Tensor(
        (const int*)(num_dets.buffer),
        (const float*)(bboxes.buffer),
        (const float*)(scores.buffer),
        (const int*)(labels.buffer),
        networkInfo.width,
        networkInfo.height
    );

    objectList.clear();
    objectList = objects;

    return true;
}

CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomYoloV8);

My questions:

  • Is the mapping of outputLayersInfo for bboxes, scores, and labels correct?
  • Are there other issues in my code that could cause these inconsistencies?

Thank you in advance!

Since we currently have no post-processing for the YOLOv8 model in DeepStream, we suggest you follow the code of the project I linked earlier, nvdsparsebbox_Yolo.cpp. If you have any questions, you can consult the project owner directly.
