Converting Custom RetinaNet model to TensorRT in DeepStream

• Hardware Platform (Jetson / GPU) : NVIDIA Jetson AGX Orin
• DeepStream Version : 7.1
• JetPack Version (valid for Jetson only) : 6.1
• TensorRT Version : 8.6.2.3
• Issue Type( questions, new requirements, bugs) : question

Hello,

I have a custom RetinaNet model that I would like to run in my DeepStream pipeline.
Here is my current configuration file for nvinfer element.

[property]
gpu-id=0
model-color-format=0 # 0=RGB, 1=BGR
onnx-file=my-model.onnx
model-engine-file=my-model.onnx_b1_gpu0_fp16.engine
labelfile-path=labels.txt
infer-dims=3;1920;1080
batch-size=1
maintain-aspect-ratio=1

# Network settings
network-mode=2         # FP16 mode
network-type=0
num-detected-classes=4
process-mode=1
gie-unique-id=1

# Memory and buffer settings
workspace-size=4096
maintain-aspect-ratio=1
network-input-order=0   # NCHW format
output-blob-names=class_logits;box_regression

# Performance settings
custom-network-config=1

After the model is successfully converted to TensorRT, I get the following error during inference:

0:00:01.522554237 1354965 0xaaab12279260 ERROR
nvinfer gstnvinfer.cpp:678:gst_nvinfer_logger:<dw-nvinfer> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::parseBoundingBox() <nvdsinfer_context_impl_output_parsing.cpp:60> [UID = 1]: Could not find output coverage layer for parsing objects
0:00:01.522622247 1354965 0xaaab12279260 ERROR
nvinfer gstnvinfer.cpp:678:gst_nvinfer_logger:<dw-nvinfer> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::fillDetectionOutput() <nvdsinfer_context_impl_output_parsing.cpp:736> [UID = 1]: Failed to parse bboxes
Segmentation fault (core dumped)

This is obviously an error with my bounding box parsing. When running polygraphy on my ONNX model, this is what I get:

Name: main_graph | ONNX Opset: 17
    
    ---- 1 Graph Input(s) ----
    {input [dtype=float32, shape=('batch_size', 3, 1920, 1080)]}
    
    ---- 2 Graph Output(s) ----
    {class_logits [dtype=float32, shape=('batch_size', 'Concatclass_logits_dim_1', 2)],
     box_regression [dtype=float32, shape=('batch_size', 'Concatbox_regression_dim_1', 4)]}
    
    ---- 195 Initializer(s) ----
    
    ---- 626 Node(s) ----

Question

From what I have read, I have to create a custom bbox parser function and add parse-bbox-func-name to the configuration file; however, I cannot find resources on how to create it. In the article Deploying to DeepStream for RetinaNet - NVIDIA Docs, the references mostly redirect to the main DeepStream documentation, where nothing is written about a custom bbox function. Do I have to create such a custom function for bounding box parsing, or can I use the default one? If I do need a custom one, can I write it in Python, or does it have to be written in C in some specific location? Could you point me to the right resource?

It is usually recommended that you use C/C++ and add the following items to the configuration file:

parse-bbox-func-name=NvDsInferParseCustomXXXXX
custom-lib-path=../../../post_processor/libnvds_xxxxxxxx.so

This is the native sample code:

/opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer_customparser/nvdsinfer_custombboxparser.cpp
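
The exported function has to match the custom parser prototype declared in nvdsinfer_custom_impl.h. A minimal skeleton, just as an illustration (the function name is a placeholder and must match parse-bbox-func-name in your config), looks like this:

#include "nvdsinfer_custom_impl.h"

// Minimal skeleton of a custom bbox parser. Decode the raw output tensors in
// outputLayersInfo and push one NvDsInferObjectDetectionInfo per detection.
extern "C" bool NvDsInferParseCustomXXXXX(
    std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
    NvDsInferNetworkInfo const &networkInfo,
    NvDsInferParseDetectionParams const &detectionParams,
    std::vector<NvDsInferObjectDetectionInfo> &objectList)
{
    // TODO: parse outputLayersInfo into objectList here.
    return true;
}

// Compile-time check that the prototype matches what nvinfer expects.
CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomXXXXX);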

If you must use Python, add a probe function to the src pad of GIE, and then add your custom parsing code in the probe function. Here is an example

@junshengy Thank you for your answer. I had previously written a pad probe function connected to the src pad of the nvinfer element, and it looks like this:

def pgie_src_pad_buffer_probe(
    pad: Gst.Pad, info: Gst.PadProbeInfo, u_data: dict
) -> Gst.PadProbeReturn:
    """
    Buffer probe for inference
    """
    print("TEST")

    gst_buffer = info.get_buffer()
    if not gst_buffer:
        logger.error("Unable to get GstBuffer pad buffer probe")
        return Gst.PadProbeReturn.OK

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        print(frame_meta)

        try:
            l_frame = l_frame.next
        except StopIteration:
            break

    return Gst.PadProbeReturn.OK

In this case, the “TEST” message is not even printed, and I get the error that I pasted in the previous message. So in my case, even without the inference pad probe, the pipeline does not work correctly and fails with the error “Failed to parse bboxes”.

When it comes to the C/C++ code for custom bbox parsing, I cannot find the file you pointed to with the native sample code. In my sources directory there is no libs directory; I use DeepStream 7.1.

It seems that the TensorRT version does not match. For DS-7.1, it should be 10.3. Please use SDKManager to re-flash the system and reinstall DS-7.1.

After a correct installation using SDKManager, you should be able to find the relevant code, which is part of the SDK. The fact that the above code does not work may be due to this.

OK, so I made a mistake when providing the TensorRT version. I have DeepStream 7.1 and TensorRT 10.3. The libs directory was missing from my sources, so I installed it.

Any luck on this? I am surprised there isn't already a parser or custom-lib-path for this; for Deformable DETR through NVIDIA TAO these were already built in.

This user was missing some components in the installation. If you have other questions, you can open a new topic.

Hi @tjwhitten25
I had to write my custom function in C++. I wrote it inside sources/libs/nvdsinfer_customparser/nvdsinfer_custombboxparser.cpp and then compiled it into a .so file.

Inside the nvinfer configuration file, I had to add parse-bbox-func-name and custom-lib-path. I also added output-tensor-meta=1, which enables me to use tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data) inside my probe function in Python.
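
For reference, the relevant additions to my nvinfer config looked roughly like this (the function name and library path are from my setup and are only illustrative here; they must match your own exported parser and compiled .so):

parse-bbox-func-name=NvDsInferParseCustomRetinaNet
custom-lib-path=/opt/nvidia/deepstream/deepstream-7.1/sources/libs/nvdsinfer_customparser/libnvds_infercustomparser.so
output-tensor-meta=1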

Awesome, thanks. Is there any way you could share your nvdsinfer_custombboxparser.cpp, please?

Sure @tjwhitten25, the file nvdsinfer_custombboxparser.cpp is publicly available. If you have DeepStream installed under /opt, this is, for example, the location of the file on my machine:
/opt/nvidia/deepstream/deepstream-7.1/sources/libs/nvdsinfer_customparser/nvdsinfer_custombboxparser.cpp

There is a Makefile inside the directory, and the file uses a macro to check whether the function prototype is correct. This is the function that I wrote; I am still experimenting with it, and some values are hardcoded for now and will be adjusted later:

...

extern "C" bool NvDsInferParseCustomRetinaNet(std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
                                                NvDsInferNetworkInfo const &networkInfo,
                                                NvDsInferParseDetectionParams const &detectionParams,
                                                std::vector<NvDsInferObjectDetectionInfo> &objectList)
{
    // Find output layers by names
    const NvDsInferLayerInfo *class_logits = nullptr;
    const NvDsInferLayerInfo *box_regression = nullptr;

    for (const auto &layer : outputLayersInfo)
    {
        if (strcmp(layer.layerName, "class_logits") == 0)
            class_logits = &layer;
        else if (strcmp(layer.layerName, "box_regression") == 0)
            box_regression = &layer;
    }

    if (!class_logits || !box_regression)
    {
        std::cerr << "ERROR: Required output layers not found." << std::endl;
        return false;
    }

    // Cast layer buffers to appropriate types. It holds the data of the layer
    const float *class_logits_data = static_cast<const float *>(class_logits->buffer);
    const float *box_regression_data = static_cast<const float *>(box_regression->buffer);

    // Iterate over detections and parse bounding boxes.
    // inferDims.d[0] is the total number of anchors predicted by the model;
    // inferDims.d[1] is the number of values per anchor (2 class logits for
    // class_logits, 4 box coordinates for box_regression). The remaining
    // dimensions are unused.
    // Total number of anchors
    const int total_anchors = class_logits->inferDims.d[0];
    // Total number of classes
    const int total_classes = class_logits->inferDims.d[1];
    // Total number of coordinates per anchor
    const int total_coordinates = box_regression->inferDims.d[1];

    // Extract image width and height from networkInfo
    float imgWidth = networkInfo.width;   // 1080
    float imgHeight = networkInfo.height; // 1920

    for (int i = 0; i < total_anchors; i++)
    {
        // Sigmoid confidence for class index 1 (hardcoded while experimenting)
        float confidence = 1 / (1 + exp(-class_logits_data[i * total_classes + 1]));

        if (confidence < detectionParams.perClassThreshold[0])
            continue;

        // Decode the bounding box
        float x1 = box_regression_data[i * total_coordinates + 0];
        float y1 = box_regression_data[i * total_coordinates + 1];
        float x2 = box_regression_data[i * total_coordinates + 2];
        float y2 = box_regression_data[i * total_coordinates + 3];

        // // Convert to normalized center XYWH format
        // float x = (x1 + x2) / 2.0f;
        // float y = (y1 + y2) / 2.0f;
        float width = x2 - x1;
        float height = y2 - y1;

        // Clip bounding box width/height to the network dimensions
        width = std::max(0.0f, std::min(width, static_cast<float>(networkInfo.width)));
        height = std::max(0.0f, std::min(height, static_cast<float>(networkInfo.height)));
        
        // Hardcoded code, that needs to be changed
        NvDsInferObjectDetectionInfo objectInfo;
        objectInfo.classId = 100;
        objectInfo.detectionConfidence = confidence;
        objectInfo.left = 100;
        objectInfo.top = 100;
        objectInfo.width = 100;
        objectInfo.height = 100;
        objectList.push_back(objectInfo);
    }

    return true;
};

...

CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomRetinaNet);

If you have any suggestions, tips, or questions, feel free to write them down. I am still experimenting with this code.

@junshengy My code is working now, but I’m a bit confused about something. I wrote a custom bounding box parser function, and I’m wondering if this function acts like a probe function connected to the nvinfer plugin. Specifically, does the parse-bbox-func-name parameter work like a probe function for nvinfer, allowing me to handle all the post-processing for my inference within it (in C++ code), so that the probe function in Python is no longer needed?

Additionally, I would like to access the values from std::vector<NvDsInferObjectDetectionInfo> &objectList in my Python code. However, I couldn’t find any examples or guidance on how to achieve this. Could you provide some advice or point me to an example?

The Python probe function is still needed, but there is no need to parse the output tensor in it.

You can refer to DetectPostprocessor::fillDetectionOutput in /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/nvdsinfer_context_impl_output_parsing.cpp and attach_metadata_detector in /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvinfer/gstnvinfer.cpp

parse-bbox-func-name is responsible for parsing the bboxes from the output tensor, and then the nvinfer plugin will convert the bboxes to object metadata, so you don’t need to access std::vector<NvDsInferObjectDetectionInfo> &objectList. Just access obj_meta in the probe function on the Python pgie src pad, like obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data).

Thank you for your response, @junshengy . I’m currently facing an issue while processing obj_meta inside the probe function in Python. Below is the code I am using:

def nvinfer_pad_buffer_probe(
    pad: Gst.Pad, info: Gst.PadProbeInfo, u_data: dict
) -> Gst.PadProbeReturn:
    logger = u_data["logger"]

    gst_buffer = info.get_buffer()
    if not gst_buffer:
        logger.exception(
            "Unable to get GstBuffer for nvinfer pad buffer probe"
        )
        return Gst.PadProbeReturn.OK

    # Get batch metadata from the GstBuffer
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list

    while l_frame:
        try:
            # Extract frame metadata
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)

            # Iterate through detected objects in the frame
            l_obj = frame_meta.obj_meta_list
            print(l_obj)
            while l_obj:
                try:
                    # Extract object metadata
                    obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)

                    print(obj_meta)

                except StopIteration:
                    break
                l_obj = l_obj.next
        except StopIteration:
            break
        l_frame = l_frame.next

    return Gst.PadProbeReturn.OK

The problem is that l_obj is always None when I print it, meaning the object metadata (obj_meta_list) isn’t being populated. However, when I print the data inside my custom bounding box parser function in C++, I consistently see reasonable values being processed.

custom_bboxparser.txt (2.7 KB)

I’ve also attached my C++ code for the custom bounding box parser below for reference. Could this issue be related to something I’m missing in my nvinfer configuration file or in the way I’m parsing the data—either in the C++ custom parser or in the Python probe function? Any guidance would be greatly appreciated.

Did you add the probe function to nvinfer’s src pad?

You cannot get metadata when adding it to the sink pad, because the metadata has not been generated yet at that point.

So I have the following function, and I always connect the probe to the src pad:

def _add_inference_probe(
    self, element: Gst.Element, inference_pad_buffer_probe, u_data: dict
) -> None:
    """
    Adds an inference probe to the element.
    """
    inference_src_pad = element.get_static_pad("src")
    if not inference_src_pad:
        self.logger.error("Unable to get src pad of inference element")
        return

    inference_src_pad.add_probe(
        Gst.PadProbeType.BUFFER, inference_pad_buffer_probe, u_data
    )

What is your pgie configuration file? How do you set the threshold? Can the osd draw the bboxes? Are they dropped because the bbox confidence is too low?

[class-attrs-all]
pre-cluster-threshold=0.2
topk=20
nms-iou-threshold=0.5

Try debugging this function

So this is my configuration file of pgie:

[property]
gpu-id=0
model-color-format=0 # 0=RGB, 1=BGR
# onnx-file=../../models/dropper_wire/simplified-model.onnx
model-engine-file=../../models/dropper_wire/simplified-model.onnx_b1_gpu0_fp32.engine
labelfile-path=../../models/dropper_wire/labels.txt
infer-dims=3;1920;1080
batch-size=1

# Network Properties
network-mode=0         # FP32 mode
network-type=0         # 0 for detector
num-detected-classes=2 # Set to the number of classes your detector recognizes
process-mode=1         # 1 for primary inference
gie-unique-id=1        # Unique ID for this inference model
output-blob-names=class_logits;box_regression
cluster-mode=2
interval=0
output-tensor-meta=1

# Post-processing settings
# If output-tensor-meta=1 processing can be done in Python
parse-bbox-func-name=NvDsInferParseCustomDropperWire
custom-lib-path=/opt/nvidia/deepstream/deepstream-7.1/sources/libs/nvdsinfer_customparser/libnvds_infercustomparser.so

[class-attrs-all]
topk=20
nms-iou-threshold=0.5
pre-cluster-threshold=0.1

I will check whether the osd draws the bboxes or not. In this case, should I have output-tensor-meta=1 or 0?

There is no need to set output-tensor-meta to 1

@junshengy I attempted to use nvosd in my pipeline, but aside from displaying frames, I am not getting any results from the inference.

Could you clarify what you meant by:

You can refer to DetectPostprocessor::fillDetectionOutput in /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/nvdsinfer_context_impl_output_parsing.cpp and attach_metadata_detector in /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvinfer/gstnvinfer.cpp

Should I be using DetectPostprocessor::fillDetectionOutput in my custom C++ bounding box processing? If so, how should it be integrated?

Thanks for your guidance!

This means that your post-processing did not generate the corresponding ObjectList, so you cannot get the corresponding result in Python.

I mean you can add logging to debug your post-processing library (see the sketch after the code excerpt below). Since I can’t debug it myself, I can’t tell whether something is wrong with your post-processing library.

I’m guessing this code isn’t getting the correct result:

/* Call custom parsing function if specified otherwise use the one
     * written along with this implementation. */
    if (m_CustomBBoxParseFunc)
    {
        if (!m_CustomBBoxParseFunc(outputLayers, m_NetworkInfo,
                    m_DetectionParams, m_ObjectList))
        {
            printError("Failed to parse bboxes using custom parse function");
            return NVDSINFER_CUSTOM_LIB_FAILED;
        }
    }
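
For example, a quick sanity check is to log how many detections your custom parse function pushes right before it returns. A minimal sketch of such a log line (your parser already uses std::cerr, so no extra include should be needed):

    // At the end of the custom parser, just before returning: print how many
    // detections were pushed into objectList, so you can see whether it is
    // ever non-empty when nvinfer calls the function.
    std::cerr << "custom parser: objectList.size() = " << objectList.size()
              << std::endl;
    return true;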