Issue with Converting ONNX Model with different dimensions to TensorRT Engine for DeepStream

Environment

TensorRT Version: 8.5.2
GPU Type: Jetson Xavier NX
Nvidia Driver Version: JetPack 5.1.4
CUDA Version: 11.5
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): Python 3.8

Subject: Issue with Converting ONNX Model to TensorRT Engine for DeepStream

Description:

Hi @junshengy ,

I have fine-tuned a YOLOv8 model on my custom dataset, and after fine-tuning, the input dimensions of the model are (1, 3, 544, 544) and the output dimensions are (1, 16, 6069).

Based on the required input/output dimensions for DeepStream, I modified the ONNX model to use dynamic input dimensions of (3, 640, 640) and dynamic output dimensions of (8400, 84) following the conversion steps.

Here is the code I used for modifying the model’s input and output dimensions:

import onnx_graphsurgeon as gs
import numpy as np
import onnx

# Load your ONNX model
model_path = "yolov9-t-converted.onnx"
graph = gs.import_onnx(onnx.load(model_path))

# Get the original input
original_input = graph.inputs[0]

# Create a new input variable with a dynamic batch and the target dimensions
new_input = gs.Variable(
    name=original_input.name,
    dtype=np.float32,
    shape=("dynamic", 3, 640, 640)  # DeepStream expects (-1, 3, 640, 640)
)

# Replace the original input with the new one
graph.inputs = [new_input]

# Find the nodes that consume the original input and rewire them to the new input
for node in graph.nodes:
    if original_input.name in [inp.name for inp in node.inputs]:
        for i, inp in enumerate(node.inputs):
            if inp.name == original_input.name:
                node.inputs[i] = new_input
                break

# Get the original output
original_output = graph.outputs[0]

# Create a new output variable with the target dimensions
trans_out = gs.Variable(
    name="trans_out",
    dtype=np.float32,
    shape=("dynamic", 8400, 84)  # DeepStream expects (-1, 8400, 84)
)

# Add a Transpose node to swap the last two dimensions
trans_node = gs.Node(
    op="Transpose",
    name="transpose_output_node",
    attrs={"perm": np.array([0, 2, 1])},
    inputs=[original_output],
    outputs=[trans_out]
)

# Add the node to the graph and update the outputs
graph.nodes.append(trans_node)
graph.outputs = [trans_out]

graph.cleanup(remove_unused_graph_inputs=True).toposort()

# Run shape inference and save the modified model
model = onnx.shape_inference.infer_shapes(gs.export_onnx(graph))
output_path = "yolov9-t-converted-deepstream-ready.onnx"
onnx.save(model, output_path)

print(f"Model successfully converted and saved to {output_path}")
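
For reference, the declared input/output shapes of the modified model can be sanity-checked with plain onnx before handing it to DeepStream (a minimal sketch; it just prints whatever dims the graph declares):

import onnx

m = onnx.load(output_path)
for tensor in list(m.graph.input) + list(m.graph.output):
    dims = [d.dim_param or d.dim_value for d in tensor.type.tensor_type.shape.dim]
    print(tensor.name, dims)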

After converting the ONNX model, I tried to build the TensorRT engine using the DeepStream pipeline, but I encountered the following error:

Creating Pipeline

Creating Source

Creating Decoder

Is it Integrated GPU? : 1
Creating nv3dsink

Playing file %s
Adding elements to Pipeline

Linking elements in the Pipeline

Starting pipeline

0:00:00.454616627 10295 0x3eb38160 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
ERROR: [TRT]: 4: [graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IShuffleLayer /model.22/dfl/Reshape: reshape changes volume. Reshaping [1,64,8400] to [1,4,16,6069].)
ERROR: Build engine failed from config file
ERROR: failed to build trt engine.
0:00:04.409356313 10295 0x3eb38160 ERROR nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2022> [UID = 1]: build engine file failed
0:00:04.499672186 10295 0x3eb38160 ERROR nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2108> [UID = 1]: build backend context failed
0:00:04.499755418 10295 0x3eb38160 ERROR nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1282> [UID = 1]: generate backend failed, check config file settings
0:00:04.500395933 10295 0x3eb38160 WARN nvinfer gstnvinfer.cpp:898:gst_nvinfer_start: error: Failed to create NvDsInferContext instance
0:00:04.500447613 10295 0x3eb38160 WARN nvinfer gstnvinfer.cpp:898:gst_nvinfer_start: error: Config file path: /home/atmecs/Documents/ppe detection/model_configs/f1_model.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
Error: gst-resource-error-quark: Failed to create NvDsInferContext instance (1): /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(898): gst_nvinfer_start (): /GstPipeline:pipeline0/GstNvInfer:primary-inference:
Config file path: /home/atmecs/Documents/ppe detection/model_configs/f1_model.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED

Request:

Could you assist in resolving this issue? Is there something wrong with my ONNX model conversion?

You cannot modify the input/output dimensions without modifying the internal operators of the model.

For your model, add infer-dims=3;544;544 to the configuration file to determine the input layer.
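
For example, the relevant part of the nvinfer config would look something like this (the ONNX filename is only a placeholder):

[property]
# unmodified 544x544 export
onnx-file=yolov8-custom-544.onnx
infer-dims=3;544;544
batch-size=1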

For the output layer, the script provided by deepstream_tools only transposes the dimensions of the yolov8 output layer.

Please decide whether to transpose it according to your model and whether you need to modify the post-processing code.

Hello @junshengy,

I am facing an issue while converting an ONNX model to a TensorRT model on JetPack 5.1.4. During the conversion process, I see the following logs in the terminal:

[04/02/2025-17:23:18] [W] [TRT] Tactic Device request: 7151MB Available: 5868MB. Device memory is insufficient to use tactic.
[04/02/2025-17:23:18] [W] [TRT] Skipping tactic 3 due to insufficient memory on requested size of 7151 detected for tactic 0x0000000000000004.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[04/02/2025-17:23:18] [W] [TRT] Tactic Device request: 7151MB Available: 5871MB. Device memory is insufficient to use tactic.
[04/02/2025-17:23:18] [W] [TRT] Skipping tactic 8 due to insufficient memory on requested size of 7151 detected for tactic 0x000000000000003c.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[04/02/2025-17:23:19] [W] [TRT] Tactic Device request: 7151MB Available: 5870MB. Device memory is insufficient to use tactic.
[04/02/2025-17:23:19] [W] [TRT] Skipping tactic 13 due to insufficient memory on requested size of 7151 detected for tactic 0x0000000000000074.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().

The model conversion is successful, but I notice a significant performance degradation.

When I decrease the workspace size, the same logs continue to appear, and the performance degradation persists.
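
For reference, this is roughly how the workspace pool can be capped when building the engine with the TensorRT Python API (a sketch; the input tensor name "images" and the 2 GiB cap are assumptions, and the same limit can be passed to trtexec via --memPoolSize=workspace:2048):

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("F1-dynamic_batch_640.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
# Cap the workspace pool so tactics that need more memory are skipped early
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2 << 30)  # 2 GiB
config.set_flag(trt.BuilderFlag.FP16)

profile = builder.create_optimization_profile()
profile.set_shape("images", (1, 3, 640, 640), (1, 3, 640, 640), (1, 3, 640, 640))
config.add_optimization_profile(profile)

engine_bytes = builder.build_serialized_network(network, config)
with open("f1.engine", "wb") as f:
    f.write(engine_bytes)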

Interestingly, when I perform the same conversion on another device running JetPack version 6.0 (Jetson Orin), the conversion completes successfully without any performance degradation, and I do not see the above logs in the terminal.

Could anyone help me understand why this performance degradation occurs on JetPack 5.1.4 (Jetson Xavier and Jetson Orin), while JetPack version 6.0 (Jetson Orin) does not exhibit the same issue?

Additionally, are there any recommendations for overcoming this memory issue or performance degradation on JetPack 5.1.4?

Hello @junshengy,

Just to clarify, I made a mistake in the above post. The JetPack version I’m using on the Jetson Orin device is JetPack 6.2 with TensorRT 10.3.0, where the model conversion works fine without performance issues.

These logs by themselves don’t tell us much, and you can’t compare Xavier and Orin like this; this is just the log of the TRT model conversion.
For the DeepStream pipeline, is there a drop in FPS on Xavier? Xavier cannot be upgraded to JetPack 6.x, so how are you comparing the two?

Hi @junshengy,

Thanks for your response.

I’m not trying to directly compare Xavier and Orin, but rather, I’m facing an issue where after converting my model to TensorRT, only the Safety Mask object is detected. When the model is in .pt format, it detects all the objects correctly. My question is focused on the TensorRT conversion process and why it’s affecting the object detection behavior.

To clarify:

We have two devices, Orin and Xavier NX:

  • When using Jetpack 6.2 on Orin, the model works as expected, detecting all objects, including:
    • Safety_Mask
    • Safety_Gloves
    • Wheel_Chock
    • Earthing_Clamp
    • Safety_Helmet
    • R_Jacket
    • People_Count
  • When downgraded to Jetpack 5.1.5 on Orin, only Safety Mask is detected.
  • On Xavier (Jetpack 5.1.5), the model detects only the Safety Mask object after conversion to TensorRT.

For context, my .pt model dimensions are different from the dimensions accepted by DeepStream, so I transpose the dimensions and then convert the ONNX model to TensorRT. The following is the code I used for conversion:

from ultralytics import YOLO

# Load YOLO model
model_path = "/home/atmecs/Documents/ppe_detection/models/F1.pt"
model = YOLO(model_path)

# Convert to ONNX and explicitly save it with the desired name
model.export(format='onnx', imgsz=(640, 640), simplify=False)

print(f"Model has been successfully converted")
import onnx_graphsurgeon as gs
import numpy as np
import onnx

graph = gs.import_onnx(onnx.load("/home/atmecs/Documents/ppe detection/models/F1.onnx"))
# graph = gs.import_onnx(onnx.load("yolov8-s.onnx"))
ori_output = graph.outputs[0]
trans_out  = gs.Variable(name="trans_out", dtype=np.float32, shape=(-1, 8400, 84))
trans_node = gs.Node(op="Transpose",name="transpose_output_node", attrs={"perm":np.array([0,2,1])}, inputs=[ori_output], outputs=[trans_out])
graph.nodes.append(trans_node)
graph.outputs = [trans_out]
graph.cleanup(remove_unused_graph_inputs=True).toposort()
model = onnx.shape_inference.infer_shapes(gs.export_onnx(graph))
onnx.save(model, "F1-dynamic_batch_640.onnx")

I raised this issue to understand why this is happening and how I can resolve it to ensure that the model detects all objects, just like in the .pt format.

Could you please help me with the following:

  1. Why does the model detect only Safety Mask after conversion to TensorRT, even though it detects all objects in the .pt format?
  2. How can I modify the conversion process or model settings to ensure that all objects are detected after converting to TensorRT?

This may be caused by precision issues. What does your pipeline look like? Has it been scaled multiple times?

In post-processing, do the other classes have low confidence? You can adjust this through the clustering parameters.
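
For example, the per-class thresholds in the nvinfer config can be relaxed to check whether the other classes show up at all (the values below are only illustrative):

[class-attrs-all]
# lower the per-class confidence gate and allow more candidates
pre-cluster-threshold=0.1
nms-iou-threshold=0.65
topk=300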

I followed the DeepStream documentation and used the recommended precision modes (FP16, FP32), but I’m not seeing any improvements in the detection results.

Clustering Methods:

I experimented with three clustering options:

DBSCAN

NMS

No Clustering

When no clustering was applied, multiple bounding boxes were detected, but the output only showed a “safety mask.”

Post-Processing:

I did not apply any post-processing in the pipeline. The issue persisted even without clustering or post-processing.

Configuration File of the Inference Engine:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
infer-dims=3;640;640
#onnx-file=/home/atmecs/Documents/ppe_detection/models/F1/F1-dynamic_batch_640.onnx

model-engine-file=/home/atmecs/Documents/ppe_detection/models/F1/F1-dynamic_batch_640_16.engine
#model-engine-file=/home/atmecs/Documents/ppe_detection/models/F1/F1-dynamic_batch_640.onnx_b1_gpu0_fp32.engine
#model-engine-file=/home/atmecs/Documents/ppe_detection/models/F1/F1-dynamic_batch_640_best.engine

#int8-calib-file=calib.table
labelfile-path=/home/atmecs/Documents/ppe_detection/labels/f1_labels.txt
batch-size=1
#network-mode=0 # FP32 precision
#network-mode=3 # best precision
network-mode=2 # FP16 precision
num-detected-classes=6
interval=0
gie-unique-id=1
process-mode=1
network-type=0
## 1=DBSCAN, 2=NMS, 3= DBSCAN+NMS Hybrid, 4 = None(No clustering)
cluster-mode=1
maintain-aspect-ratio=1

symmetric-padding=1

custom-lib-path = /home/atmecs/Documents/ppe_detection/Parsers/f1_parsers/f1_model-parser.so
parse-bbox-func-name=NvDsInferParseCustomYoloV8

[class-attrs-all]
nms-iou-threshold=0.65
pre-cluster-threshold=0.45
topk=100

Pipeline Code:

import sys
sys.path.append('../')
import os
import gi
gi.require_version('Gst', '1.0')
from gi.repository import GLib, Gst
from platform_info import PlatformInfo
from bus_call import bus_call

import pyds

MUXER_BATCH_TIMEOUT_USEC = 33000

Safety_Mask = 0
Safety_Gloves = 1
Wheel_Chock = 2
Earthing_Clamp = 3
Safety_Helmet = 4
R_Jacket = 5
People_Count = 6

def pgie_src_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer")
        return Gst.PadProbeReturn.OK
        
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    
    print("---- PGIE Output Debug ----")
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break
            
        print(f"Frame {frame_meta.frame_num}: {frame_meta.num_obj_meta} objects")
        
        # Print details of each detected object
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            try:
                obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
                print(f"Class ID: {obj_meta.class_id}, Confidence: {obj_meta.confidence}")
                print(f"Bbox- left: x={obj_meta.rect_params.left}, top :y={obj_meta.rect_params.top}, w={obj_meta.rect_params.width}, h={obj_meta.rect_params.height}")
                l_obj = l_obj.next
            except StopIteration:
                break
                
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    print("-------------------------")
            
    return Gst.PadProbeReturn.OK

def pad_added_handler(src, new_pad, data):
    print("Received new pad '{}'".format(new_pad.get_name()))
    sink_pad = data.get_static_pad("sink_0")
    if not sink_pad.is_linked():
        print("Linking decoder to streammux")
        new_pad.link(sink_pad)

def osd_sink_pad_buffer_probe(pad, info, u_data):
    frame_number = 0
    num_rects = 0

    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer ")
        return

    # Retrieve batch metadata from the gst_buffer
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        # Initialize object counter
        obj_counter = {
            Safety_Mask: 0,
            Safety_Gloves: 0,
            Wheel_Chock: 0,
            Earthing_Clamp: 0,
            Safety_Helmet: 0,
            R_Jacket: 0,
            People_Count: 0
        }
        frame_number = frame_meta.frame_num
        num_rects = frame_meta.num_obj_meta
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            try:
                obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            except StopIteration:
                break
            obj_counter[obj_meta.class_id] += 1
            obj_meta.rect_params.border_color.set(0.0, 0.0, 1.0, 0.8)  # Blue border (opacity 0.8)
            try: 
                l_obj = l_obj.next
            except StopIteration:
                break

        # Display meta setup
        display_meta = pyds.nvds_acquire_display_meta_from_pool(batch_meta)
        display_meta.num_labels = 1
        py_nvosd_text_params = display_meta.text_params[0]
        py_nvosd_text_params.display_text = "Frame Number={} Number of Objects={} Safety_Mask={} Safety_Gloves={} Wheel_Chock={} Earthing_Clamp={} Safety_Helmet={} R_Jacket={} People_Count={}".format(
            frame_number, num_rects, 
            obj_counter[Safety_Mask],
            obj_counter[Safety_Gloves],
            obj_counter[Wheel_Chock],
            obj_counter[Earthing_Clamp],
            obj_counter[Safety_Helmet],
            obj_counter[R_Jacket],
            obj_counter[People_Count]
        )

        py_nvosd_text_params.x_offset = 10
        py_nvosd_text_params.y_offset = 12

        py_nvosd_text_params.font_params.font_name = "Serif"
        py_nvosd_text_params.font_params.font_size = 10
        py_nvosd_text_params.font_params.font_color.set(1.0, 1.0, 1.0, 1.0)  # White text
        py_nvosd_text_params.set_bg_clr = 1
        py_nvosd_text_params.text_bg_clr.set(0.0, 0.0, 0.0, 1.0)  # Black background

        print(pyds.get_string(py_nvosd_text_params.display_text))
        pyds.nvds_add_display_meta_to_frame(frame_meta, display_meta)
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
            
    return Gst.PadProbeReturn.OK

def main(args):
    platform_info = PlatformInfo()
    Gst.init(None)

    pipeline = Gst.Pipeline()

    if not pipeline:
        sys.stderr.write("Unable to create Pipeline\n")

    # Source element for reading from the file
    source = Gst.ElementFactory.make("filesrc", "file-source")
    if not source:
        sys.stderr.write("Unable to create Source\n")

    # Decoder
    decoder = Gst.ElementFactory.make("decodebin", "decode-bin")
    if not decoder:
        sys.stderr.write("Unable to create Decoder\n")

    # Streammux for batching sources
    streammux = Gst.ElementFactory.make("nvstreammux", "Stream-muxer")
    if not streammux:
        sys.stderr.write("Unable to create NvStreamMux\n")

    # Primary Inference Element
    pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
    if not pgie:
        sys.stderr.write("Unable to create pgie\n")

    # Video converter
    nvvidconv = Gst.ElementFactory.make("nvvideoconvert", "convertor")
    if not nvvidconv:
        sys.stderr.write("Unable to create nvvidconv\n")

    # Onscreen display
    nvosd = Gst.ElementFactory.make("nvdsosd", "onscreendisplay")
    if not nvosd:
        sys.stderr.write("Unable to create nvosd\n")

    # Sink element (render the output)
    sink = Gst.ElementFactory.make("nveglglessink", "nvvideo-renderer")
    if not sink:
        sys.stderr.write("Unable to create EGL Sink\n")

    source.set_property('location', '/home/atmecs/Documents/ppe_detection/inputs/IMG_6880.MOV')
    
    # Set properties for the streammux
    streammux.set_property('batch-size', 1)

    pgie.set_property('config-file-path', "/home/atmecs/Documents/ppe_detection/model_configs/f1_model.txt")

    # Add elements to pipeline
    pipeline.add(source)
    pipeline.add(decoder)
    pipeline.add(streammux)
    pipeline.add(pgie)
    pipeline.add(nvvidconv)
    pipeline.add(nvosd)
    pipeline.add(sink)
    
    # Link elements
    source.link(decoder)
    decoder.connect("pad-added", pad_added_handler, streammux)
    streammux.link(pgie)
    pgie.link(nvvidconv)
    nvvidconv.link(nvosd)
    nvosd.link(sink)
    
    # Start the pipeline
    pipeline.set_state(Gst.State.PLAYING)
    
    # Bus to handle messages
    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect("message", bus_call, pipeline)

    GLib.MainLoop().run()

if __name__ == "__main__":
    sys.exit(main(sys.argv))

Main Issue: Despite using the correct model and configuration, only the “safety mask” object is detected, even though other objects are present in the scene.

Could you help me debug this problem? What could be causing this issue, and how can I improve the detection results?

Now, on Orin, different JetPack versions (5.1.4 vs 6.2) produce different results.

Some other things to try: nvstreammux uses GPU scaling, so why don’t you set the width/height properties? Leaving them unset shouldn’t work well. The value of the interpolation-method property may also have some impact; try adjusting it.

streammux.set_property('compute-hw', 1)
streammux.set_property('width', your_video_width)    # integer, e.g. 1920
streammux.set_property('height', your_video_height)  # integer, e.g. 1080

If the above attempt does not work, I think this issue is caused by the TRT version (8.6 vs 10.3).

Can you verify this with a demo that only uses the TRT API?

If so, I think this issue can only be discussed in the TRT issues list, and deepstream can’t do anything.
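
A minimal TRT-only check could look roughly like this (a sketch for the TensorRT 8.x Python API with pycuda; the engine path, the assumption that binding 0 is the input, and the 1x3x640x640 shape are placeholders):

import numpy as np
import tensorrt as trt
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("f1.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
# For a dynamic-batch engine the actual input shape must be set explicitly
context.set_binding_shape(0, (1, 3, 640, 640))

# Allocate host/device buffers for every binding
bindings, buffers = [], []
for i in range(engine.num_bindings):
    shape = context.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(trt.volume(shape), dtype)
    dev = cuda.mem_alloc(host.nbytes)
    bindings.append(int(dev))
    buffers.append((host, dev, engine.binding_is_input(i)))

# Feed a dummy (or genuinely preprocessed) frame: RGB, /255.0, CHW, batched
img = np.random.rand(1, 3, 640, 640).astype(np.float32)
np.copyto(buffers[0][0], img.ravel())

stream = cuda.Stream()
for host, dev, is_input in buffers:
    if is_input:
        cuda.memcpy_htod_async(dev, host, stream)
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
for host, dev, is_input in buffers:
    if not is_input:
        cuda.memcpy_dtoh_async(host, dev, stream)
stream.synchronize()

# The last binding is assumed to be the detection output
print("output sample:", buffers[-1][0][:16])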

@junshengy, thank you. I will reach out to TensorRT, as you suggested.

Coming to the DeepStream pipeline: when I run the pipeline, I observe latency between the OSD display and the processing. Specifically, after processing four frames I can see the processing logs for those four frames in the terminal, and only then is the output displayed.

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst, GObject, GLib
import pyds
from pathlib import Path
from probes import *
import toml
import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from bus_call import bus_call
from platform_info import PlatformInfo
from parse_cfg import *

MUXER_BATCH_TIMEOUT_USEC = 33000

def rtsp_pad_added(src, new_pad, data):
    """Handle pad-added signal from rtspsrc"""
    rtph264depay = data
    
    caps = new_pad.get_current_caps()
    caps_str = caps.to_string() if caps else ""
    print(f"RTSP Source pad added: {new_pad.get_name()} with caps: {caps_str}")
    
    if "application/x-rtp" in caps_str:
        sink_pad = rtph264depay.get_static_pad("sink")
        if sink_pad and not sink_pad.is_linked():
            ret = new_pad.link(sink_pad)
            if ret == Gst.PadLinkReturn.OK:
                print(f"Successfully linked {src.get_name()} -> {rtph264depay.get_name()}")
            else:
                print(f"Link failed: {ret}")
        else:
            print(f"Depay sink pad already linked or unavailable")
    else:
        print(f"Ignoring non-RTP pad: {new_pad.get_name()}")

def decoder_pad_added(src, new_pad, data):
    """Handle pad-added signal from decoder"""
    streammux = data[0]
    sink_idx = data[1]
    
    print(f"Decoder {src.get_name()} got new pad: {new_pad.get_name()}")
    
    # Request sink pad from streammux
    sink_pad_name = f"sink_{sink_idx}"
    sink_pad = streammux.get_request_pad(sink_pad_name)
    if not sink_pad:
        sys.stderr.write(f"Unable to get request pad '{sink_pad_name}' from streammux\n")
        return
    
    if sink_pad.is_linked():
        print(f"Streammux sink pad '{sink_pad_name}' is already linked.")
    else:
        ret = new_pad.link(sink_pad)
        if ret == Gst.PadLinkReturn.OK:
            print(f"Decoder pad linked to streammux sink pad '{sink_pad_name}'.")
        else:
            sys.stderr.write(f"Failed to link decoder pad to streammux. Return: {ret}\n")



def create_sources(pipeline, camera_config, streammux):
    """Create RTSP sources from config"""
    sources = []
    
    for i, (cam_name, cam_url) in enumerate(camera_config.items()):
        print(f"Setting up camera {cam_name} with URL: {cam_url}")
        
        # Create elements
        source = Gst.ElementFactory.make("rtspsrc", f"src-{i}")
        # NOTE: H.265 depay/parse elements are created here even though the
        # variable names still say h264
        rtph264depay = Gst.ElementFactory.make("rtph265depay", f"depay-{i}")
        h264parse = Gst.ElementFactory.make("h265parse", f"parse-{i}")
        decoder = Gst.ElementFactory.make("nvv4l2decoder", f"dec-{i}")
        
        # Configure source
        source.set_property("location", cam_url)
        source.set_property("latency", 200)  # ms buffer
        
        # Add elements to pipeline
        pipeline.add(source)
        pipeline.add(rtph264depay)
        pipeline.add(h264parse)
        pipeline.add(decoder)
        
        # Link static elements
        rtph264depay.link(h264parse)
        h264parse.link(decoder)
        
        # Use the existing rtsp_pad_added function instead of defining a new callback
        source.connect("pad-added", rtsp_pad_added, rtph264depay)
        
        # Add to sources list with the camera name
        sources.append((cam_name, decoder))
    
    return sources


def create_pipeline(camera_config, pgie1_config_path, pgie2_config_path):
    """Create the full DeepStream pipeline"""
    Gst.init(None)
    pipeline = Gst.Pipeline()
    platform_info = PlatformInfo()
    
    if not pipeline:
        sys.stderr.write("Unable to create Pipeline\n")
        return None
    
    # Create streammux first
    streammux = Gst.ElementFactory.make("nvstreammux", "stream-muxer")
    if not streammux:
        sys.stderr.write("Unable to create nvstreammux\n")
        return None
    
    streammux.set_property("batch-size", len(camera_config))
    streammux.set_property("width", 1920)
    streammux.set_property("height", 1080)
    streammux.set_property("batched-push-timeout", MUXER_BATCH_TIMEOUT_USEC)
    
    pipeline.add(streammux)
    
    # Create sources
    sources = create_sources(pipeline, camera_config, streammux)
    if not sources:
        sys.stderr.write("Failed to create sources\n")
        return None
    # # Create streammux for batch processing
    # streammux = Gst.ElementFactory.make("nvstreammux", "stream-muxer")
    # if not streammux:
    #     sys.stderr.write("Unable to create nvstreammux\n")
    #     return None
    
    # Configure streammux
    # streammux.set_property("batch-size", len(sources))
    # streammux.set_property("width", 1920)
    # streammux.set_property("height", 1080)
    # streammux.set_property("batched-push-timeout", MUXER_BATCH_TIMEOUT_USEC)
    
    # Add streammux to pipeline
    #pipeline.add(streammux)
    
    # Connect sources to streammux
    for i, (cam_name, decoder) in enumerate(sources):
        sinkpad = streammux.get_request_pad(f"sink_{i}")
        if not sinkpad:
            sys.stderr.write(f"Unable to get sink pad from streammux for {cam_name}\n")
            return None
            
        srcpad = decoder.get_static_pad("src")
        if not srcpad:
            sys.stderr.write(f"Unable to get src pad from decoder for {cam_name}\n")
            return None
            
        srcpad.link(sinkpad)
    
    # Create first primary inference engine (PGIE)
    pgie1 = Gst.ElementFactory.make("nvinfer", "primary-inference-1")
    if not pgie1:
        sys.stderr.write("Unable to create primary inference engine 1\n")
        return None
    
    # Create second primary inference engine (PGIE)
    pgie2 = Gst.ElementFactory.make("nvinfer", "primary-inference-2")
    if not pgie2:
        sys.stderr.write("Unable to create primary inference engine 2\n")
        return None
    
    # Configure PGIE1 and PGIE2
    pgie1.set_property("config-file-path", pgie1_config_path)
    pgie2.set_property("config-file-path", pgie2_config_path)
    
    # Create converter for color space conversion
    nvvidconv = Gst.ElementFactory.make("nvvideoconvert", "converter")
    if not nvvidconv:
        sys.stderr.write("Unable to create nvvideoconvert\n")
        return None
    
    # Create OSD for display
    nvosd = Gst.ElementFactory.make("nvdsosd", "onscreen-display")
    if not nvosd:
        sys.stderr.write("Unable to create nvdsosd\n")
        return None
    
    # Create appropriate sink based on platform
    # Platform-specific sink creation
    if platform_info.is_integrated_gpu():
        print("Creating nv3dsink \n")
        sink = Gst.ElementFactory.make("nv3dsink", "nv3d-sink")
        if not sink:
            sys.stderr.write(" Unable to create nv3dsink \n")
            return None
    else:
        if platform_info.is_platform_aarch64():
            print("Creating nv3dsink \n")
            sink = Gst.ElementFactory.make("nv3dsink", "nv3d-sink")
        else:
            print("Creating EGLSink \n")
            sink = Gst.ElementFactory.make("nveglglessink", "nvvideo-renderer")
        if not sink:
            sys.stderr.write(" Unable to create egl sink \n")
            return None
    
    # Add all elements to pipeline
    pipeline.add(pgie1)
    pipeline.add(pgie2)
    pipeline.add(nvvidconv)
    pipeline.add(nvosd)
    pipeline.add(sink)
    
    # Link all elements in the pipeline
    print("Linking elements in the pipeline")
    streammux.link(pgie1)
    pgie1.link(pgie2)
    pgie2.link(nvvidconv)
    nvvidconv.link(nvosd)
    nvosd.link(sink)
    
    # Add probes for monitoring and debugging
    pgie1_src_pad = pgie1.get_static_pad("src")
    if pgie1_src_pad:
        pgie1_src_pad.add_probe(Gst.PadProbeType.BUFFER, pgie1_probe, "PGIE1")
    
    pgie2_src_pad = pgie2.get_static_pad("src")
    if pgie2_src_pad:
        pgie2_src_pad.add_probe(Gst.PadProbeType.BUFFER, pgie2_probe, "PGIE2")
    
    osd_sink_pad = nvosd.get_static_pad("sink")
    if osd_sink_pad:
        osd_sink_pad.add_probe(Gst.PadProbeType.BUFFER, osd_probe, 0)
    
    return pipeline

def main(cfg):
    # Create pipeline
    print(cfg)
    camera_config = cfg['source']

    pgie1_config_path = cfg["pgie"]["goggles_config"]
    pgie2_config_path = cfg["pgie"]["full_ppe_config"]
    
    pipeline = create_pipeline(camera_config,pgie1_config_path,pgie2_config_path)
    
    # Setup bus and loop
    loop = GLib.MainLoop()
    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect("message", bus_call, loop)
    
    # Start pipeline
    pipeline.set_state(Gst.State.PLAYING)
    
    try:
        loop.run()
    except:
        pass
    
    # Cleanup
    pipeline.set_state(Gst.State.NULL)
    gate_controller.cleanup()

if __name__ == "__main__":
    cfg=parse_args(cfg_path="paths.toml")
    main(cfg)

osd probe connected:


def osd_probe(pad, info, u_data):
    """Handles on-screen display and visual alerts"""
    buf = info.get_buffer()
    if not buf:
        return Gst.PadProbeReturn.OK
    
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(buf))
    
    # Initialize counters for all classes
    # PGIE1 classes (0-4)
    # PGIE2 classes (5-11)
    obj_counter = {class_id: 0 for class_id in range(12)}  # Classes 0-11
    
    # Class labels for display
    pgie1_labels = {
        0: "Safety Goggle",
        1: "ToeGuard",
        2: "Non Safety Shoes",
        3: "Safety Shoes",
        4: "Non Safety Goggles"
    }
    
    pgie2_labels = {
        5: "Safety Mask",
        6: "Safety Gloves",
        7: "Wheel Chock",
        8: "Earthing Clamp",
        9: "Safety Helmet",
        10: "R Jacket",
        11: "People Count"
    }
    
    # Use the same GList iteration approach that works in other functions
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
            camera_id = frame_meta.source_id
            
            # Reset counters
            for class_id in obj_counter:
                obj_counter[class_id] = 0
                
            # Count all objects
            l_obj = frame_meta.obj_meta_list
            while l_obj:
                try:
                    obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
                    obj_counter[obj_meta.class_id] += 1
                    
                    # Color coding
                    if obj_meta.class_id in [0, 1, 3]:  # Gate-relevant
                        obj_meta.rect_params.border_color.set(0.0, 1.0, 0.0, 0.8)  # Green
                    elif obj_meta.class_id in [5, 6, 9, 10]:  # Safety-critical PPE
                        obj_meta.rect_params.border_color.set(1.0, 0.0, 0.0, 0.8)  # Red
                    else:
                        obj_meta.rect_params.border_color.set(0.0, 0.0, 1.0, 0.8)  # Blue
                        
                    l_obj = l_obj.next
                except StopIteration:
                    break
            
            # Check violations
            missing_ppe = check_ppe_violations(obj_counter)
            people_count = obj_counter[11]
            has_violation = (people_count != 3) or missing_ppe
            
            # Visual warnings
            if has_violation:
                warning_meta = pyds.nvds_acquire_display_meta_from_pool(batch_meta)
                warning_meta.num_labels = 1
                warning_text = []
                
                if people_count != 3:
                    warning_text.append(f"PEOPLE: {people_count} (Expected 3)")
                if missing_ppe:
                    warning_text.append(f"MISSING: {', '.join(missing_ppe)}")
                    
                warning_params = warning_meta.text_params[0]
                warning_params.display_text = "\n".join(warning_text)
                warning_params.x_offset = 10
                warning_params.y_offset = frame_meta.source_frame_height - 100
                warning_params.font_params.font_color.set(1.0, 0.0, 0.0, 1.0)
                warning_params.text_bg_clr.set(0.0, 0.0, 0.0, 0.5)
                pyds.nvds_add_display_meta_to_frame(frame_meta, warning_meta)
            
            # Main OSD
            display_meta = pyds.nvds_acquire_display_meta_from_pool(batch_meta)
            display_meta.num_labels = 1
            text_params = display_meta.text_params[0]
            
            # Create OSD text with proper labels
            osd_text = [f"Camera {camera_id} | Frame: {frame_meta.frame_num}"]
            
            # PGIE1 detections
            osd_text.append("--- PGIE1 Detections ---")
            for class_id in range(5):  # PGIE1 classes (0-4)
                if obj_counter[class_id] > 0:
                    osd_text.append(f"{pgie1_labels[class_id]}: {obj_counter[class_id]}")
            
            # PGIE2 detections
            osd_text.append("--- PGIE2 Detections ---")
            for class_id in range(5, 12):  # PGIE2 classes (5-11)
                if obj_counter[class_id] > 0:
                    osd_text.append(f"{pgie2_labels[class_id]}: {obj_counter[class_id]}")
            
            text_params.display_text = "\n".join(osd_text)
            
            # Text formatting
            text_params.x_offset = 10
            text_params.y_offset = 10
            text_params.font_params.font_name = "Serif"
            text_params.font_params.font_size = 10
            text_params.font_params.font_color.set(1.0, 1.0, 1.0, 1.0)
            text_params.set_bg_clr = 1
            text_params.text_bg_clr.set(0.0, 0.0, 0.0, 0.5)
            
            # Gate status
            gate_open = (gate_controller.goggles_detected and 
                       (gate_controller.safety_shoes_detected or gate_controller.toeguard_detected))
            
            if display_meta.num_labels < 2:
                display_meta.num_labels = 2
                gate_params = display_meta.text_params[1]
                gate_params.display_text = f"GATE: {'OPEN' if gate_open else 'CLOSED'}"
                gate_params.x_offset = frame_meta.source_frame_width - 200
                gate_params.y_offset = 10
                
                # Fix the color setting
                if gate_open:
                    gate_params.font_params.font_color.set(0.0, 1.0, 0.0, 1.0)  # Green when open
                else:
                    gate_params.font_params.font_color.set(1.0, 0.0, 0.0, 1.0)  # Red when closed
                
                gate_params.text_bg_clr.set(0.0, 0.0, 0.0, 0.5)
            
            pyds.nvds_add_display_meta_to_frame(frame_meta, display_meta)
            
            l_frame = l_frame.next
        except StopIteration:
            break
    
    return Gst.PadProbeReturn.OK

Why is this happening? How can I make the output display immediately after processing?

This matters because it is not acceptable for an object to be detected only after a 1-2 second delay while it is standing in front of the camera.

This is usually not related to the OSD; it is usually caused by the RTSP camera. You also need to check the timing of nvinfer and the GPU load.

You can use a local file to measure latency. Refer to this FAQ.

The latency of an RTSP camera is usually 1-2 s. Lower latency requires smaller I-frame intervals and UDP transport, which in turn requires high network quality; otherwise it will cause packet loss, image corruption, and other problems.

In addition, if you only use gst-play-1.0 to play the RTSP stream, what is the latency?

Try setting the value of the live-source property of nvstreammux to true.
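
In the Python pipeline that is a one-line change (sketch):

streammux.set_property('live-source', 1)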

We have discussed too many topics in this topic. Please open a new topic for other questions.

Hi @junshengy,

Before digging further into the TensorRT model, I wanted to test whether the ONNX model detects all the objects correctly. For that, I tried to load the ONNX model and run detections, but I ran into the following issue:

atmecs@atmecs-desktop:~/Documents/ppe_detection/Code$ python3 onnx_model_graph.py 
2025-04-10 18:12:54.732012726 [E:onnxruntime:Default, env.cc:234 ThreadMain] pthread_setaffinity_np failed for thread: 9703, index: 0, mask: {4, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-04-10 18:12:54.982097706 [W:onnxruntime:, graph.cc:109 MergeShapeInfo] Error merging shape info for output. 'trans_out' source:{1,8400,16} target:{1,8400,84}. Falling back to lenient merge.
/opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_vector.h:1123: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = unsigned int; _Alloc = std::allocator<unsigned int>; reference = unsigned int&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
Aborted (core dumped)

Issue Overview:

  • The ONNX model’s output actually has 16 channels, while the transposed output I declared expects 84. This discrepancy comes from the transpose script I applied to the output layer, which hard-codes the (-1, 8400, 84) shape.

Attempted Solution: I tried to check the input and output shapes using the following code:

import onnxruntime
import numpy as np

session = onnxruntime.InferenceSession("/home/atmecs/Documents/ppe_detection/models/F1/F1-dynamic_batch_640.onnx")

dummy_input = np.random.rand(1, 3, 640, 640).astype(np.float32)

outputs = session.run(None, {"images": dummy_input})

for i, output in enumerate(outputs):
    print(f"Output {i} shape: {output.shape}")

However, I encountered the same issue as mentioned above.

The code I used for transposing:

import onnx_graphsurgeon as gs
import numpy as np
import onnx

graph = gs.import_onnx(onnx.load("yolov9-t-converted.onnx"))
# graph = gs.import_onnx(onnx.load("yolov8-s.onnx"))
ori_output = graph.outputs[0]
trans_out  = gs.Variable(name="trans_out", dtype=np.float32, shape=(-1, 8400, 84))
trans_node = gs.Node(op="Transpose",name="transpose_output_node", attrs={"perm":np.array([0,2,1])}, inputs=[ori_output], outputs=[trans_out])
graph.nodes.append(trans_node)
graph.outputs = [trans_out]
graph.cleanup(remove_unused_graph_inputs=True).toposort()
model = onnx.shape_inference.infer_shapes(gs.export_onnx(graph))
onnx.save(model, "yolov9-t-converted-trans-dynamic_batch_640.onnx")

Problem:

  • If I don’t transpose the layer, the DeepStream parser does not work. So, I’m wondering how to solve this issue.
  • I also suspect that this is the reason the TensorRT model is detecting only one object instead of all the trained objects.

Questions:

  1. How can I handle this discrepancy between the ONNX model output and the expected output without breaking compatibility with the DeepStream parser?
  2. Is the issue with the TensorRT model also related to this transpose issue, and could it explain why only one object is detected instead of all the trained objects?

Please modify the transposition script and post-processing according to your own model. DeepStream can’t do anything about this.

You can also use the unmodified ONNX model for verification. Generally speaking, converting from a PyTorch .pt model to an ONNX model will not affect the output; otherwise, JP 6.1/DS 7.1 would not give correct results.
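
For example, the transpose script can read the channel count from the model instead of hard-coding 84 (a minimal sketch; the filename is a placeholder and it assumes the exporter recorded a (batch, channels, anchors) output shape):

import onnx
import onnx_graphsurgeon as gs
import numpy as np

graph = gs.import_onnx(onnx.load("F1.onnx"))  # placeholder path
ori_output = graph.outputs[0]

# e.g. (1, 16, 8400) or ('batch', 16, 8400); fails loudly if the shape is unknown
batch, channels, anchors = ori_output.shape

trans_out = gs.Variable(name="trans_out", dtype=np.float32,
                        shape=(batch, anchors, channels))
trans_node = gs.Node(op="Transpose", name="transpose_output_node",
                     attrs={"perm": np.array([0, 2, 1])},
                     inputs=[ori_output], outputs=[trans_out])
graph.nodes.append(trans_node)
graph.outputs = [trans_out]
graph.cleanup(remove_unused_graph_inputs=True).toposort()
onnx.save(gs.export_onnx(graph), "F1-transposed.onnx")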

Why not just export with dynamic=True in Ultralytics?

Hi @Y-T-G,

Thanks for the response. Our .pt model is dynamic, and I’ve already tried exporting the model with dynamic=True during export. However, when we convert the model to TensorRT, it still detects only one object class—the first one.

Do you have any recommendations for correctly converting the model to resolve this issue?

You just export to ONNX with dynamic=True and then use trtexec to convert to TensorRT while providing correct min, opt and max shapes.
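
For reference, a minimal sketch of that flow (it assumes the Ultralytics export names the input tensor images, which is the usual default):

from ultralytics import YOLO

# Export with a dynamic batch axis
model = YOLO("F1.pt")
model.export(format="onnx", imgsz=640, dynamic=True)

# Then build the engine with explicit shape ranges, e.g.:
#   trtexec --onnx=F1.onnx --saveEngine=f1.engine --fp16 \
#           --minShapes=images:1x3x640x640 \
#           --optShapes=images:1x3x640x640 \
#           --maxShapes=images:4x3x640x640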

Dynamic batch usually does not affect the output. Please modify the transposition script according to the model output, then use a separate TensorRT program to find where the issue is.

Hi @junshengy , @Y-T-G ,

I have considered both of your inputs and converted the ONNX model to a dynamic model. I also transposed the model’s output — the original output shape was (1, 16, 8400) and I changed it to (1, 8400, 16) to match the expected format.

I ran inference outside DeepStream, and it worked well — all objects were being detected correctly. However, the labels were incorrect, which might be due to the label order I provided.

So, I took the working model and integrated it into DeepStream, but when running it there, we are facing the same issue again — only one object, safety_mask, is being detected.

Let me share my config file:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
infer-dims=3;640;640
#onnx-file=f1.onnx
model-engine-file=f1.engine
#int8-calib-file=calib.table
labelfile-path=f1_labels.txt
batch-size=1
network-mode=0
num-detected-classes=6
interval=0
gie-unique-id=1
process-mode=1
network-type=0
## 1=DBSCAN, 2=NMS, 3=DBSCAN+NMS Hybrid, 4=None (no clustering)
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
custom-lib-path=f1_model-parser.so
parse-bbox-func-name=NvDsInferParseCustomYoloV8

[class-attrs-all]
nms-iou-threshold=0.65
pre-cluster-threshold=0.25
topk=100

I would appreciate your guidance on resolving this issue.

The model performs well during standalone inference outside of DeepStream, but when integrated into the DeepStream pipeline, it fails to work as expected. Could you help me understand why this discrepancy is occurring?

I didn’t specify any opset version explicitly while converting to ONNX — could this be the cause of the issue?

Are you using TensorRT for testing? If not, please follow my previous tips and test with TRT; onnxruntime/PyTorch may be running on the CPU.

When comparing DeepStream with other inference frameworks, did you use the same preprocessing?

Also, are you doing model conversion and inference on Xavier JP5.1 and DS6.3?

First question:
Yes, I am using the TensorRT model for testing.

Second question:
Yes, I am using the same preprocessing. The preprocessing applied outside of DeepStream is as follows:

import cv2
import numpy as np

resized = cv2.resize(frame, (640, 640))
image = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
image = image.astype(np.float32) / 255.0
image = np.transpose(image, (2, 0, 1))   # HWC -> CHW
image = np.expand_dims(image, axis=0)    # add batch dimension
input_tensor = image.astype(np.float32)

Inside DeepStream, I have already provided the configuration file to handle the processing.
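
For reference, those steps map onto keys that are already present in the nvinfer config (1/255 scaling and RGB input):

[property]
# 1/255.0, matching image / 255.0 above
net-scale-factor=0.0039215697906911373
# 0 = RGB, matching the BGR -> RGB conversion above
model-color-format=0
infer-dims=3;640;640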

Third question:
Yes, I am performing the model conversion on the same Jetson device with DeepStream 6.3. After the conversion, I test the model’s performance outside of DeepStream to confirm its accuracy, and only after verifying it do we integrate the model into DeepStream. All of the testing, conversion, and DeepStream pipeline work is performed on the same device.