DeepStream outputs a different segmentation mask shape than the ONNX model itself

• Hardware Platform (Jetson / GPU) : NVIDIA Jetson AGX Orin
• DeepStream Version : 6.3
• JetPack Version (valid for Jetson only) : 5
• TensorRT Version : 8.5.2
• Issue Type( questions, new requirements, bugs) : questions

I have an ONNX model that outputs a 224x224x1 mask for a segmentation task. Here is a screenshot from Netron:
[Netron screenshot: output tensor with shape 224x224x1]

While running the Python DeepStream code, I display the video on screen and print the mask shape in the terminal. The mask shape is incorrectly returned as 224x1 instead of 224x224x1, so I cannot display the mask. What could be causing this? Unfortunately, I cannot share the model itself. I also ran the deepstream-segmentation Python example, and the mask there has the expected 512x512 shape.

Here is the output from the terminal:
[terminal output screenshot]

Here is the configuration file:

[property]
gpu-id=0
# 0=RGB, 1=BGR
model-color-format=0
onnx-file=my_onnx_model.onnx
model-engine-file=my_engine_model.engine
infer-dims=3;224;224
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
num-detected-classes=1
interval=0
gie-unique-id=1
network-type=2
segmentation-threshold=0.3

Here is my Python code:

def osd_sink_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer ")
        return Gst.PadProbeReturn.OK

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        frame_num = frame_meta.frame_num
        print("Frame Number:", frame_num)
        l_user = frame_meta.frame_user_meta_list
        while l_user is not None:
            try:
                seg_user_meta = pyds.NvDsUserMeta.cast(l_user.data)
            except StopIteration:
                break

            if (
                seg_user_meta
                and seg_user_meta.base_meta.meta_type
                == pyds.NVDSINFER_SEGMENTATION_META
            ):
                try:
                    # Note that seg_user_meta.user_meta_data needs a cast to pyds.NvDsInferSegmentationMeta
                    segmeta = pyds.NvDsInferSegmentationMeta.cast(
                        seg_user_meta.user_meta_data
                    )
                except StopIteration:
                    break

                # Retrieve mask data in the numpy format from segmeta
                masks = pyds.get_segmentation_masks(segmeta)
                masks = np.array(masks, copy=True, order="C")
                print("masks shape:", masks.shape)

            try:
                l_user = l_user.next
            except StopIteration:
                break

        try:
            l_frame = l_frame.next
        except StopIteration:
            break

    return Gst.PadProbeReturn.OK
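
Once the mask arrives with the expected (H, W) shape, it can be turned into an RGBA overlay for display with plain numpy. A minimal sketch — the color table and the `mask_to_rgba` helper are made up for illustration, not part of the DeepStream API:

```python
import numpy as np

# Hypothetical color table: one RGBA row per class ID.
COLORS = np.array([
    [0, 0, 0, 0],        # class 0: background, fully transparent
    [0, 255, 0, 128],    # class 1: semi-transparent green
], dtype=np.uint8)

def mask_to_rgba(mask: np.ndarray) -> np.ndarray:
    """Index the color table with the class IDs in the (H, W) mask."""
    return COLORS[np.clip(mask, 0, len(COLORS) - 1)]

# Stand-in for the mask returned by get_segmentation_masks():
demo_mask = np.zeros((224, 224), dtype=np.int32)
demo_mask[50:100, 50:100] = 1

overlay = mask_to_rgba(demo_mask)
print(overlay.shape)  # (224, 224, 4)
```

The resulting RGBA array can then be blended onto the frame or shown with any image library.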

def main(args):
    # Check input arguments
    if len(args) < 2:
        sys.stderr.write("usage: %s <h264_elementary_stream>\n" % args[0])
        sys.exit(1)

    # GStreamer initialization
    Gst.init(None)

    ## CREATE GSTREAMER ELEMENTS
    # Pipeline element that will form a connection of other elements
    print("Creating Pipeline\n")
    pipeline = Gst.Pipeline()
    if not pipeline:
        sys.stderr.write("Unable to create Pipeline")

    # Source element for reading from the file
    print("Creating Source\n")
    source = Gst.ElementFactory.make("filesrc", "file-source")
    if not source:
        sys.stderr.write("Unable to create Source")

    # The data format is an elementary h264 stream, so we need h264parse
    print("Creating H264Parser\n")
    h264parser = Gst.ElementFactory.make("h264parse", "h264-parser")
    if not h264parser:
        sys.stderr.write("Unable to create H264Parser")

    # use nvdec for hardware accelerated decode on GPU
    print("Creating Decoder\n")
    decoder = Gst.ElementFactory.make("nvv4l2decoder", "nvv4l2-decoder")
    if not decoder:
        sys.stderr.write("Unable to create Nvv4l2 Decoder")

    # Create nvstreammux instance to form batches from one or more sources.
    print("Creating Streammux\n")
    streammux = Gst.ElementFactory.make("nvstreammux", "Stream-muxer")
    if not streammux:
        sys.stderr.write(" Unable to create NvStreamMux \n")

    # Create segmentation for primary inference
    print("Creating Segmentation\n")
    seg = Gst.ElementFactory.make("nvinfer", "primary-inference")
    if not seg:
        sys.stderr.write("Unable to create primary inference\n")

    # Create nvvideoconvert to convert from NV12 to RGBA as required by nvosd
    print("Creating nvvideoconvert\n")
    nvvidconv = Gst.ElementFactory.make("nvvideoconvert", "nvvideo-converter")
    if not nvvidconv:
        sys.stderr.write(" Unable to create nvvideoconvert \n")

    # Create OSD to draw on the converted RGBA buffer
    print("Creating OSD")
    nvosd = Gst.ElementFactory.make("nvdsosd", "onscreendisplay")
    if not nvosd:
        sys.stderr.write(" Unable to create nvosd \n")

    # Create nv3dsink for output
    print("Creating nv3dsink \n")
    sink = Gst.ElementFactory.make("nv3dsink", "nv3d-sink")
    if not sink:
        sys.stderr.write("Unable to create nv3dsink \n")

    print("Playing file %s " % args[1])
    source.set_property("location", args[1])
    streammux.set_property("width", 1920)
    streammux.set_property("height", 1080)
    streammux.set_property("batch-size", 1)
    streammux.set_property("batched-push-timeout", 4000000)
    seg.set_property("config-file-path", "config_infer_primary.txt")

    # Set sync = false to avoid late frame drops at the display-sink
    sink.set_property("sync", False)

    print("Adding element to Pipeline\n")
    pipeline.add(source)
    pipeline.add(h264parser)
    pipeline.add(decoder)
    pipeline.add(streammux)
    pipeline.add(seg)
    pipeline.add(nvvidconv)
    pipeline.add(nvosd)
    pipeline.add(sink)

    # link elements together
    # file-source -> h264-parser -> nvv4l2-decoder -> nvstreammux -> nvinfer -> nvvideoconvert -> nvosd -> sink
    print("Linking elements in the Pipeline\n")
    source.link(h264parser)
    h264parser.link(decoder)

    sinkpad = streammux.get_request_pad("sink_0")
    if not sinkpad:
        sys.stderr.write("Unable to get the sink pad of streammux")

    srcpad = decoder.get_static_pad("src")
    if not srcpad:
        sys.stderr.write("Unable to get source pad of decoder")

    srcpad.link(sinkpad)
    streammux.link(seg)
    seg.link(nvvidconv)
    nvvidconv.link(nvosd)
    nvosd.link(sink)

    # create an event loop and feed gstreamer bus messages to it
    loop = GLib.MainLoop()
    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect("message", bus_call, loop)

    osdsinkpad = nvosd.get_static_pad("sink")
    if not osdsinkpad:
        sys.stderr.write(" Unable to get sink pad of nvosd \n")

    osdsinkpad.add_probe(Gst.PadProbeType.BUFFER, osd_sink_pad_buffer_probe, 0)

    print("Starting pipeline \n")

    # start play back and listen to events
    pipeline.set_state(Gst.State.PLAYING)
    try:
        loop.run()
    except:
        pass

    # cleanup
    pipeline.set_state(Gst.State.NULL)

I suspect there might be an issue with how the segmentation mask is parsed. The output mask from my model has a shape of 224x224x1. I believe DeepStream only considers the last two dimensions, interprets them as 224x1, and thus mistakenly extracts the last column. Could this be the cause? If so, would changing the model output from 224x224x1 to 224x224 resolve the issue?
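
A quick numpy sketch of that suspicion: if the HWC dims (224, 224, 1) are read as CHW, an argmax over the presumed class axis yields exactly the 224x1 shape seen in the probe (the random data is just a stand-in for real logits):

```python
import numpy as np

# Real layout of the model output: HWC = (height=224, width=224, channels=1)
out = np.random.rand(224, 224, 1).astype(np.float32)

# If the same buffer is read as CHW, the dims become
# C=224 "classes", H=224, W=1; an argmax over the class axis
# then produces the (H, W) mask that DeepStream reports:
mask = np.argmax(out, axis=0)
print(mask.shape)  # (224, 1)
```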

I found the solution. The model's output layout needs to match the input layout, which should be NCHW. When the output is in NHWC format, DeepStream only considers the last two dimensions, so the segmentation mask ends up with dimensions of just width and height, and only the last column of the mask is returned.

When converting to ONNX with tf2onnx, use both the --inputs-as-nchw and --outputs-as-nchw options. DeepStream requires that both the input and output layouts be NCHW.
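
Assuming the model was exported from a TensorFlow SavedModel with tf2onnx, the conversion command could look like this — the SavedModel path and tensor names are placeholders for whatever your model actually uses:

```shell
python -m tf2onnx.convert \
    --saved-model ./my_saved_model \
    --inputs-as-nchw input:0 \
    --outputs-as-nchw output:0 \
    --output my_onnx_model.onnx
```

Both flags take the tensor names to transpose; after conversion, verify in Netron that the output is now 1x224x224 (NCHW) before rebuilding the TensorRT engine.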
