Bounding boxes visible but segmentation masks not displayed

I am running a YOLO segmentation model (yolo11n-seg, ONNX) with DeepStream 7.1 on a Jetson Orin Nano.
I can see bounding boxes from the parser output, but segmentation masks are not rendered on the display, even though nvdsosd is configured with display-mask=1.

Platform Details:

  • Hardware Platform: Jetson Orin Nano

  • JetPack Version: 6.2

  • DeepStream Version: 7.1

  • TensorRT Version: 10.3.0.30

  • NVIDIA GPU Driver Version: 540.4.0


Pipeline (excerpt)

nvstreammux name=mux width=800 height=600 batch-size=1 ! \
nvinferserver unique-id=1 config-file-path=./nodes/yolo11n-seg.onnx/config.txt ! \
nvdsosd display-text=1 display-bbox=1 display-mask=1 ! ...

nvinferserver Config (excerpt)

postprocess {
  labelfile_path: "./postprocessing/labels.txt"
  detection {
    num_detected_classes: 80
    custom_parse_bbox_func: "NvDsInferParseCustom"
    nms { confidence_threshold: 0.3 iou_threshold: 0.4 }
  }
}


Custom C++ Parser Snippet

// maskResized is a CV_32F cv::Mat holding the instance mask resized to the target size
NvDsInferInstanceMaskInfo maskInfo;
maskInfo.mask_width  = mask_resized_w;
maskInfo.mask_height = mask_resized_h;
maskInfo.mask_size   = mask_resized_w * mask_resized_h * sizeof(float);
maskInfo.mask        = new float[mask_resized_w * mask_resized_h]; // released downstream

// Copy the resized mask into maskInfo
for (int y = 0; y < mask_resized_h; y++) {
    for (int x = 0; x < mask_resized_w; x++) {
        float val = maskResized.at<float>(y, x);
        // tried both soft float [0,1] and thresholded binary {0,1}
        maskInfo.mask[y * mask_resized_w + x] = (val > 0.5f ? 1.0f : 0.0f);
    }
}
// bbox fields (left, top, width, height, classId, detectionConfidence) are filled elsewhere
objectList.push_back(maskInfo);
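
For reference, nvinfer expects an instance-mask parser matching NvDsInferInstanceMaskParseCustomFunc from nvdsinfer_custom_impl.h (the NvDsInferInstanceMaskInfo fields filled above are declared in nvdsinfer.h):

typedef bool (* NvDsInferInstanceMaskParseCustomFunc) (
    std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
    NvDsInferNetworkInfo const &networkInfo,
    NvDsInferParseDetectionParams const &detectionParams,
    std::vector<NvDsInferInstanceMaskInfo> &objectList);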


Issue

  • Bounding boxes appear as expected.

  • Segmentation masks are not displayed (even with nvdsosd display-mask=1).

  • Tried both soft masks (float [0,1]) and binary masks (0/1).

  • No error logs from nvinferserver or nvdsosd (a probe to verify whether mask metadata actually reaches nvdsosd is sketched below).
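
To check where the masks get lost, a sink-pad probe on nvdsosd can report whether each object actually carries mask data; nvdsosd only draws a mask when obj_meta->mask_params.data is non-NULL. A minimal C++ sketch, assuming "osd" is the nvdsosd element from the pipeline above:

#include <gst/gst.h>
#include "gstnvdsmeta.h"

static GstPadProbeReturn
mask_probe (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
    GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
    NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);
    if (!batch_meta)
        return GST_PAD_PROBE_OK;

    for (NvDsMetaList *lf = batch_meta->frame_meta_list; lf; lf = lf->next) {
        NvDsFrameMeta *frame = (NvDsFrameMeta *) lf->data;
        for (NvDsMetaList *lo = frame->obj_meta_list; lo; lo = lo->next) {
            NvDsObjectMeta *obj = (NvDsObjectMeta *) lo->data;
            // nvdsosd renders a mask only if mask_params.data is set
            g_print ("class=%d mask=%s (%ux%u)\n", obj->class_id,
                obj->mask_params.data ? "yes" : "NO",
                obj->mask_params.width, obj->mask_params.height);
        }
    }
    return GST_PAD_PROBE_OK;
}

// Attach with:
//   GstPad *pad = gst_element_get_static_pad (osd, "sink");
//   gst_pad_add_probe (pad, GST_PAD_PROBE_TYPE_BUFFER, mask_probe, NULL, NULL);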


Could you please clarify:

  1. Does nvdsosd / nvinferserver in DeepStream 7.1 require additional config to render segmentation masks along with bboxes?

  2. Should the masks provided in NvDsInferInstanceMaskInfo.mask be strictly binary, or are probability maps (floating-point values in [0,1]) acceptable, given that the field is declared as a float pointer?

  3. Is there a reference segmentation parser for nvinferserver along with config.txt that we can align with?

Please refer to our Mask2Former model in the deepstream-tao-app sample.

NvDsInferParseCustomMask2Former is the postprocess method for this model to parse the mask data.

I have implemented Mask2Former exactly as in the reference parser, using the official model downloaded from your repo; my parser is unchanged from your sample.


Pipeline:

nvinferserver unique-id=1 config-file-path=config.txt ! \
nvtracker ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so \
         ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_perf.yml \
         display-tracking-id=1 tracking-surface-type=0 tracking-id-reset-mode=0 ! \
nvdsosd display-text=1 display-bbox=1 display-mask=1

Model config (config.pbtxt):

name: "mask2former.plan"
platform: "tensorrt_plan"
max_batch_size: 1

input [
  {
    name: "inputs"
    dims: [3, 800, 800]
    data_type: TYPE_FP32
  }
]

output [
  {
    name: "pred_masks"
    dims: [100, 800, 800]
    data_type: TYPE_FP32
  },
  {
    name: "pred_scores"
    dims: [100]
    data_type: TYPE_FP32
  },
  {
    name: "pred_classes"
    dims: [100]
    data_type: TYPE_INT64
  }
]
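
For context, this is the shape of the parsing logic: a condensed sketch in the spirit of the reference NvDsInferParseCustomMask2Former, not the verbatim code. The layer names follow the config above; the 0.4 score threshold and the bbox-from-mask-extents step are assumptions for illustration.

#include <algorithm>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>
#include "nvdsinfer_custom_impl.h"

extern "C" bool NvDsInferParseInstanceSegSketch (
    std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
    NvDsInferNetworkInfo const &networkInfo,
    NvDsInferParseDetectionParams const &detectionParams,
    std::vector<NvDsInferInstanceMaskInfo> &objectList)
{
    const NvDsInferLayerInfo *masks = nullptr, *scores = nullptr, *classes = nullptr;
    for (auto const &l : outputLayersInfo) {
        std::string n = l.layerName;
        if (n == "pred_masks") masks = &l;
        else if (n == "pred_scores") scores = &l;
        else if (n == "pred_classes") classes = &l;
    }
    if (!masks || !scores || !classes) return false;

    const unsigned int numQ = masks->inferDims.d[0];  // 100 queries
    const unsigned int mh   = masks->inferDims.d[1];  // 800
    const unsigned int mw   = masks->inferDims.d[2];  // 800
    const float *maskData   = (const float *) masks->buffer;
    const float *scoreData  = (const float *) scores->buffer;
    const int64_t *clsData  = (const int64_t *) classes->buffer;

    for (unsigned int q = 0; q < numQ; q++) {
        if (scoreData[q] < 0.4f) continue;            // assumed threshold

        NvDsInferInstanceMaskInfo obj = {};
        obj.classId = (unsigned int) clsData[q];
        obj.detectionConfidence = scoreData[q];
        obj.mask_width = mw;
        obj.mask_height = mh;
        obj.mask_size = mw * mh * sizeof (float);     // size in bytes
        obj.mask = new float[mw * mh];                // released downstream
        std::memcpy (obj.mask, maskData + (size_t) q * mw * mh, obj.mask_size);

        // Derive the bbox from the mask extents.
        unsigned int x0 = mw, y0 = mh, x1 = 0, y1 = 0;
        for (unsigned int y = 0; y < mh; y++)
            for (unsigned int x = 0; x < mw; x++)
                if (obj.mask[y * mw + x] > 0.5f) {
                    x0 = std::min (x0, x); y0 = std::min (y0, y);
                    x1 = std::max (x1, x); y1 = std::max (y1, y);
                }
        if (x1 <= x0 || y1 <= y0) { delete[] obj.mask; continue; }
        obj.left = x0; obj.top = y0;
        obj.width = x1 - x0; obj.height = y1 - y0;
        objectList.push_back (obj);
    }
    return true;
}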


Issue:
I can see bounding boxes as expected, but the segmentation masks are still not displayed, even though:

  • nvdsosd has display-mask=1.

  • The parser fills NvDsInferInstanceMaskInfo exactly like the reference.

  • No errors are reported by nvinferserver or nvdsosd.


Question:
Am I missing something in the pipeline or configuration to ensure that the masks are rendered?

Could you attach the command you use to run our Mask2Former? In theory you don’t need to modify any code; just run the sample following the instructions in our README.

export SHOW_MASK=1
./apps/tao_detection/ds-tao-detection configs/app/ins_seg_app.yml

I’m able to successfully get both bounding boxes and segmentation masks when using an nvinfer pipeline. For example:

export SHOW_MASK=1
gst-launch-1.0 filesrc location=sample_720p.mp4 ! \
  qtdemux ! h264parse ! decodebin ! nvvidconv ! "video/x-raw(memory:NVMM),format=NV12" ! mux.sink_0 \
  nvstreammux name=mux width=800 height=600 batch-size=1 ! \
  nvinfer unique-id=1 config-file-path=mask2former.plan/config.txt ! \
  nvdsosd display-text=1 display-bbox=1 display-mask=1 ! \
  nvstreamdemux name=demux demux.src_0 ! queue ! nvvidconv ! \
  fpsdisplaysink video-sink="nveglglessink window-height=600 window-width=800"

However, when I switch to an nvinferserver pipeline, I only see bounding boxes (no masks):

export SHOW_MASK=1
gst-launch-1.0 filesrc location=sample_720p.mp4 ! \
  qtdemux ! h264parse ! decodebin ! nvvidconv ! "video/x-raw(memory:NVMM),format=NV12" ! mux.sink_0 \
  nvstreammux name=mux width=800 height=600 batch-size=1 ! \
  nvinferserver unique-id=1 config-file-path=mask2former.plan/config.txt ! \
  nvdsosd display-text=1 display-bbox=1 display-mask=1 ! \
  nvstreamdemux name=demux demux.src_0 ! queue ! nvvidconv ! \
  fpsdisplaysink video-sink="nveglglessink window-height=600 window-width=800"

The only difference between the two runs is the inference plugin and the configuration file passed to it.

I also tried running with the provided app:

export SHOW_MASK=1
./apps/tao_detection/ds-tao-detection configs/app/ins_seg_app.yml

With nvinfer inside that app config, I get proper masks. But when I change it to nvinferserver, only bounding boxes are shown.

Question:
Am I missing something in the nvinferserver config to enable mask visualization? The custom parser appears to work with nvinfer but not with nvinferserver.

Sorry, the current nvinferserver does not support the instance segmentation mask function. We suggest you follow the steps below to implement this feature yourself.

  1. By running Mask2Former with nvinfer, you can become familiar with the nvinfer-related processing code.

  2. Implement your yolo-seg model with nvinfer first.

  3. Since both nvinfer and nvinferserver are open source, you can implement it in nvinferserver based on the corresponding process in nvinfer:

deepstream/sources/gst-plugins/gst-nvinfer
deepstream/sources/gst-plugins/gst-nvinferserver

Yes, my understanding is that the custom parser is the same in both cases, whether I use nvinfer or nvinferserver; the bboxes and masks are then drawn by nvdsosd.

Both nvinfer and nvinferserver mainly perform inference:

  • They call the backend (TensorRT for nvinfer, Triton for nvinferserver)

  • They get back raw tensors

  • Then the custom parser interprets those tensors into DeepStream objects.

In my case, in both plugins, my parser is appending results into

std::vector<NvDsInferInstanceMaskInfo> &objectList

with the same values (masks and bounding boxes).

So if the vector has the same mask data in both cases, what is the missing piece?

  • In nvinfer, the masks appear on screen.

  • In nvinferserver, the masks don’t show up.

Is the difference only in the way the mask metadata gets attached to NvDsObjectMeta or propagated downstream to nvdsosd?
Or does nvinferserver currently drop/ignore the mask metadata, even though the parser fills objectList correctly?


Basically: if the parser populates the same structure, why are masks drawn with nvinfer but not with nvinferserver?

It’s only in the way the mask metadata gets attached to NvDsObjectMeta: the post-processing in nvinferserver does not attach the NvDsInferInstanceMaskInfo to the object.
You can refer to our source code in deepstream/sources/libs/nvdsinferserver/infer_postprocess.cpp, which does not implement attaching NvDsInferInstanceMaskInfo. You can follow the nvinfer source code to implement this function:

deepstream/sources/libs/nvdsinfer/nvdsinfer_context_impl_output_parsing.cpp
NvDsInferStatus
InstanceSegmentPostprocessor::fillDetectionOutput(
    const std::vector<NvDsInferLayerInfo>& outputLayers,
    NvDsInferDetectionOutput& output)
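
The missing piece in nvinferserver is roughly the step below, which gst-nvinfer performs when attaching each parsed object to NvDsObjectMeta. This is a sketch using the NvDsObjectMeta / NvOSD_MaskParams fields from nvdsmeta.h, not the verbatim gst-nvinfer code; see attach_metadata_detector() in the gst-nvinfer sources for the real implementation.

#include "nvdsmeta.h"
#include "nvdsinfer.h"

static void
attach_instance_mask (NvDsObjectMeta *obj_meta,
    NvDsInferInstanceMaskInfo const *obj, float seg_threshold)
{
    // Hand the parser-allocated mask buffer over to the object meta;
    // nvdsosd reads mask_params when display-mask=1.
    obj_meta->mask_params.data = obj->mask;
    obj_meta->mask_params.size = obj->mask_size;
    obj_meta->mask_params.width = obj->mask_width;
    obj_meta->mask_params.height = obj->mask_height;
    // Pixels with value above the threshold are rendered.
    obj_meta->mask_params.threshold = seg_threshold;
}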

Thank you for the response. One option is to customize nvinferserver for instance segmentation, but another possible approach is to use the nvdspostprocess plugin.

I’d like to clarify:

  • Can we implement only instance segmentation within a custom nvdspostprocess library, without having to define additional parsers such as classification or detection?
  • Does nvdspostprocess support truly generic post-processing, where the buffer itself can be modified by a custom algorithm and a new buffer injected back into the pipeline?

If not, what would be the best method to achieve this?

Yes, it can support truly generic post-processing. We don’t have many samples at present; for now, you can only refer to the source code in deepstream/sources/gst-plugins/gst-nvdspostprocess. We will provide an example in the next version.

I am trying to use the nvdspostprocess plugin for classification. Unlike nvinfer or nvinferserver, I don’t see an explicit config option to provide a path to a custom parser library.

  • In nvinfer/nvinferserver, we can pass a separate .so implementing a custom parser.
  • In nvdspostprocess, the only thing I see is that we pass:
nvdspostprocess postprocesslib-config-file=config_classifier_vehicle_type.yml \
                postprocesslib-name=./postprocesslib_impl/libpostprocess_impl.so

Inside the implementation, I only see the default parser defined in:
sources/gst-plugins/gst-nvdspostprocess/postprocesslib_impl/post_processor_classify.cpp

Example:

extern "C"
bool NvDsPostProcessClassiferParseCustomSoftmax(
    std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
    NvDsInferNetworkInfo const &networkInfo,
    float classifierThreshold,
    std::vector<NvDsPostProcessAttribute> &attrList,
    std::string &descString);

But it’s not clear where such functions are expected to be loaded from.

My questions:

  1. Am I expected to directly modify post_processor_classify.cpp and define my function there?
  2. Is there a way to define this function in a separate file / .so and load it without rebuilding the whole postprocesslib_impl?
  3. For complex pipelines, I would prefer defining custom parser libraries per inference instance, rather than combining everything into one big shared library. Is this supported, and how should it be configured?

You can implement your own algorithm in a separate file / .so; implement the following interface in your own library:

deepstream/sources/gst-plugins/gst-nvdspostprocess/gstnvdspostprocess.cpp
...
      nvdspostprocess->algo_ctx =
        nvdspostprocess->algo_factory->CreateCustomAlgoCtx(nvdspostprocess->postprocess_lib_name,
         (DSPostProcess_CreateParams*) &params);
...
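
A minimal skeleton of such a library might look like the following. The type names (IDSPostProcessLibrary, DSPostProcess_CreateParams) and the exported CreateCustomAlgoCtx symbol come from the gst-nvdspostprocess sources, but please verify the exact interface in nvdspostprocesslib_interface.hpp for your DeepStream version.

// Skeleton of a standalone postprocess library (.so) loaded via
// postprocesslib-name; verify the pure-virtual methods of
// IDSPostProcessLibrary in nvdspostprocesslib_interface.hpp.
#include "nvdspostprocesslib_interface.hpp"

class MyCustomAlgo : public IDSPostProcessLibrary
{
public:
    explicit MyCustomAlgo (DSPostProcess_CreateParams *params) { /* ... */ }
    // Override the pure-virtual methods declared in the interface here,
    // e.g. the per-buffer processing hook where you can modify the buffer
    // or attach your own metadata before pushing it downstream.
};

// Exported factory symbol resolved by the plugin (see the excerpt above).
extern "C" IDSPostProcessLibrary *
CreateCustomAlgoCtx (DSPostProcess_CreateParams *params)
{
    return new MyCustomAlgo (params);
}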

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.
