Running Pre-trained Segformer - CityScapes out of the box from NGC

Please provide complete information as applicable to your setup.

• Hardware Platform Jetson
• DeepStream Version 6.2
• JetPack Version 5.1

I wanted to run the cityscapes_fan_tiny_hybrid_224.onnx model from here

And I found some advice that is hard to parse.

In the overview section, it shows how to make the label file. (It is straightforward and I have attached the file below.)

  1. It has 19 labels (see attached)

segformer_labels.txt (141 Bytes)

Secondly, it has the key parameter file:

# You can either provide the onnx model and key or trt engine obtained by using tao-converter
# onnx-file=../../path/to/.onnx file
model-engine-file=../../path/to/trt_engine
net-scale-factor=0.01735207357279195
offsets=123.675;116.28;103.53
# Since the model input channel is 3, using RGB color format.
model-color-format=0
labelfile-path=./labels.txt
infer-dims=3;1024;1024
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
interval=0
gie-unique-id=1
cluster-mode=2
## 0=Detector, 1=Classifier, 2=Semantic Segmentation, 3=Instance Segmentation, 100=Other
network-type=100
output-tensor-meta=1
num-detected-classes=20
segmentation-output-order=1
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0
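
Out of curiosity I tried to reverse-engineer those preprocessing numbers. As far as I understand, nvinfer preprocesses each pixel as y = net-scale-factor * (x - offset), and the values above look like the standard ImageNet normalisation scaled by 255 (the mean/std values below are my assumption; the model card does not spell this out):

# my guess at where the model card's net-scale-factor and offsets come from
# nvinfer preprocessing: y = net-scale-factor * (x - offset)
imagenet_mean = [0.485, 0.456, 0.406]  # assumed per-channel mean
avg_std = 0.226                        # assumed single (averaged) std

offsets = [round(m * 255.0, 3) for m in imagenet_mean]
net_scale_factor = 1.0 / (avg_std * 255.0)

print(offsets)           # [123.675, 116.28, 103.53]
print(net_scale_factor)  # ~0.01735207357279195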

It also has some more information here on infer config file creation.

Also, there is an example here on the infer config file; however, I could not find the label file for DS 6.2, but I think that was a straightforward list of classes, as I have attached.

After reading all the resources I made the file below.

My infer config:

[property]
gpu-id=0
net-scale-factor=0.007843
model-color-format=0 
offsets=127.5;127.5;127.5
labelfile-path=segformer_labels.txt
model-engine-file=../1/model-segformer-max-input-batch-size-1-no_extra_configs.plan
infer-dims=3;224;224
batch-size=1
network-mode=2
num-detected-classes=19
segmentation-output-order=1
interval=0
gie-unique-id=1
cluster-mode=2
network-type=100
output-tensor-meta=1 

[class-attrs-all]
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
#detected-max-w=0
#detected-max-h=0

I then slightly modified/simplified the pad probe to work with a single-batch input.

def seg_src_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()

    if not gst_buffer:
        sys.stderr.write("unable to get pgie src pad buffer\n")
        return Gst.PadProbeReturn.OK

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list  # because our batch size is 1 we have only one frame
    frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
    frame_number = frame_meta.frame_num

    l_user = frame_meta.frame_user_meta_list
    while l_user is not None:
        try:
            # Note that l_user.data needs a cast to pyds.NvDsUserMeta
            # The casting is done by pyds.NvDsUserMeta.cast()
            # The casting also keeps ownership of the underlying memory
            # in the C code, so the Python garbage collector will leave
            # it alone.
            # print("user meta data found")
            seg_user_meta = pyds.NvDsUserMeta.cast(l_user.data)
            
        except StopIteration:
            print("Error casting user meta data")
            break

        # print(f"seg_user_meta.base_meta.meta_type: {seg_user_meta.base_meta.meta_type}")
        # if seg_user_meta and seg_user_meta.base_meta.meta_type == pyds.NVDSINFER_TENSOR_OUTPUT_META:
        if seg_user_meta and seg_user_meta.base_meta.meta_type == pyds.NVDSINFER_SEGMENTATION_META:
            try:
                # Note that seg_user_meta.user_meta_data needs a cast to
                # pyds.NvDsInferSegmentationMeta
                # The casting is done by pyds.NvDsInferSegmentationMeta.cast()
                # The casting also keeps ownership of the underlying memory
                # in the C code, so the Python garbage collector will leave
                # it alone.
                segmeta = pyds.NvDsInferSegmentationMeta.cast(seg_user_meta.user_meta_data)
            except StopIteration:
                break

            print("Segmentation meta data found for frame %d" % frame_number)
            # Retrieve mask data in the numpy format from segmeta
            # Note that pyds.get_segmentation_masks() expects object of
            # type NvDsInferSegmentationMeta
            masks = pyds.get_segmentation_masks(segmeta)
            masks = np.array(masks, copy=True, order='C')
            # map each class id in the mask to a display colour
            frame_image = map_mask_as_display_bgr(masks)
            cv2.imwrite(folder_name + "/" + str(frame_number) + ".jpg", frame_image)
        try:
            l_user = l_user.next
        except StopIteration:
            break

    return Gst.PadProbeReturn.OK
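
For completeness, map_mask_as_display_bgr() is a small helper I adapted from the Python segmentation sample. A minimal sketch of what it does (my own colour table, truncated to a few entries):

import numpy as np

# BGR colour per class id; truncated here, the real table has 19 entries
CLASS_COLORS = [(128, 64, 128), (232, 35, 244), (70, 70, 70), (153, 153, 190)]

def map_mask_as_display_bgr(class_map):
    """Paint each class id in a HxW class map with a fixed BGR colour."""
    h, w = class_map.shape
    bgr = np.zeros((h, w, 3), dtype=np.uint8)
    for class_id, color in enumerate(CLASS_COLORS):
        bgr[class_map == class_id] = color
    return bgr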
    

Then I linked my pipeline, e.g.

source > convert > mux > nvinfer > nvsegvisual > display and file sinks (in addition to the folder where the pad probe function saves images).

For all of the above I used the segmentation example as a template; a rough sketch of the linking is below.
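
Roughly, the construction looks like this (an untested sketch; the file name, resolution and config path are placeholders, and the file-sink branch is omitted):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# source > decode > mux > nvinfer > nvsegvisual > display
pipeline = Gst.parse_launch(
    "filesrc location=sample.h264 ! h264parse ! nvv4l2decoder ! "
    "m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 ! "
    "nvinfer name=pgie config-file-path=infer_config.txt ! nvsegvisual ! "
    "nvvideoconvert ! nvegltransform ! nveglglessink"
)

# probe on the nvinfer src pad, where the segmentation/tensor meta appears
pgie = pipeline.get_by_name("pgie")
pgie.get_static_pad("src").add_probe(
    Gst.PadProbeType.BUFFER, seg_src_pad_buffer_probe, 0)

pipeline.set_state(Gst.State.PLAYING)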

My issues are:

  1. I cannot see any segmentation output in my output sinks
  2. On investigation I noticed that the output meta type I'm getting is NVDSINFER_TENSOR_OUTPUT_META, not NVDSINFER_SEGMENTATION_META. I think this may be a config issue.

The pipeline is playing, I can see the normal video playback, and it does not overtly show any errors.

Can you please help me set the config correctly or show me where I got this wrong?

Thanks,
Ganindu.

We have a similar demo, seg_app_unet.yml. You can see if that meets your needs.

Does that work with the segformer model that I used? I want to detect and segment pixels belonging to the road.

It uses the following model: citysemsegformer. You can refer to the configuration file of the demo I attached. They are similar.

Thanks,

I looked at your example and got the model that was the basis of the question to work (sort of). But I don’t really understand why it is working (unless that is an illusion), because the original advice on the model card conflicts with what seems to work (e.g. network type, cluster mode).

This is my updated infer config:

[property]
gpu-id=0
net-scale-factor=0.007843
model-color-format=0 
offsets=123.675;116.28;103.53
labelfile-path=segformer_labels.txt
model-engine-file=../1/model-segformer-max-input-batch-size-1-no_extra_configs.plan
infer-dims=3;224;224
batch-size=1
network-mode=2
num-detected-classes=19
segmentation-output-order=0
interval=0
gie-unique-id=1
cluster-mode=4
network-type=2
output-tensor-meta=1 

#workspace-size=1048576
segmentation-threshold=0.0
output-blob-names=output
output-io-formats=output:int32:chw

#maintain-aspect-ratio=1

network-input-order=0

output-instance-mask=1

filter-out-class-ids=0;1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19

[class-attrs-all]
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
#detected-max-w=0
#detected-max-h=0

This is the buffer probe:

def seg_src_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()

    if not gst_buffer:
        sys.stderr.write("unable to get pgie src pad buffer\n")
        return Gst.PadProbeReturn.OK

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list  # because our batch size is 1 we have only one frame
    frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
    frame_number = frame_meta.frame_num

    l_user = frame_meta.frame_user_meta_list
    while l_user is not None:
        try:
            # Note that l_user.data needs a cast to pyds.NvDsUserMeta
            # The casting is done by pyds.NvDsUserMeta.cast()
            # The casting also keeps ownership of the underlying memory
            # in the C code, so the Python garbage collector will leave
            # it alone.
            # print("user meta data found")
            seg_user_meta = pyds.NvDsUserMeta.cast(l_user.data)
            
        except StopIteration:
            print("Error casting user meta data")
            break

        # print(f"seg_user_meta.base_meta.meta_type: {seg_user_meta.base_meta.meta_type}")
        # if seg_user_meta and seg_user_meta.base_meta.meta_type == pyds.NVDSINFER_TENSOR_OUTPUT_META:
        if seg_user_meta and seg_user_meta.base_meta.meta_type == pyds.NVDSINFER_SEGMENTATION_META:
            try:
                # Note that seg_user_meta.user_meta_data needs a cast to
                # pyds.NvDsInferSegmentationMeta
                # The casting is done by pyds.NvDsInferSegmentationMeta.cast()
                # The casting also keeps ownership of the underlying memory
                # in the C code, so the Python garbage collector will leave
                # it alone.
                segmeta = pyds.NvDsInferSegmentationMeta.cast(seg_user_meta.user_meta_data)
            except StopIteration:
                break

            # print("Segmentation meta data found for frame %d" % frame_number)
            # Retrieve mask data in the numpy format from segmeta
            # Note that pyds.get_segmentation_masks() expects object of
            # type NvDsInferSegmentationMeta
            masks = pyds.get_segmentation_masks(segmeta)
            masks = np.array(masks, copy=True, order='C')

            # get the original image to overlay the mask on
            # (pyds.get_nvds_buf_surface() requires the buffer to be RGBA)
            orig_image = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)
            # convert the colour format to BGR for OpenCV
            orig_image = cv2.cvtColor(orig_image, cv2.COLOR_RGBA2BGR)

            
            # map each class id in the mask to a display colour
            segmentation_map = map_mask_as_display_bgr(masks)

            # After resizing the segmentation map
            segmentation_map = cv2.resize(segmentation_map, (orig_image.shape[1], orig_image.shape[0]))

            # Ensure both images are of type np.uint8
            if orig_image.dtype != np.uint8:
                orig_image = orig_image.astype(np.uint8)
            if segmentation_map.dtype != np.uint8:
                segmentation_map = segmentation_map.astype(np.uint8)

            # Now merge the mask and the original image
            frame_image = cv2.addWeighted(orig_image, 0.5, segmentation_map, 0.5, 0)

            # cv2.imwrite(folder_name + "/" + str(frame_number) + ".jpg", frame_image)
            cv2.imwrite(f"{folder_name}/{frame_number:06}.jpg", frame_image)
        try:
            l_user = l_user.next
        except StopIteration:
            break

    return Gst.PadProbeReturn.OK

I can get the seg mask to overlay on the images, but filtering out class ids still doesn’t work.

My label file:

road
sidewalk
building
wall
fence
pole
traffic light
traffic sign
vegetation
terrain
sky
person
rider
car
truck
bus
train
motorcycle
bicycle

Even if I add the line

filter-out-class-ids=0;1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19

I still see inferences (see image below; this is a saved image from the buffer probe).

I’m not bothered about the quality of the inference here (because we will plug in a better model), but I want to understand how this setup works (e.g. I want to filter out everything but the road; I don’t want to see the sidewalk). A sketch of what I mean is below.
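
A minimal sketch of that idea, assuming 'road' is class id 0 (the first line of my label file) and that ids with no colour-table entry stay black in map_mask_as_display_bgr():

import numpy as np

ROAD_CLASS_ID = 0  # assumption: 'road' is the first entry in segformer_labels.txt

def keep_only_road(class_map):
    """Blank out every class except road before colouring/overlaying."""
    # non-road pixels become -1, which matches no colour-table entry, so they stay black
    return np.where(class_map == ROAD_CLASS_ID, class_map, -1)

# in the probe, right after `masks` is built:
#   segmentation_map = map_mask_as_display_bgr(keep_only_road(masks))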

Secondly, I would like to have this kind of overlaid output on the display (as this is part of a demo). When I try nvsegvisual I can only see the segmentation output, which isn’t really helpful without a nice overlay (ideally one where I can control alpha to adjust the blend amount).

pipeline topology
output_seg.zip (97.9 KB)

Can you make some suggestions on how to make class filtering work and how to overlay the segmentation, please?

Cheers

P.S. Further to the example you pointed out, do you think pythonifying the code below would help me carry the metadata all the way down to the sink elements (e.g. drmsink) to be overlaid on top of the source video?

//---Fill NvDsInferSegmentationMeta structure---
// Acquire a new NvDsUserMeta object from frame_meta.
NvDsUserMeta *user_meta = nvds_acquire_user_meta_from_pool (frame_meta->base_meta.batch_meta);
NvDsInferSegmentationMeta *segmeta = (NvDsInferSegmentationMeta *) g_malloc (sizeof (NvDsInferSegmentationMeta));

segmeta->classes = numDetectedClasses;
// Segmentation model ALWAYS has the same W/H as the input tensor.
segmeta->height = meta->network_info.height;
segmeta->width = meta->network_info.width;

// Output tensor is the class map already. There is nothing else to parse.
// Referencing instead of copying info->buffer causes SegV crash.
segmeta->class_map = (gint *) g_memdup (info->buffer, segmeta->width * segmeta->height * sizeof (gint));
segmeta->class_probabilities_map = NULL;
segmeta->priv_data = NULL;

// Assign NvDsInferSegmentationMeta to the fields of NvDsUserMeta
user_meta->user_meta_data = segmeta;
user_meta->base_meta.meta_type = (NvDsMetaType) NVDSINFER_SEGMENTATION_META;
user_meta->base_meta.release_func = release_segmentation_meta;
user_meta->base_meta.copy_func = copy_segmentation_meta;

nvds_add_user_meta_to_frame (frame_meta, user_meta);
//---Fill NvDsInferSegmentationMeta structure---

Also, nvsegvisual doesn’t seem to have the properties original-background and alpha any more (check link to docs). Is there a way to achieve the same result some other way?

Do you mean that the following configuration will not work, network-type=100 and cluster-mode=2?

As described in the guide, filter-out-class-ids is used to filter out detected objects belonging to specified class-ids.

Yes.

There may be some supporting parameters that we didn’t show in the Guide. You can check the parameters with gst-inspect-1.0 nvsegvisual.

Thanks for the quick answer; filter-out not working with segmentation explains the odd behaviour.

network-type=100 does not produce segmentation masks, but I think pythonifying the C code above can help with that issue. (And does cluster mode 2 work with segmentation?)

gst-inspect-1.0 nvsegvisual does not show the extra options for me. From which DeepStream version are those extra controls supported?

Thanks

We want to make this work in DeepStream 6.2.
Can we patch that element (nvsegvisual) so it has the overlay functionality?

For example, pull the deb file directly and install it?

If not, I guess we can create a tee, do nvsegvisual on one branch, use a mixer to blend the two, then convert back with a converter and present it to a sink. A rough sketch of what I mean is below.
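
Something along these lines is what I have in mind (an untested sketch; file name, resolution and config path are placeholders, and I am assuming nvsegvisual's width/height properties can be used to match the frame size):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# tee after nvinfer: one branch keeps the original frames, the other runs nvsegvisual;
# a stock GStreamer compositor blends them, with sink_1::alpha controlling the mask weight
pipeline = Gst.parse_launch(
    "filesrc location=sample.h264 ! h264parse ! nvv4l2decoder ! "
    "m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 ! "
    "nvinfer config-file-path=infer_config.txt ! tee name=t "
    "t. ! queue ! nvvideoconvert ! video/x-raw,format=RGBA ! comp.sink_0 "
    "t. ! queue ! nvsegvisual width=1920 height=1080 ! nvvideoconvert ! "
    "video/x-raw,format=RGBA ! comp.sink_1 "
    "compositor name=comp sink_1::alpha=0.5 ! videoconvert ! autovideosink"
)
pipeline.set_state(Gst.State.PLAYING)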

Yes. Since we cannot support that for DeepStream 6.2, you can implement it yourself, including the filter function.

Thanks. Out of curiosity, in which DeepStream version does the native overlaying work for nvsegvisual?

What do you mean by native overlaying work?

Apologies, this might be me not understanding nvsegvisual.

I meant: a DeepStream version in which an nvsegvisual element exists with capabilities such as the ability to overlay the seg mask rather than only forward the seg mask.

For example, here:

I think original-background refers to the original buffer surface, i.e. the original input image from the stream (be it a camera or a file etc.), and alpha refers to the alpha value of the mask.

I’m only assuming here, because there is no documentation or gst-inspect output to clear the muddiness surrounding this.

    /* Create nvsegvisual to draw the mask on the converted RGBA buffer */
    segvisual = gst_element_factory_make ("nvsegvisual", "nv-segvisual");
    g_object_set (G_OBJECT (segvisual), "original-background", original_background, NULL);
    g_object_set (G_OBJECT (segvisual), "alpha", alpha, NULL);

Cheers

Yes, you are right. You can use gst-inspect-1.0 nvsegvisual on DeepStream 6.4:

  alpha                : Alpha Value for per pixel blending.
                         flags: readable, writable
                         Float. Range: 0 - 1 Default:
  original-background  : Instead of masked background show original background.
                         flags: readable, writable
                         Boolean. Default: false

Thanks! I’m kind of forced not to use that now. I think we’ll have to update drivers on our custom device tree to get to 6.4; if there is a way to backport it, that would be great.

The workaround I suggested works, but when we compose and create frames, some standard DeepStream convenience elements don’t play along because only some of the frames have meta.

So for the purpose of the demo I had to add a lot of scaffolding. And it has to work with our normal stuff, which has nice meta for every frame and is working great. (I haven’t done that yet, but it’s not a very pleasant thought having to make stuff that you won’t end up using.)

Anyway, if you think this can’t be backported to at least 6.2, I’d say my question is answered and I’m happy to mark this as resolved.

Cheers,
Ganindu.

Yes. Unfortunately, we are unable to provide a compatible library from 6.4 for the 6.2 version.


Thanks! I think my issue is resolved with the information provided here, and there is sufficient information in this thread to help someone get to a working solution if they have a similar problem. I’m marking this as done!

Cheers,
Ganindu.