In instance segmentation, what should be the data in the mask?

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: DeepStream 6.0
• TensorRT Version: TensorRT 8
• Issue Type (questions, new requirements, bugs): questions

When I try to deploy the yolov5s-seg instance segmentation model in DeepStream, I have a problem: the mask of the target is either not displayed or displayed incompletely.

The image size is 1920×1080, and the network input size is 640×640. Through debugging, we know that the mask is not rescaled to the original image size. Therefore, in post-processing, I restore the mask to the original image size; the data in the mask consists of values between 0 and 1.

In addition, the following test confirms that there is no problem with the incoming data.

    obj.mask_size = kImageH * kImageW * sizeof(float);
    obj.mask = new float[kImageH * kImageW];
    obj.mask_width = kImageW;
    obj.mask_height = kImageH;

    float* rawMask = reinterpret_cast<float *>(masks.at(idx).data);
    memcpy (obj.mask, rawMask, obj.mask_size);
    // test for memcpy: dump the copied mask as a grayscale image.
    // The Mat shape must match the allocated mask (kImageH x kImageW).
    cv::Mat tmp(kImageH, kImageW, CV_32FC1, (void*)obj.mask);
    cv::Mat uchar_mat;
    tmp.convertTo(uchar_mat, CV_8UC1, 255);
    cv::imwrite(std::to_string(idx)+".jpg", uchar_mat);

Below is the output:

[attached gray mask images]
But the final output mask is wrong; the result is as follows:

[attached result image]
So, I want to know what the data in the mask should be.

The mask pictures you posted show correct masks.

For example, this is the mask for the person with a backpack on his shoulder:

[attached image]

And this is the mask for the bus:

[attached image]
The gray mask pictures are created during post-processing and are only used to judge whether the mask is correct. But the mask in the video file generated after inference is not good.

What do you mean by “is not good”? Can you elaborate on your requirement or the issue you found?

This is the output video file. You can see that the result of the mask is not good at all.

I uploaded the video. You can see that the result of the mask is not good at all, but the gray mask I output in post-processing is fine.

You need to scale the output back according to the model preprocessing. How did you do the preprocessing scaling with your yolov5s-seg model? Is there a “keep-aspect-ratio” operation? What is the size of the mask output matrix (640x640 or another size)?

Is this the original video size, the nvstreammux size, or the final display size?

The image size is 1920×1080 and the network input is 640×640, with a “keep-aspect-ratio” operation. The mask output size should be 640×640, but I have rescaled the mask to 1920×1080.

Is this the original video size, the nvstreammux size, or the final display size?

    [streammux]
    gpu-id=0
    ## Boolean property to inform muxer that sources are live
    live-source=0
    batch-size=1
    ## Timeout in usec to wait after the first buffer is available,
    ## to push the batch even if the complete batch is not formed
    batched-push-timeout=40000
    ## Set muxer output width and height
    width=1920
    height=1080
    ## Enable to maintain aspect ratio wrt source and allow black borders;
    ## works along with the width, height properties
    enable-padding=0
    nvbuf-memory-type=0

What is the final display size? Where did you scale the mask?

Suppose your final display resolution is 1920x1080. Since your model accepts the padded, scaled image as input, the output is also padded. So you need to scale the valid part (removing the padding) of the output to the display resolution.
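As a quick sanity check of that mapping, here is the arithmetic for this setup (a minimal sketch assuming the standard YOLOv5 letterbox; the variable names are illustrative, not from the posted code):

    #include <algorithm>
    #include <cstdio>

    // Illustrative only: compute the valid (non-padded) region of a
    // 640x640 letterboxed output for a 1920x1080 source.
    int main() {
        const int inputW = 640, inputH = 640;    // network input
        const int imageW = 1920, imageH = 1080;  // source frame
        const float r = std::min(inputW / (float)imageW,
                                 inputH / (float)imageH);  // = 1/3, width-limited
        const int w = (int)(imageW * r);   // 640
        const int h = (int)(imageH * r);   // 360
        const int x = (inputW - w) / 2;    // 0
        const int y = (inputH - h) / 2;    // 140 rows of padding, top and bottom
        // Only this x,y,w,h window of the 640x640 mask maps back to the frame.
        std::printf("valid region: x=%d y=%d w=%d h=%d\n", x, y, w, h);
        return 0;
    }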

The final display size is 1920x1080. Yes, I rescale the mask from 640×640 to 1920×1080, taking the padding into account. The code is shown below:

    // The input mask_mat shape is 640 x 640.
    cv::Mat img_mask = scale_mask(mask_mat, kImageH, kImageW);

    // Rescale: crop the valid (non-padded) region of the letterboxed mask,
    // then resize it to the original image resolution.
    cv::Mat scale_mask(cv::Mat mask, uint32_t img_h, uint32_t img_w) {
      int x, y, w, h;
      float r_w = kInputW / (img_w * 1.0);
      float r_h = kInputH / (img_h * 1.0);
      if (r_h > r_w) {
        // Width-limited: padding was added on top and bottom.
        w = kInputW;
        h = r_w * img_h;
        x = 0;
        y = (kInputH - h) / 2;
      } else {
        // Height-limited: padding was added on left and right.
        w = r_h * img_w;
        h = kInputH;
        x = (kInputW - w) / 2;
        y = 0;
      }
      cv::Rect r(x, y, w, h);
      cv::Mat res;
      cv::resize(mask(r), res, cv::Size(img_w, img_h));
      return res;
    }
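For a direct comparison with what nvosd draws, the rescaled 0-1 float mask can also be binarized at the same threshold nvinfer uses (a sketch; the 0.3 mirrors segmentation-threshold in the config below, and idx is reused from the earlier dump snippet):

    // Hypothetical check: binarize the rescaled 0-1 mask at the nvinfer
    // threshold and dump it for a pixel-level comparison with the OSD output.
    cv::Mat bin;
    cv::threshold(img_mask, bin, 0.3, 255.0, cv::THRESH_BINARY);
    bin.convertTo(bin, CV_8UC1);
    cv::imwrite("rescaled_" + std::to_string(idx) + ".jpg", bin);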

Post-processing is done in the NvDsInferParseYolov5Seg function.

    # nvinfer config file
    cluster-mode=4
    # lib path for instance segmentation
    parse-bbox-instance-mask-func-name=NvDsInferParseYolov5Seg
    custom-lib-path=/opt/nvidia/deepstream/deepstream-6.0/instanceSeg_yolov5/nvdsinfer_yolov5_seg_impl/libnvinfer_yolov5_seg.so
    output-instance-mask=1
    segmentation-threshold=0.3
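For reference, the custom parser named above is expected to match the instance-mask parsing prototype from nvdsinfer_custom_impl.h (cluster-mode=4 selects no clustering, so the parser's output is used as-is). A minimal skeleton under that assumption, with the body elided:

    #include "nvdsinfer_custom_impl.h"

    // Skeleton only: the real implementation decodes boxes and prototype
    // masks, then fills mask/mask_width/mask_height/mask_size per object,
    // as in the snippets above. The exported name must match
    // parse-bbox-instance-mask-func-name in the nvinfer config.
    extern "C" bool NvDsInferParseYolov5Seg(
        std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
        NvDsInferNetworkInfo const &networkInfo,
        NvDsInferParseDetectionParams const &detectionParams,
        std::vector<NvDsInferInstanceMaskInfo> &objectList);

    // Compile-time prototype check provided by the SDK header.
    CHECK_CUSTOM_INSTANCE_MASK_PARSE_FUNC_PROTOTYPE(NvDsInferParseYolov5Seg);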

How can you guarantee the mask is for the frame you pasted?

We are just looking at the bus object. The gray picture is saved in the NvDsInferParseYolov5Seg function, and the RGB picture is output by the pipeline into the video. Both are results for the first frame of the input video. I think the mask data is changed somewhere, but I can't locate where right now.


So the mask is correct. The issue has nothing to do with DeepStream. Please debug your code.

??? I mean the mask in the RGB image should be the same as in the gray image, but they are not the same. The RGB image is output by nvosd.

There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Are you displaying the video with the NVIDIA-AI-IOT/deepstream_tao_apps samples (github.com)?