Get mask data from `nvdsinfer_custombboxparser_mrcnn_uff` mrcnn parser plugin

• Hardware Platform (Jetson / GPU) Jetson Xavier AGX
• DeepStream Version 5.1
• JetPack Version (valid for Jetson only) 4.5.1
• TensorRT Version 7.1.3
• Issue Type: Question

I am running an optimised version of Mask-RCNN: I converted the model to a UFF file with TensorRT and built an engine file from it. I run inference on this using the DeepStream 4.x samples, which provide a custom parser for mrcnn: nvdsinfer_customparser_mrcnn_uff. The compiled app outputs only bounding boxes (overlaid on the frame), but I need the masks too.

Some users suggest dumping the mask_data within the parser, but that doesn't help anyone who wants to avoid writing files during operation, and it still requires scaling the mask. I noticed the custom parser is directly connected to nvtiler:

nvtiler (defined in gst-nvmultistreamtiler) performs all the rescaling and post-inference operations on the inference results (bounding boxes). The mask data is not forwarded to this element, as I can see in this snippet from the parser:

    for (unsigned int roi_id = 0; roi_id < binfo.size(); roi_id++) {
        NvDsInferObjectDetectionInfo object; // <-- note: bbox-only struct, no mask field
        object.classId = binfo[roi_id].label;
        object.detectionConfidence = binfo[roi_id].prob;

        /* Clip object box co-ordinates to network resolution */
        object.left = CLIP(binfo[roi_id].box.x1 * networkInfo.width, 0, networkInfo.width - 1);
        object.top = CLIP(binfo[roi_id].box.y1 * networkInfo.height, 0, networkInfo.height - 1);
        object.width = CLIP((binfo[roi_id].box.x2 - binfo[roi_id].box.x1) * networkInfo.width, 0, networkInfo.width - 1);
        object.height = CLIP((binfo[roi_id].box.y2 - binfo[roi_id].box.y1) * networkInfo.height, 0, networkInfo.height - 1);

        objectList.push_back(object);
    }

Although the docs for gst-nvinfer do mention that it's capable of doing so:

But I can't figure out how; I failed to find the definition of NvDsInferObjectDetectionInfo (the data structure used to accommodate the inference results) and only got the link to nvdsinfer_custom_impl.h.

DeepStream 4.x is no longer supported. Please upgrade to the latest DeepStream 6.0 GA.

The mask data is stored in mask_params in NvDsObjectMeta. It is an _NvOSD_MaskParams struct.

Please study the MetaData section in the DeepStream SDK — DeepStream 6.0 Release documentation.
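In app code, mask_params is reached by walking the batch metadata (batch → frame → object), typically from a GStreamer pad probe. Below is a sketch of just the traversal logic, with local stand-ins for GList, NvDsObjectMeta and NvOSD_MaskParams so it compiles without the DeepStream headers (the real types live in nvdsmeta.h):

```cpp
// Local stand-ins mirroring the DeepStream types (assumed; see nvdsmeta.h
// for the real definitions).
struct MaskParams { float *data; unsigned int size, width, height; };
struct ObjectMeta { MaskParams mask_params; };
struct ListNode  { void *data; ListNode *next; };   // stand-in for GList
struct FrameMeta { ListNode *obj_meta_list; };

// Count objects in one frame that actually carry mask data -- the same
// walk a pad-probe callback would do over NvDsFrameMeta::obj_meta_list.
int countMaskedObjects(const FrameMeta *frame) {
    int n = 0;
    for (ListNode *l = frame->obj_meta_list; l != nullptr; l = l->next) {
        ObjectMeta *obj = static_cast<ObjectMeta *>(l->data);
        if (obj->mask_params.data != nullptr && obj->mask_params.size > 0)
            ++n;
    }
    return n;
}
```

If mask_params.data stays NULL and size stays 0, nothing upstream has attached a mask to the object meta, which is what the zeros below indicate.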


Hi Fiona, I am using 5.1; the 4.x repository is used only for the custom parser.

Thanks for the references. I tried to log the contents of the mask params:

    // Extract mask
    NvOSD_MaskParams maskParams = obj_meta->mask_params;
    float *maskParamsData = maskParams.data;
    unsigned int maskParamsSize = maskParams.size;
    unsigned int maskParamsWidth = maskParams.width;
    unsigned int maskParamsHeight = maskParams.height;
    // log (%u for the unsigned fields)
    g_print ("maskParamsSize = %u maskParamsWidth = %u "
      "maskParamsHeight = %u\n",
      maskParamsSize, maskParamsWidth, maskParamsHeight);

where obj_meta is defined as

    obj_meta = (NvDsObjectMeta *) (l_obj->data);

as per /opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-image-decode-test (the sample I currently use).

But all values come back 0. By looking at the parser's code, I don't think the mask data is being exported outside the parser. Also, when I visualise the GStreamer pipeline, it doesn't make sense for anything to be sent from NvDsInferObjectDetectionInfo directly to GstNvDsOsd.

Have you set “output-instance-mask=1” with nvinfer?

https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinfer.html#id2
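For reference, the keys involved in the [property] group of the nvinfer config file would look roughly like this (the parse function and library names below are examples taken from the TAO apps, not necessarily the ones for your build):

```
[property]
## 3 = instance segmentation
network-type=3
output-instance-mask=1
## the combined bbox+mask parse function exported by your parser library
parse-bbox-instance-mask-func-name=NvDsInferParseCustomMrcnnTLTV2
custom-lib-path=./libnvds_infercustomparser_tlt.so
```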

We have a proper sample for TAO models: NVIDIA-AI-IOT/deepstream_tao_apps — sample apps to demonstrate how to deploy models trained with TAO on DeepStream (github.com).

I have set the params as per the guide you shared. It also replaces NvDsInferParseCustomMrcnnUff with NvDsInferParseCustomMrcnnTLT. As my model is not trained using TAO, I get errors despite successful compilation:

    Opening in BLOCKING MODE
    Opening in BLOCKING MODE
    WARNING: [TRT]: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
    0:00:07.963338528  9343   0x55b1fadb90 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1702> [UID = 1]: deserialized trt engine from :/home/virus/Desktop/optimisation/deepstream_4.x_apps/res101-holygrail-ep26-fp16.engine
    INFO: [Implicit Engine Info]: layers num: 3
    0   INPUT  kFLOAT input_image     3x1024x1024
    1   OUTPUT kFLOAT mrcnn_detection 100x6
    2   OUTPUT kFLOAT mrcnn_mask/Sigmoid 100x4x28x28

    0:00:07.963583302  9343   0x55b1fadb90 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1806> [UID = 1]: Use deserialized engine model: /home/virus/Desktop/optimisation/deepstream_4.x_apps/res101-holygrail-ep26-fp16.engine
    0:00:08.095824565  9343   0x55b1fadb90 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary-nvinference-engine> [UID 1]: Load new model:pgie_mrcnn_uff_config.txt sucessfully
    Running...
    NvMMLiteOpen : Block : BlockType = 261
    NVMEDIA: Reading vendor.tegra.display-size : status: 6
    NvMMLiteBlockCreate : Block : BlockType = 261
    0:00:09.985972782  9343   0x55b19a3b70 ERROR                nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::parseBoundingBox() <nvdsinfer_context_impl_output_parsing.cpp:59> [UID = 1]: Could not find output coverage layer for parsing objects
    0:00:09.986096721  9343   0x55b19a3b70 ERROR                nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::fillDetectionOutput() <nvdsinfer_context_impl_output_parsing.cpp:733> [UID = 1]: Failed to parse bboxes
    Segmentation fault (core dumped)

This suggests that either I re-train using TAO, or I write/modify NvDsInferParseCustomMrcnnUff (the parser that works fine but only gives me bounding boxes) so that it also transports the mask metadata to the parent program.

Am I correct in my evaluation?

So please check against your network; the mrcnn post-processing is open source:
deepstream_tao_apps/post_processor at release/tlt3.0 · NVIDIA-AI-IOT/deepstream_tao_apps (github.com)

Hi Fiona. I augmented the NvDsInferParseCustomMrcnnUff code so that the mask metadata is also returned along with the bbox, using NvDsInferParseCustomMrcnnTLTV2 as a reference. The two use different structures to return data:

 NvDsInferParseCustomMrcnnUff : NvDsInferObjectDetectionInfo
 NvDsInferParseCustomMrcnnTLTV2 : NvDsInferInstanceMaskInfo

What I needed was a combination of both, so I made some modifications and was able to:

  • infer bbox+mask data for each inferred object
  • append them to the object list objectList
  • return this list in a std::vector<NvDsInferInstanceMaskInfo> container
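The per-object step of that kind of modification can be sketched as follows; the local struct mirrors the fields of NvDsInferInstanceMaskInfo from nvdsinfer.h (assumed here so the snippet compiles without the DeepStream headers), and 28x28 matches the mrcnn_mask/Sigmoid output shown in the log above:

```cpp
#include <cstring>

// Local mirror of NvDsInferInstanceMaskInfo (assumed; see nvdsinfer.h).
struct InstanceMaskInfo {
    unsigned int classId;
    float left, top, width, height;
    float detectionConfidence;
    float *mask;                 // per-object heap buffer, released downstream
    unsigned int mask_width, mask_height, mask_size;
};

// What a combined bbox+mask parser does per ROI: keep the clipped bbox as
// before, and additionally copy this ROI's sigmoid mask output into a buffer
// owned by the object, so the mask leaves the parser with the detection.
InstanceMaskInfo makeMaskedObject(unsigned int classId, float conf,
                                  const float *roiMask,
                                  unsigned int mw, unsigned int mh) {
    InstanceMaskInfo obj{};
    obj.classId = classId;
    obj.detectionConfidence = conf;
    obj.mask_width = mw;
    obj.mask_height = mh;
    obj.mask_size = mw * mh * sizeof(float);
    obj.mask = new float[mw * mh];
    std::memcpy(obj.mask, roiMask, obj.mask_size);
    return obj;
}
```

Each object produced this way is pushed into the objectList vector, exactly as the bbox-only parser already does.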

On changing the config file to display masks (as per your suggestion), I get these errors:

    oadModelStatus:<primary-nvinference-engine> [UID 1]: Load new model:pgie_mrcnn_uff_config.txt sucessfully
    Running...
    NvMMLiteOpen : Block : BlockType = 261
    NVMEDIA: Reading vendor.tegra.display-size : status: 6
    NvMMLiteBlockCreate : Block : BlockType = 261
    0:00:05.026531740  9152   0x558966e370 ERROR                nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::parseBoundingBox() <nvdsinfer_context_impl_output_parsing.cpp:59> [UID = 1]: Could not find output coverage layer for parsing objects
    0:00:05.026740390  9152   0x558966e370 ERROR                nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::fillDetectionOutput() <nvdsinfer_context_impl_output_parsing.cpp:733> [UID = 1]: Failed to parse bboxes
    Segmentation fault (core dumped)

I am aware that mixing a parser originally meant for the TAO-based app with one meant for the old mrcnn samples is to blame here. Still, I wish to continue with this approach and derive/obtain the masks myself.

So this is where I’ll need your help:

The mask+bbox data is returned to the next elements of the GStreamer/DeepStream pipeline, and the OSD element uses this data to overlay bboxes on the displayed frame. Similarly, I wish to work with the mask data, but outside of the parser function (preferably in the main app, deepstream_custom.c). Can you guide me in the right direction?

*P.S. I know I could look into the source of the OSD element and draw the mask there, but that won't help much, as I want to do other things with this mask data right in the main app, deepstream_custom.c. I am pretty sure there exists a method to access the nvinfer plugin's output metadata; I just can't find the entry to the rabbit hole.*
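Once the mask does arrive in the metadata, the remaining step is model-side post-processing: each object's mask is a small sigmoid grid (28x28 here) that has to be resized to the object's box on the frame and thresholded. A self-contained sketch of that step (nearest-neighbour resize with a 0.5 threshold; this is illustrative, not the OSD element's actual code):

```cpp
#include <cstddef>
#include <vector>

// Resize a mw x mh float mask (sigmoid scores in [0,1]) to outW x outH with
// nearest-neighbour sampling, thresholding at `thresh` to a binary mask.
std::vector<unsigned char> scaleMask(const float *mask,
                                     unsigned int mw, unsigned int mh,
                                     unsigned int outW, unsigned int outH,
                                     float thresh = 0.5f) {
    std::vector<unsigned char> out(static_cast<size_t>(outW) * outH, 0);
    for (unsigned int y = 0; y < outH; ++y) {
        unsigned int sy = y * mh / outH;          // nearest source row
        for (unsigned int x = 0; x < outW; ++x) {
            unsigned int sx = x * mw / outW;      // nearest source column
            out[static_cast<size_t>(y) * outW + x] =
                mask[static_cast<size_t>(sy) * mw + sx] > thresh ? 1 : 0;
        }
    }
    return out;
}
```

outW/outH would be the object's box width/height in frame pixels, so the binary mask can be overlaid at (left, top) of the detection.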