DeepStream doesn't give the expected Mask R-CNN output

Hello.

I flashed my Jetson AGX Xavier with JetPack 4.5.1, which came with DeepStream 5.1 preinstalled.

I converted my .h5 model to UFF and also generated the corresponding engine file. I can run inference on the UFF model using the mrcnn sample provided in the TensorRT C/C++ samples.

Now I wish to integrate the model into DeepStream.

For ease of use with Python, I cloned the https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/blob/master/apps/deepstream-segmentation/ sample and made a custom config file:

################################################################################
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
################################################################################

# Following properties are mandatory when engine files are not specified:
#   int8-calib-file(Only in INT8), model-file-format
#   Caffemodel mandatory properties: model-file, proto-file, output-blob-names
#   UFF: uff-file, input-dims, uff-input-blob-name, output-blob-names
#   ONNX: onnx-file
#
# Mandatory properties for detectors:
#   num-detected-classes
#
# Optional properties for detectors:
#   cluster-mode(Default=Group Rectangles), interval(Primary mode only, Default=0)
#   custom-lib-path,
#   parse-bbox-func-name
#
# Mandatory properties for classifiers:
#   classifier-threshold, is-classifier
#
# Optional properties for classifiers:
#   classifier-async-mode(Secondary mode only, Default=false)
#
# Optional properties in secondary mode:
#   operate-on-gie-id(Default=0), operate-on-class-ids(Defaults to all classes),
#   input-object-min-width, input-object-min-height, input-object-max-width,
#   input-object-max-height
#
# Following properties are always recommended:
#   batch-size(Default=1)
#
# Other optional properties:
#   net-scale-factor(Default=1), network-mode(Default=0 i.e FP32),
#   model-color-format(Default=0 i.e. RGB) model-engine-file, labelfile-path,
#   mean-file, gie-unique-id(Default=0), offsets, process-mode (Default=1 i.e. primary),
#   custom-lib-path, network-mode(Default=0 i.e FP32)
#
# The values in the config file are overridden by values set through GObject
# properties.

[property]
gpu-id=0
net-scale-factor=0.003921568627451
model-color-format=0
uff-file=/home/virus/Desktop/optimisation/res101-holygrail-ep26.uff
model-engine-file=/home/virus/Desktop/optimisation/res101-holygrail-ep26-fp16.engine
infer-dims=3;1024;1024
uff-input-order=0
uff-input-blob-name=input_image
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=4
interval=0
gie-unique-id=1
network-type=2
output-blob-names=mrcnn_mask/Sigmoid
segmentation-threshold=0.5
#parse-bbox-func-name=NvDsInferParseCustomSSD
#custom-lib-path=nvdsinfer_custom_impl_ssd/libnvdsinfer_custom_impl_ssd.so
#scaling-filter=0
#scaling-compute-hw=0

[class-attrs-all]
pre-cluster-threshold=0.0
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

# Optional added by pope
labelfile-path=custom_labels.txt
## Per class configuration
#[class-attrs-2]
#threshold=0.6
#roi-top-offset=20
#roi-bottom-offset=10
#detected-min-w=40
#detected-min-h=40
#detected-max-w=400
#detected-max-h=800

On running

python3 deepstream_segmentation.py custom_config.txt /home/virus/Desktop/optimisation/short.jpg /home/virus/Desktop/results1

I get improper masks, as shown in the attached file:

(a simple grey line…)

On printing the mask data, I found that the inference results don’t look like a mask but rather like a detection.

Mask R-CNN has the following inputs and outputs:

INFO: [Implicit Engine Info]: layers num: 3
0  INPUT  kFLOAT input_image        3x1024x1024   [Input Layer]
1  OUTPUT kFLOAT mrcnn_detection    100x6         [Bounding Box Detections]
2  OUTPUT kFLOAT mrcnn_mask/Sigmoid 100x4x28x28   [Masks]

This is what the mask looks like:

[popo_notes][mask]
 [[-1 -1 -1 ..., 48 -1 -1]
 [-1 -1 49 ...,  7  9 43]
 [43 43  6 ..., -1 -1 -1]
 [-1 -1 -1 ..., 48 44 44]
 [-1 -1 -1 ...,  5  5  5]
 [ 5  9  7 ..., 44 -1 -1]]

[popo_notes][mask shape] (6, 1024)

I am not able to inspect the outputs of the pipeline plugins easily, as they are not well documented in the repo. I suspect that the segmentation sample expects only one output (the mask), whereas my model produces two (mask + detection), but I am not sure about this.
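For reference, decoding the two Mask R-CNN outputs on the CPU would look roughly like the NumPy sketch below, run here on dummy buffers with the shapes the engine reports. The per-detection field order [y1, x1, y2, x2, class_id, score] with normalized box coordinates is an assumption based on the Matterport/TensorRT UFF Mask R-CNN export, so verify it against your own model:

```python
import numpy as np

# Dummy buffers with the shapes the engine reports:
#   mrcnn_detection:    100 x 6
#   mrcnn_mask/Sigmoid: 100 x 4 x 28 x 28  (one 28x28 mask per class)
detections = np.zeros((100, 6), dtype=np.float32)
masks = np.zeros((100, 4, 28, 28), dtype=np.float32)

# Assumed per-detection layout (verify against your export):
# [y1, x1, y2, x2, class_id, score], box coordinates normalized to [0, 1]
detections[0] = [0.10, 0.10, 0.50, 0.50, 2.0, 0.90]
masks[0, 2] = 0.8  # plausible sigmoid scores for the detected class

results = []
for i in np.flatnonzero(detections[:, 5] > 0.5):   # keep confident detections
    class_id = int(detections[i, 4])
    mask28 = masks[i, class_id]                    # (28, 28) probabilities
    results.append((detections[i, :4], class_id, mask28 > 0.5))
```

The point is that the mask tensor only makes sense together with the detection tensor: each 28x28 mask has to be selected by class and mapped back into its box, which a semantic-segmentation sample that expects a single class-map output will not do.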

Hi,

The example you used is designed for UNet.
You can check the repository below for a Mask R-CNN-like example:

Thanks.


There is no example specific to Mask R-CNN. How is it different from the UNet sample?

Hi,

The Mask R-CNN-like model is called PeopleSegNet.
For different models, you need a specific parser to translate the output and draw it accordingly.
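To illustrate what such a parser has to do for a Mask R-CNN-style head (a minimal NumPy sketch, not the actual PeopleSegNet parser): each 28x28 mask is only valid inside its detection box, so it has to be resized to the box and pasted into a full-resolution frame before it can be drawn:

```python
import numpy as np

def paste_mask(mask28, box, frame_h, frame_w, thr=0.5):
    """Resize a 28x28 mask to its box and paste it into a full-size frame.

    box: (y1, x1, y2, x2) normalized to [0, 1]. Nearest-neighbour resizing
    keeps the sketch dependency-free; a real parser would use bilinear.
    """
    y1 = int(box[0] * frame_h); x1 = int(box[1] * frame_w)
    y2 = int(box[2] * frame_h); x2 = int(box[3] * frame_w)
    bh, bw = max(y2 - y1, 1), max(x2 - x1, 1)
    # Nearest-neighbour upsampling via integer index maps
    ys = np.arange(bh) * mask28.shape[0] // bh
    xs = np.arange(bw) * mask28.shape[1] // bw
    resized = mask28[np.ix_(ys, xs)] > thr
    full = np.zeros((frame_h, frame_w), dtype=bool)
    full[y1:y1 + bh, x1:x1 + bw] = resized
    return full
```

This per-instance box-plus-mask decoding is what distinguishes a Mask R-CNN parser from the UNet sample, which simply treats its single output tensor as a dense class map over the whole frame.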

Thanks.

Hi,

I tried running the PeopleSegNet example with my custom engine file. This is what the modified config file looks like:


[property]
gpu-id=0
net-scale-factor=0.007843

# Since the model input channel is 3, using RGB color format.

model-color-format=0
offsets=127.5;127.5;127.5
labelfile-path=labels.txt

##Replace following path to your model file

model-engine-file=/home/virus/Desktop/optimisation/res101-holygrail-ep26-fp16.engine 


#DS5.x cannot parse the onnx/etlt model, so you need to
#convert the etlt model to a TensorRT engine first using tao-converter

tlt-encoded-model=../../models/peopleSemSegNet/peoplesemsegnet.etlt
tlt-model-key=tlt_encode

infer-dims=3;1024;1024
batch-size=1

## 0=FP32, 1=INT8, 2=FP16 mode

network-mode=2
num-detected-classes=4
interval=0
gie-unique-id=1
network-type=2
output-blob-names=mrcnn_mask/Sigmoid
segmentation-threshold=0.0

##specify the output tensor order, 0(default value) for CHW and 1 for HWC

segmentation-output-order=0

[class-attrs-all]
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

Note, however, that I obtained my engine file by running trtexec on my UFF file, NOT on a TLT model. Is that still correct? I get the following error:

virus@virus-desktop:~/Desktop/optimisation/deepstream_tao_apps/apps/tao_segmentation$ ./ds-tao-segmentation -c custom_config.txt -i /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264
Error: Could not parse model engine file path
Failed to parse group property
** ERROR: <gst_nvinfer_parse_config_file:1260>: failed
Now playing: custom_config.txt
Opening in BLOCKING MODE
Opening in BLOCKING MODE 
Opening in BLOCKING MODE
Opening in BLOCKING MODE 
0:00:00.296638130 19485   0x557fb78150 WARN                 nvinfer gstnvinfer.cpp:769:gst_nvinfer_start:<primary-nvinference-engine> error: Configuration file parsing failed
0:00:00.296689556 19485   0x557fb78150 WARN                 nvinfer gstnvinfer.cpp:769:gst_nvinfer_start:<primary-nvinference-engine> error: Config file path: custom_config.txt
Running...
ERROR from element primary-nvinference-engine: Configuration file parsing failed
Error details: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(769): gst_nvinfer_start (): /GstPipeline:ds-custom-pipeline/GstNvInfer:primary-nvinference-engine:
Config file path: custom_config.txt
Returned, stopping playback
Deleting pipeline

OK, I think I need to convert my UFF model to TLT using TAO. Referring to Training Instance Segmentation Models Using Mask R-CNN on the NVIDIA TAO Toolkit | NVIDIA Developer Blog: is retraining the only way? Can't I use my UFF model for inference on Jetson?

@AastaLLL I got past this error; it was actually caused by a rebuilt TensorRT, and the pre-installed TensorRT that ships with JetPack 4.5.1 works fine. I re-did all the above steps and got the following successful inference:

ds-tao-segmentation -c ~/Desktop/optimisation/deepstream_tao_apps/customConfigs/custom_config.txt -i ~/Desktop/optimisation/large.jpg 
Now playing: /home/virus/Desktop/optimisation/deepstream_tao_apps/customConfigs/custom_config.txt
Opening in BLOCKING MODE
Opening in BLOCKING MODE 
0:00:03.762353419 10172   0x55905e4040 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1702> [UID = 1]: deserialized trt engine from :/home/virus/Desktop/optimisation/res101-holygrail-ep26-fp16.engine
INFO: [Implicit Engine Info]: layers num: 3
0   INPUT  kFLOAT input_image     3x1024x1024     
1   OUTPUT kFLOAT mrcnn_detection 100x6           
2   OUTPUT kFLOAT mrcnn_mask/Sigmoid 100x4x28x28     

0:00:03.762579860 10172   0x55905e4040 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1806> [UID = 1]: Use deserialized engine model: /home/virus/Desktop/optimisation/res101-holygrail-ep26-fp16.engine
0:00:03.899089690 10172   0x55905e4040 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary-nvinference-engine> [UID 1]: Load new model:/home/virus/Desktop/optimisation/deepstream_tao_apps/customConfigs/custom_config.txt sucessfully
Running...
NvMMLiteBlockCreate : Block : BlockType = 256 
[JPEG Decode] BeginSequence Display WidthxHeight 1024x1024
in videoconvert caps = video/x-raw(memory:NVMM), format=(string)RGBA, framerate=(fraction)1/1, width=(int)1280, height=(int)720
End of stream
Returned, stopping playback
[JPEG Decode] NvMMLiteJPEGDecBlockPrivateClose done
[JPEG Decode] NvMMLiteJPEGDecBlockClose done
Deleting pipeline

But the mask saved is all black. My config file is as follows:


[property]
gpu-id=0
net-scale-factor=0.007843

# Since the model input channel is 3, using RGB color format.

model-color-format=0
offsets=127.5;127.5;127.5
labelfile-path=./custom_labels.txt

##Replace following path to your model file

model-engine-file=/home/virus/Desktop/optimisation/res101-holygrail-ep26-fp16.engine

#DS5.x cannot parse the onnx/etlt model, so you need to
#convert the etlt model to a TensorRT engine first using tao-converter

tlt-encoded-model=../../models/peopleSemSegNet/peoplesemsegnet.etlt
tlt-model-key=tlt_encode

infer-dims=3;1024;1024 
##3;544;960
batch-size=1

## 0=FP32, 1=INT8, 2=FP16 mode

network-mode=2
num-detected-classes=4
interval=0
gie-unique-id=1
network-type=2
output-blob-names=mrcnn_mask/Sigmoid
segmentation-threshold=0.0

##specify the output tensor order, 0(default value) for CHW and 1 for HWC

segmentation-output-order=1

[class-attrs-all]
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

Note that I have no TLT model and simply left the

tlt-encoded-model=../../models/peopleSemSegNet/peoplesemsegnet.etlt
tlt-model-key=tlt_encode

lines as they were originally.

What am I missing? How can I get the mask?

Hi,

Would you mind sharing your custom source/model with us,
so we can reproduce this issue in our environment?

Thanks.

Here is the UFF model: res101-holygrail-ep26.uff - Google Drive

Thanks