DeepStream doesn't give expected Mask-RCNN output

Hello.

I flashed my Jetson Xavier AGX with JetPack 4.5.1, which has DeepStream 5.1 preinstalled.

I obtained the UFF model from my .h5 file and also generated the corresponding engine file. I can perform inference on the UFF file using the mrcnn sample provided in the TensorRT C/C++ samples.
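For reference, the TensorRT sampleUffMaskRCNN conversion script is invoked roughly like this (file names here are illustrative, substituted with mine):

$ python3 mrcnn_to_trt_single.py -w res101-holygrail-ep26.h5 \
      -o res101-holygrail-ep26.uff -p ./config.py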

Now I wish to integrate the model into DeepStream.

For ease of use with Python, I cloned the https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/blob/master/apps/deepstream-segmentation/ sample and made a custom config file:


# Following properties are mandatory when engine files are not specified:
#   int8-calib-file(Only in INT8), model-file-format
#   Caffemodel mandatory properties: model-file, proto-file, output-blob-names
#   UFF: uff-file, input-dims, uff-input-blob-name, output-blob-names
#   ONNX: onnx-file
#
# Mandatory properties for detectors:
#   num-detected-classes
#
# Optional properties for detectors:
#   cluster-mode(Default=Group Rectangles), interval(Primary mode only, Default=0)
#   custom-lib-path,
#   parse-bbox-func-name
#
# Mandatory properties for classifiers:
#   classifier-threshold, is-classifier
#
# Optional properties for classifiers:
#   classifier-async-mode(Secondary mode only, Default=false)
#
# Optional properties in secondary mode:
#   operate-on-gie-id(Default=0), operate-on-class-ids(Defaults to all classes),
#   input-object-min-width, input-object-min-height, input-object-max-width,
#   input-object-max-height
#
# Following properties are always recommended:
#   batch-size(Default=1)
#
# Other optional properties:
#   net-scale-factor(Default=1), network-mode(Default=0 i.e FP32),
#   model-color-format(Default=0 i.e. RGB) model-engine-file, labelfile-path,
#   mean-file, gie-unique-id(Default=0), offsets, process-mode (Default=1 i.e. primary),
#   custom-lib-path
#
# The values in the config file are overridden by values set through GObject
# properties.

[property]
gpu-id=0
net-scale-factor=0.003921568627451
model-color-format=0
uff-file=/home/virus/Desktop/optimisation/res101-holygrail-ep26.uff
model-engine-file=/home/virus/Desktop/optimisation/res101-holygrail-ep26-fp16.engine
infer-dims=3;1024;1024
uff-input-order=0
uff-input-blob-name=input_image
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=4
interval=0
gie-unique-id=1
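## 0=Detector, 1=Classifier, 2=Segmentation, 3=Instance Segmentation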
network-type=2
output-blob-names=mrcnn_mask/Sigmoid
segmentation-threshold=0.5
#parse-bbox-func-name=NvDsInferParseCustomSSD
#custom-lib-path=nvdsinfer_custom_impl_ssd/libnvdsinfer_custom_impl_ssd.so
#scaling-filter=0
#scaling-compute-hw=0

[class-attrs-all]
pre-cluster-threshold=0.0
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

# Optional added by pope
labelfile-path=custom_labels.txt
## Per class configuration
#[class-attrs-2]
#threshold=0.6
#roi-top-offset=20
#roi-bottom-offset=10
#detected-min-w=40
#detected-min-h=40
#detected-max-w=400
#detected-max-h=800

On running

python3 deepstream_segmentation.py custom_config.txt /home/virus/Desktop/optimisation/short.jpg /home/virus/Desktop/results1

I get improper masks, as shown in the attached file (the saved output is just a simple grey line).

On printing the mask data, I found that the inference results don’t look like a mask but rather like a detection.

Mask R-CNN has the following inputs and outputs:
INFO: [Implicit Engine Info]: layers num: 3
0 INPUT kFLOAT input_image 3x1024x1024 ----------------[Input Layer]
1 OUTPUT kFLOAT mrcnn_detection 100x6-----------------[Bounding Box Detection]
2 OUTPUT kFLOAT mrcnn_mask/Sigmoid 100x4x28x28 ----[Mask]

This is what the mask data looks like:

[popo_notes][mask]
 [[-1 -1 -1 ..., 48 -1 -1]
 [-1 -1 49 ...,  7  9 43]
 [43 43  6 ..., -1 -1 -1]
 [-1 -1 -1 ..., 48 44 44]
 [-1 -1 -1 ...,  5  5  5]
 [ 5  9  7 ..., 44 -1 -1]]

[popo_notes][mask shape] (6, 1024)

I am not able to fiddle much with the outputs of the pipeline plugins, as they are not explained very well in the repo. I suspect that the segmentation sample expects only one output (the mask), whereas my model provides two (mask + detection), but I am not entirely sure; see the sketch below of what the sample's probe does.
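For context, this is a condensed sketch of the relevant part of the sample's probe (adapted from deepstream-segmentation; names follow the pyds bindings shipped with deepstream_python_apps):

import numpy as np
import pyds

# Condensed from deepstream-segmentation's buffer probe. It assumes nvinfer
# (network-type=2) attached a single UNet-style class map as user metadata.
def read_segmentation_mask(frame_meta):
    l_user = frame_meta.frame_user_meta_list
    while l_user is not None:
        user_meta = pyds.NvDsUserMeta.cast(l_user.data)
        if user_meta and user_meta.base_meta.meta_type == pyds.NVDSINFER_SEGMENTATION_META:
            seg_meta = pyds.NvDsInferSegmentationMeta.cast(user_meta.user_meta_data)
            # One class ID per pixel over a (height, width) grid;
            # -1 marks pixels below segmentation-threshold.
            mask = pyds.get_segmentation_masks(seg_meta)
            return np.array(mask, copy=True, order='C')
        try:
            l_user = l_user.next
        except StopIteration:
            break
    return None

A UNet-style class map is a single (height, width) tensor, so the per-detection 100x4x28x28 mrcnn_mask/Sigmoid output would not fit this post-processing path.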

Hi,

The example you used is designed for UNet.
You can check the repository below for a MaskRCNN-like example:

Thanks.


There is no example specific to Mask-RCNN. How is it different from the UNet sample?

Hi,

The MaskRCNN-like model is called PeopleSegNet.
For different models, you need a specific parser to translate the output and draw it accordingly.

Thanks.

Hi

I tried running the PeopleSegNet example using my custom engine file. This is what the modified config file looks like:


[property]
gpu-id=0
net-scale-factor=0.007843

# Since the model input channel is 3, using RGB color format.

model-color-format=0
offsets=127.5;127.5;127.5
labelfile-path=labels.txt

##Replace following path to your model file

model-engine-file=/home/virus/Desktop/optimisation/res101-holygrail-ep26-fp16.engine 


#DS5.x cannot parse the ONNX etlt model, so you need to
#convert the etlt model to a TensorRT engine first using tao-converter

tlt-encoded-model=../../models/peopleSemSegNet/peoplesemsegnet.etlt
tlt-model-key=tlt_encode

infer-dims=3;1024;1024
batch-size=1

## 0=FP32, 1=INT8, 2=FP16 mode

network-mode=2
num-detected-classes=4
interval=0
gie-unique-id=1
network-type=2
output-blob-names=mrcnn_mask/Sigmoid
segmentation-threshold=0.0

##specify the output tensor order, 0(default value) for CHW and 1 for HWC

segmentation-output-order=0

[class-attrs-all]
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

But it must be noted that I obtained my engine file by running trtexec on my UFF file and NOT from a TLT model, roughly as sketched below. Is that still correct?
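For reference, the kind of trtexec invocation that produces such an engine (flags here are illustrative, not my exact command):

$ /usr/src/tensorrt/bin/trtexec --uff=res101-holygrail-ep26.uff \
      --uffInput=input_image,3,1024,1024 \
      --output=mrcnn_detection --output=mrcnn_mask/Sigmoid \
      --fp16 --saveEngine=res101-holygrail-ep26-fp16.engine

With that engine and the config above, I get the following error: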

virus@virus-desktop:~/Desktop/optimisation/deepstream_tao_apps/apps/tao_segmentation$ ./ds-tao-segmentation -c custom_config.txt -i /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264
Error: Could not parse model engine file path
Failed to parse group property
** ERROR: <gst_nvinfer_parse_config_file:1260>: failed
Now playing: custom_config.txt
Opening in BLOCKING MODE
Opening in BLOCKING MODE 
Opening in BLOCKING MODE
Opening in BLOCKING MODE 
0:00:00.296638130 19485   0x557fb78150 WARN                 nvinfer gstnvinfer.cpp:769:gst_nvinfer_start:<primary-nvinference-engine> error: Configuration file parsing failed
0:00:00.296689556 19485   0x557fb78150 WARN                 nvinfer gstnvinfer.cpp:769:gst_nvinfer_start:<primary-nvinference-engine> error: Config file path: custom_config.txt
Running...
ERROR from element primary-nvinference-engine: Configuration file parsing failed
Error details: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(769): gst_nvinfer_start (): /GstPipeline:ds-custom-pipeline/GstNvInfer:primary-nvinference-engine:
Config file path: custom_config.txt
Returned, stopping playback
Deleting pipeline

Ok, I think I need to convert my UFF model to a TLT model using TAO. Referring to Training Instance Segmentation Models Using Mask R-CNN on the NVIDIA TAO Toolkit | NVIDIA Technical Blog: is retraining the only way? Can't I use my UFF model for inference on Jetson?

@AastaLLL I got past this error, which was actually caused by my rebuilt TensorRT; the pre-installed TensorRT that came with JetPack 4.5.1 works fine. I re-did all the above steps and got the following successful inference:

ds-tao-segmentation -c ~/Desktop/optimisation/deepstream_tao_apps/customConfigs/custom_config.txt -i ~/Desktop/optimisation/large.jpg 
Now playing: /home/virus/Desktop/optimisation/deepstream_tao_apps/customConfigs/custom_config.txt
Opening in BLOCKING MODE
Opening in BLOCKING MODE 
0:00:03.762353419 10172   0x55905e4040 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1702> [UID = 1]: deserialized trt engine from :/home/virus/Desktop/optimisation/res101-holygrail-ep26-fp16.engine
INFO: [Implicit Engine Info]: layers num: 3
0   INPUT  kFLOAT input_image     3x1024x1024     
1   OUTPUT kFLOAT mrcnn_detection 100x6           
2   OUTPUT kFLOAT mrcnn_mask/Sigmoid 100x4x28x28     

0:00:03.762579860 10172   0x55905e4040 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1806> [UID = 1]: Use deserialized engine model: /home/virus/Desktop/optimisation/res101-holygrail-ep26-fp16.engine
0:00:03.899089690 10172   0x55905e4040 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary-nvinference-engine> [UID 1]: Load new model:/home/virus/Desktop/optimisation/deepstream_tao_apps/customConfigs/custom_config.txt sucessfully
Running...
NvMMLiteBlockCreate : Block : BlockType = 256 
[JPEG Decode] BeginSequence Display WidthxHeight 1024x1024
in videoconvert caps = video/x-raw(memory:NVMM), format=(string)RGBA, framerate=(fraction)1/1, width=(int)1280, height=(int)720
End of stream
Returned, stopping playback
[JPEG Decode] NvMMLiteJPEGDecBlockPrivateClose done
[JPEG Decode] NvMMLiteJPEGDecBlockClose done
Deleting pipeline

But the mask saved is all black. My config file is as follows:


[property]
gpu-id=0
net-scale-factor=0.007843

# Since the model input channel is 3, using RGB color format.

model-color-format=0
offsets=127.5;127.5;127.5
labelfile-path=./custom_labels.txt

##Replace following path to your model file

model-engine-file=/home/virus/Desktop/optimisation/res101-holygrail-ep26-fp16.engine

#DS5.x cannot parse the ONNX etlt model, so you need to
#convert the etlt model to a TensorRT engine first using tao-converter

tlt-encoded-model=../../models/peopleSemSegNet/peoplesemsegnet.etlt
tlt-model-key=tlt_encode

infer-dims=3;1024;1024 
##3;544;960
batch-size=1

## 0=FP32, 1=INT8, 2=FP16 mode

network-mode=2
num-detected-classes=4
interval=0
gie-unique-id=1
network-type=2
output-blob-names=mrcnn_mask/Sigmoid
segmentation-threshold=0.0

##specify the output tensor order, 0(default value) for CHW and 1 for HWC

segmentation-output-order=1

[class-attrs-all]
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

Note that I have no TLT model and simply left the

tlt-encoded-model=../../models/peopleSemSegNet/peoplesemsegnet.etlt
tlt-model-key=tlt_encode

lines as they were originally present.

What am I missing? How can I get the mask?

Hi,

Would you mind sharing your custom source/model with us,
so we can reproduce this issue in our environment for checking?

Thanks.

Here is the UFF model: res101-holygrail-ep26.uff - Google Drive

Thanks

Hi,

The mask being all black indicates that nothing was detected.

Just checking your configuration: could you validate whether net-scale-factor and offsets are correct?
It is common that an incorrect input data range makes the detector malfunction.

...
net-scale-factor=0.007843

# Since the model input channel is 3, using RGB color format.

model-color-format=0
offsets=127.5;127.5;127.5
...
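For reference, nvinfer's documented per-pixel preprocessing is y = net-scale-factor * (x - offset). A quick sketch of what the quoted values imply (plain Python, not DeepStream code):

# nvinfer preprocessing, applied per pixel and channel:
#   y = net_scale_factor * (x - offset)
def nvinfer_preprocess(x, net_scale_factor, offset):
    return net_scale_factor * (x - offset)

# net-scale-factor=0.007843 with offsets=127.5 maps [0, 255] to roughly [-1, 1]:
print(nvinfer_preprocess(0.0, 0.007843, 127.5))    # ~ -1.0
print(nvinfer_preprocess(255.0, 0.007843, 127.5))  # ~ +1.0

# A Matterport-trained Mask R-CNN instead expects plain per-channel mean
# subtraction (no scaling): net-scale-factor=1 with offsets near
# 123.7;116.8;103.9 (Matterport's MEAN_PIXEL values).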

Thanks.

I can't find a way to calculate net-scale-factor and offsets, as they are not part of Mask-RCNN but rather added by NVIDIA.

I did find a related resource, Training Instance Segmentation Models Using Mask R-CNN on the NVIDIA TAO Toolkit | NVIDIA Technical Blog, which has a similar config file. Using its parameters makes no difference to the earlier results.

@ChrisDing did share the DeepStream 4.0 samples to tackle this here: converting mask rcnn to tensor rt - #31 by ChrisDing

But I see it's outdated.

Hi,

May I know how you trained your model first?
Did you train it with TLT (TAO) or another framework like TensorFlow?

We tested the PeopleSegNet example and it works correctly.

Thanks.

I trained it using Matterport's default prescribed method, not TLT.

Hi,

Could you check if you can get correct output with TensorRT first?
This will help us narrow down whether the issue comes from TensorRT or DeepStream.

Thanks.

TensorRT gives the desired output, as I show in this Colab notebook.

I use the sample_uff_maskRCNN sample from TRT 7.0. I have tested this on the host as well as on the Jetson Xavier.
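For completeness, that sample is invoked along these lines (the data directory holding the UFF and test images is illustrative):

$ ./sample_uff_maskRCNN -d /usr/src/tensorrt/data/maskrcnn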

Hi,

Thanks for all the confirmation and testing.

We are going to reproduce this issue internally and will get back to you later.


Hi,

Thanks for your patience.

Please use a dedicated example for MaskRCNN instead.
We confirmed that we can get the mask output with the res101-holygrail-ep26.uff model.

1. Get source

$ export DS_SRC_PATH=/opt/nvidia/deepstream/deepstream-6.0/
$ git clone https://github.com/NVIDIA-AI-IOT/deepstream_4.x_apps.git

2. Apply change

diff --git a/Makefile b/Makefile
index 80e6502..f11bfac 100644
--- a/Makefile
+++ b/Makefile
@@ -13,7 +13,7 @@ APP:= deepstream-custom
 
 TARGET_DEVICE = $(shell gcc -dumpmachine | cut -f1 -d -)
 
-NVDS_VERSION:=4.0
+NVDS_VERSION:=6.0
 
 LIB_INSTALL_DIR?=/opt/nvidia/deepstream/deepstream-$(NVDS_VERSION)/lib/
 
diff --git a/nvdsinfer_customparser_mrcnn_uff/nvdsinfer_custombboxparser_mrcnn_uff.cpp b/nvdsinfer_customparser_mrcnn_uff/nvdsinfer_custombboxparser_mrcnn_uff.cpp
index d8ac0d4..90ceab6 100644
--- a/nvdsinfer_customparser_mrcnn_uff/nvdsinfer_custombboxparser_mrcnn_uff.cpp
+++ b/nvdsinfer_customparser_mrcnn_uff/nvdsinfer_custombboxparser_mrcnn_uff.cpp
@@ -28,7 +28,7 @@ static const int DETECTION_MAX_INSTANCES = 100;
 static const int NUM_CLASSES = 1 + 80; // COCO has 80 classes
 
 static const int MASK_POOL_SIZE = 14;
-static const nvinfer1::DimsCHW INPUT_SHAPE{3, 1024, 1024};
+static const nvinfer1::Dims3 INPUT_SHAPE{3, 1024, 1024};
 //static const Dims2 MODEL_DETECTION_SHAPE{DETECTION_MAX_INSTANCES, 6};
 //static const Dims4 MODEL_MASK_SHAPE{DETECTION_MAX_INSTANCES, NUM_CLASSES, 28, 28};
 
diff --git a/pgie_mrcnn_uff_config.txt b/pgie_mrcnn_uff_config.txt
index b169d1d..5422121 100644
--- a/pgie_mrcnn_uff_config.txt
+++ b/pgie_mrcnn_uff_config.txt
@@ -50,7 +50,7 @@ offsets=103.939;116.779;123.68
 model-color-format=1
 labelfile-path=./nvdsinfer_customparser_mrcnn_uff/mrcnn_labels.txt
 uff-file=./mrcnn_nchw.uff
-model-engine-file=./mrcnn_nchw.uff_b1_fp32.engine
+model-engine-file=./mrcnn_nchw.uff_b1_gpu0_fp32.engine
 uff-input-dims=3;1024;1024;0
 uff-input-blob-name=input_image
 batch-size=1

3. Compile and Run

$ cd deepstream_4.x_apps/nvdsinfer_customparser_mrcnn_uff/
$ CUDA_VER=10.2 make
$ cd ../
$ make
$ cp {res101-holygrail-ep26.uff} mrcnn_nchw.uff
$ ./deepstream-custom pgie_mrcnn_uff_config.txt /opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_720p.h264 

Thanks.

Is it possible to use an image instead of the h264 video file?

Hi,

Images use a different decoder than video.
You can find an example in the folder below:

/opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-image-decode-test
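A rough build-and-run sequence for that sample (CUDA_VER matches the CUDA 10.2 that ships with JetPack for DeepStream 6.0; the JPEG path is illustrative):

$ cd /opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-image-decode-test
$ CUDA_VER=10.2 make
$ ./deepstream-image-decode-app /opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_720p.jpg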

Thanks.