secondary-gie malfunction

Hi everyone.

I am developing an app using DeepStream SDK 4. I have a single image of a vehicle's front view, and by repeating that single image I can make a 30-second video stream. My goal is to use yolov2-tiny and yolov3-tiny consecutively: first detect the license plate, then find all the characters in that license plate. So I use the following setup:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=4

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=3
#uri=file:/opt/nvidia/deepstream/deepstream-4.0/samples/streams/out1.h264
uri=file:/opt/nvidia/deepstream/deepstream-4.0/samples/streams/Grill-Haydari1519.h264
num-sources=1
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=2
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0

[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=1
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1920
height=1080
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.

[primary-gie]
enable=1
gpu-id=0
model-engine-file=model_b1_fp16_plate_alone.engine
labelfile-path=Labels_plate_alone.txt
batch-size=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;1;1
bbox-border-color1=0;0;1;1
bbox-border-color2=0;1;1;1
bbox-border-color3=0;1;1;1
gie-unique-id=2
#operate-on-gie-id=1
#process-mode=2
#gie-mode=2
#is-classifier=1
#classifier-async-mode=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV2_tiny_plate_alone.txt

[secondary-gie]
enable=1
gpu-id=0
model-engine-file=model_b1_fp32_ocr.engine
labelfile-path=labels_ocr.txt
batch-size=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;1;1
bbox-border-color1=0;0;1;1
bbox-border-color2=0;1;1;1
bbox-border-color3=0;1;1;1

gie-unique-id=3
operate-on-gie-id=2
#process-mode=2
#gie-mode=2
#is-classifier=1
#classifier-async-mode=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV3_tiny.txt

[ds-example]
enable=1
processing-width=10
processing-height=10
full-frame=0
unique-id=15

gpu-id=0

[tests]
file-loop=0

The setup works well: it detects the license plate, draws a bounding box around it, and then finds the characters. However, it misclassifies the characters, as shown in the image below:

https://drive.google.com/open?id=1KOBvgk0fltikCW7GH_44qEYSRGlyyvBG

Only the numbers “2” and “7” have been correctly classified, and the result is frustrating! (In case you are not familiar with Arabic numerals: they are all correctly classified in the next image!)

Now I test another scenario. By modifying GstDsExample in the above setup, I save the detected license plate (the license plate that was detected by the yolov2-tiny network in the video of the car's front-view image) as a JPG file. Then I make a 30-second video by repeating that single image across multiple frames. So now I have a 30-second video of a license plate.

Then I use the setup below to detect characters in this video of the license plate. More importantly, note that I use exactly the same yolov3-tiny network that I used in the previous setup:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720

#width=640
#height=480

gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=3

uri=file://../../samples/streams/image006.h264

#uri=file://../../samples/streams/out1.h264
num-sources=1
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=2
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0

[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=1
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1280
height=720
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.

[primary-gie]
enable=1
gpu-id=0
model-engine-file=model_b1_fp32_ocr.engine
labelfile-path=labels_ocr.txt
batch-size=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
gie-unique-id=1

nvbuf-memory-type=0
config-file=config_infer_primary_yoloV3_tiny.txt

[tests]
file-loop=0

However, this time all characters (all the numerals we trained the network on) are correctly detected, as shown in the following image:

https://drive.google.com/open?id=1sj3NUy0NW3hgJAZaBQNhb7H48x1H5vW8

So basically, it seems like the process of cascading two detectors in the DeepStream SDK is hurting accuracy!
In my first setup I detected the license plate using yolov2-tiny as the primary gie, then detected the characters in it using yolov3-tiny as the secondary gie, and the characters were classified with a high error rate!
In the second scenario I used the same yolov3-tiny for character recognition, and the license plate used was the one detected in the first setup (I just saved the detected plate by modifying GstDsExample and made a video out of it); however, this time all the characters were correctly classified!

My question, precisely, is: why does this happen, and how should I solve it so that my secondary classifier for character recognition works well when used in cascade?

thanks

It seems like the secondary gie might be doing some kind of scaling, normalization, or other pre-processing that needs to be disabled!

I checked the yolov2-tiny and yolov3-tiny network input resolutions; they are the same, 416×416.

The Gst-nvinfer plugin performs transforms (format conversion and scaling), on the input frame based on network requirements, and passes the transformed data to the low-level library.

So, as in your first case, the second nvinfer should not involve any transform.
BTW, both yolov2-tiny and yolov3-tiny are detector networks, as far as I know.

Your case is similar to our back-to-back detectors reference app. Can you refer to https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/tree/master/back-to-back-detectors
and customize your case accordingly to see if there are improvements?

Hi,
In your first test, could you test again with “process-mode=2” uncommented in [secondary-gie]? Otherwise, it seems the secondary gie is doing the classification on the full frame instead of on the objects detected by the first gie.
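For reference, a sketch of the [secondary-gie] section from the first setup with that one line uncommented (all other values taken unchanged from the original post):

```ini
[secondary-gie]
enable=1
gpu-id=0
model-engine-file=model_b1_fp32_ocr.engine
labelfile-path=labels_ocr.txt
batch-size=1
gie-unique-id=3
# Only consume objects produced by the gie with gie-unique-id=2
operate-on-gie-id=2
# 2 = operate on detected objects (secondary mode), not on the full frame
process-mode=2
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV3_tiny.txt
```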

For reference, see the DS nvinfer source code, gstnvinfer_property_parser.cpp and gstnvinfer.cpp, under
/opt/nvidia/deepstream/deepstream-4.0/sources/gst-plugins/gst-nvinfer/.

Dear all, this has been solved.

I had to uncomment and manually set input-object-min-width and input-object-min-height in the config file of my secondary classifier. It seems that if the width and height of a detection window (from the primary gie) are smaller than the input-object-min-width and input-object-min-height of the secondary gie, the object is not passed through the secondary gie at all.
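For anyone hitting the same issue, these keys go in the [property] group of the secondary gie's nvinfer config file (here, config_infer_primary_yoloV3_tiny.txt). A sketch with illustrative values; pick thresholds no larger than the smallest plate your primary detector emits:

```ini
[property]
# Objects handed down from the primary gie that are smaller than these
# dimensions (in pixels) are skipped by this secondary gie entirely.
# The values below are illustrative, not the ones from my setup.
input-object-min-width=16
input-object-min-height=16
```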

Thank you for your help.