Little to no detection on Deepstream-App compared to TLT's infer using the same model

ai12 · April 14, 2021, 5:33pm

• Hardware Platform (Jetson / GPU)
nvidia GPU
• DeepStream Version
5.0.1
• JetPack Version (valid for Jetson only)
• TensorRT Version
7.0.0
• NVIDIA GPU Driver Version (valid for GPU only)
460.39
• Issue Type( questions, new requirements, bugs)
question
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
follow the steps I describe
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Hello everyone.
I’m posting this as an extension from my previous post:
https://forums.developer.nvidia.com/t/little-to-no-detection-using-tlt-faster-rcnn-trained-model-on-deepstream-app/174802/13

This time I trained a custom Faster-RCNN on TLT using my own 500-image dataset.
The final goal is to run this model on deepstream-app.

I followed almost the same steps from the faster_rcnn example on TLT to train the model, and got one with good metrics and detections.

==========================================================================================
Class               AP                  precision           recall              RPN_recall          
------------------------------------------------------------------------------------------
person              0.8522              0.1018              0.9583              0.9583              
------------------------------------------------------------------------------------------
person-helmet       0.9124              0.0946              0.9545              0.9091              
------------------------------------------------------------------------------------------
truck-front         1.0000              0.1260              1.0000              1.0000              
------------------------------------------------------------------------------------------
truck-tank          1.0000              0.3946              1.0000              1.0000              
------------------------------------------------------------------------------------------
truck-tanker        1.0000              0.1442              1.0000              1.0000              
------------------------------------------------------------------------------------------
mAP@0.5 = 0.9529
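
As a quick sanity check on the table, the reported mAP@0.5 should match the plain mean of the five per-class AP values. A minimal sketch, assuming mAP here is an unweighted average over classes (the helper name `mean_average_precision` is only illustrative):

```python
# Per-class AP values copied from the evaluation table above.
ap_per_class = {
    "person": 0.8522,
    "person-helmet": 0.9124,
    "truck-front": 1.0000,
    "truck-tank": 1.0000,
    "truck-tanker": 1.0000,
}


def mean_average_precision(ap_values):
    """Return the unweighted mean of the per-class AP values."""
    values = list(ap_values)
    return sum(values) / len(values)


map_at_05 = mean_average_precision(ap_per_class.values())
print(f"mAP@0.5 = {map_at_05:.4f}")  # prints "mAP@0.5 = 0.9529"
```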

When I run infer on the model, I get these results.

These are exactly what I expected (note that it is intended for the model not to detect the occluded trucks in the background).

When I export the model to DeepStream, it is another story. There is almost no detection, neither visible in the output nor dumped into the KITTI output.

This is my specs file:
(I haven’t pruned and retrained yet, so I’ll post only this specs file)

random_seed: 42
enc_key: <key>
verbose: True
model_config {
input_image_config {
image_type: RGB
image_channel_order: 'bgr'
size_height_width {
height: 704
width: 1260
}
    image_channel_mean {
        key: 'b'
        value: 103.939
}
    image_channel_mean {
        key: 'g'
        value: 116.779
}
    image_channel_mean {
        key: 'r'
        value: 123.68
}
image_scaling_factor: 1.0
max_objects_num_per_image: 100
}
arch: "resnet:18"
anchor_box_config {
scale: 64.0
scale: 128.0
scale: 256.0
ratio: 1.0
ratio: 0.5
ratio: 2.0
}
freeze_bn: True
freeze_blocks: 0
freeze_blocks: 1
#roi_mini_batch: 256
roi_mini_batch: 32
rpn_stride: 16
use_bias: False
roi_pooling_config {
pool_size: 7
pool_size_2x: False
}
all_projections: True
use_pooling:False
}
dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tlt-experiments/faster_rcnn-pfuenzalida/tfrecords/kitti_trainval/kitti_trainval*"
    image_directory_path: "/workspace/tlt-experiments/data/training"
  }
image_extension: 'png'
target_class_mapping {
key: 'car'
value: 'car'
}
target_class_mapping {
key: 'van'
value: 'car'
}
target_class_mapping {
key: 'pedestrian'
value: 'person'
}
target_class_mapping {
key: 'person_sitting'
value: 'person'
}
target_class_mapping {
key: 'cyclist'
value: 'cyclist'
}
validation_fold: 0
}
augmentation_config {
preprocessing {
output_image_width: 1260
output_image_height: 704
output_image_channel: 3
min_bbox_width: 1.0
min_bbox_height: 1.0
}
spatial_augmentation {
hflip_probability: 0.5
vflip_probability: 0.0
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 0
translate_max_y: 0
}
color_augmentation {
hue_rotation_max: 0.0
saturation_shift_max: 0.0
contrast_scale_max: 0.0
contrast_center: 0.5
}
}
training_config {
enable_augmentation: True
enable_qat: False
#batch_size_per_gpu: 8
batch_size_per_gpu: 2
num_epochs: 12
retrain_pruned_model: "/workspace/tlt-experiments/faster_rcnn-pfuenzalida/data/faster_rcnn/model_1_pruned.tlt"
output_model: "/workspace/tlt-experiments/faster_rcnn-pfuenzalida/data/faster_rcnn/frcnn_kitti_resnet18_retrain.tlt"
rpn_min_overlap: 0.3
rpn_max_overlap: 0.7
classifier_min_overlap: 0.0
classifier_max_overlap: 0.5
gt_as_roi: False
std_scaling: 1.0
classifier_regr_std {
key: 'x'
value: 10.0
}
classifier_regr_std {
key: 'y'
value: 10.0
}
classifier_regr_std {
key: 'w'
value: 5.0
}
classifier_regr_std {
key: 'h'
value: 5.0
}

#rpn_mini_batch: 256
rpn_mini_batch: 32
rpn_pre_nms_top_N: 12000
rpn_nms_max_boxes: 2000
rpn_nms_overlap_threshold: 0.7

regularizer {
type: L2
weight: 1e-4
}

optimizer {
sgd {
lr: 0.02
momentum: 0.9
decay: 0.0
nesterov: False
}
}

learning_rate {
soft_start {
base_lr: 0.02
start_lr: 0.002
soft_start: 0.1
annealing_points: 0.8
annealing_points: 0.9
annealing_divider: 10.0
}
}

lambda_rpn_regr: 1.0
lambda_rpn_class: 1.0
lambda_cls_regr: 1.0
lambda_cls_class: 1.0
}
inference_config {
images_dir: '/workspace/tlt-experiments/data/testing/image_2'
#images_dir: '/workspace/tlt-experiments/data/customVal'
model: '/workspace/tlt-experiments/faster_rcnn-pfuenzalida/data/faster_rcnn/frcnn_kitti_resnet18_retrain.epoch12.tlt'
batch_size: 1
detection_image_output_dir: '/workspace/tlt-experiments/faster_rcnn-pfuenzalida/data/faster_rcnn/inference_results_imgs_retrain'
labels_dump_dir: '/workspace/tlt-experiments/faster_rcnn-pfuenzalida/data/faster_rcnn/inference_dump_labels_retrain'
#detection_image_output_dir: '/workspace/tlt-experiments/data/customValResults/inference_results_imgs_retrain'
#labels_dump_dir: '/workspace/tlt-experiments/data/customValResults/inference_dump_labels_retrain'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
object_confidence_thres: 0.0001
bbox_visualize_threshold: 0.6
classifier_nms_max_boxes: 100
classifier_nms_overlap_threshold: 0.3
#trt_inference {
#trt_engine: '/workspace/tlt-experiments/faster_rcnn-pfuenzalida/data/faster_rcnn/trt.int8.engine'
#trt_data_type: 'int8'
#max_workspace_size_MB: 2000
#}
}
evaluation_config {
model: '/workspace/tlt-experiments/faster_rcnn-pfuenzalida/data/faster_rcnn/frcnn_kitti_resnet18_retrain.epoch12.tlt'
batch_size: 1
validation_period_during_training: 1
labels_dump_dir: '/workspace/tlt-experiments/faster_rcnn-pfuenzalida/data/faster_rcnn/test_dump_labels_retrain'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
classifier_nms_max_boxes: 100
classifier_nms_overlap_threshold: 0.3
object_confidence_thres: 0.0001
use_voc07_11point_metric:False
#trt_evaluation {
#trt_engine: '/workspace/tlt-experiments/faster_rcnn-pfuenzalida/data/faster_rcnn/trt.int8.engine'
#trt_data_type: 'int8'
#max_workspace_size_MB: 2000
#}
gt_matching_iou_threshold: 0.5
}

To use the model with DeepStream, I exported the last epoch of the trained model to etlt in FP32.
This is how I run deepstream-app:

deepstream-app -c deepstream_app_config_fasterRCNN.txt

my deepstream_app_config_fasterRCNN.txt file is the following

    [application]
    enable-perf-measurement=1
    perf-measurement-interval-sec=1
    gie-kitti-output-dir=./

    [source0]
    enable=1
    #Type - 1=CameraV4L2 2=URI 3=MultiURI
    type=3
    num-sources=1
    uri=file:/home/user/dev/nvidia/samples/streams/sample_1080p_h264.mp4
    gpu-id=0
    cudadec-memtype=0

    [sink0]
    enable=1
    #Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming
    type=4
    #1=h264 2=h265
    codec=1
    #encoder type 0=Hardware 1=Software
    enc-type=0
    sync=1
    bitrate=3000000
    #H264 Profile - 0=Baseline 2=Main 4=High
    #H265 Profile - 0=Main 1=Main10
    profile=0
    # set below properties in case of RTSPStreaming  
    rtsp-port=8555
    udp-port=5400

    [osd]
    enable=1
    gpu-id=0
    border-width=3
    text-size=15
    text-color=1;1;1;1;
    text-bg-color=0.3;0.3;0.3;1
    font=Serif
    show-clock=0
    clock-x-offset=800
    clock-y-offset=820
    clock-text-size=12
    clock-color=1;0;0;0
    nvbuf-memory-type=0

    [primary-gie]
    enable=1
    gpu-id=0
    batch-size=1
    gie-unique-id=1
    interval=1
    config-file=config_infer_primary_frcnn_infer_2_ds-app.txt
    nvbuf-memory-type=0

my config_infer_primary_frcnn_infer_2_ds-app.txt is

    [property]
    gpu-id=0
    net-scale-factor=1.0
    offsets=103.939;116.779;123.68
    model-color-format=1
    labelfile-path=frcnn_labels.txt
    tlt-encoded-model=../models/frcnn_kitti_resnet18.etlt
    tlt-model-key=<key>
    model-engine-file=../models/frcnn_kitti_resnet18.etlt_b1_gpu0_fp32.engine
    infer-dims=3;544;960
    uff-input-order=0
    uff-input-blob-name=input_image
    batch-size=1
    ## 0=FP32, 1=INT8, 2=FP16 mode
    network-mode=0
    num-detected-classes=5
    interval=0
    gie-unique-id=1
    is-classifier=0
    #network-type=0
    output-blob-names=NMS
    cluster-mode=2
    parse-bbox-func-name=NvDsInferParseCustomNMSTLT
    custom-lib-path=/home/user/dev/nvidia/proyectos/ejemplos/deepstream_tlt_apps/post_processor/libnvds_infercustomparser_tlt.so

    [class-attrs-all]
    pre-cluster-threshold=0.01
    roi-top-offset=0
    roi-bottom-offset=0
    detected-min-w=0
    detected-min-h=0
    detected-max-w=0
    detected-max-h=0

and my frcnn_labels.txt file is

person
person-helmet
truck-front
truck-tank
truck-tanker
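
Since the spec file, the nvinfer config, and the label file all have to agree with each other, a small script can compare them. This is only a sketch: the values are copied by hand from the files above, the helper name `check_consistency` is hypothetical, and it only flags differences. It does not establish that any of them is the cause of the missing detections.

```python
import configparser

# Excerpt of the [property] section shown above (values copied from the post).
NVINFER_CONFIG = """
[property]
net-scale-factor=1.0
offsets=103.939;116.779;123.68
model-color-format=1
infer-dims=3;544;960
num-detected-classes=5
"""

# Contents of frcnn_labels.txt, one label per line.
LABELS = ["person", "person-helmet", "truck-front", "truck-tank", "truck-tanker"]

# Values copied from input_image_config in the training spec (b, g, r means).
SPEC = {"height": 704, "width": 1260, "channel_means": [103.939, 116.779, 123.68]}


def check_consistency(config_text, labels, spec):
    """Return a list of human-readable mismatches (empty if none are found)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    prop = parser["property"]
    problems = []

    # The number of classes must match the number of labels.
    num_classes = int(prop["num-detected-classes"])
    if num_classes != len(labels):
        problems.append(
            f"num-detected-classes is {num_classes} but there are {len(labels)} labels"
        )

    # infer-dims is channels;height;width.
    _channels, height, width = (int(v) for v in prop["infer-dims"].split(";"))
    if (height, width) != (spec["height"], spec["width"]):
        problems.append(
            f"infer-dims is {height}x{width} but the training spec uses "
            f"{spec['height']}x{spec['width']}"
        )

    # Per-channel offsets should equal the per-channel means used in training.
    offsets = [float(v) for v in prop["offsets"].split(";")]
    if offsets != spec["channel_means"]:
        problems.append(f"offsets {offsets} differ from means {spec['channel_means']}")

    return problems


problems = check_consistency(NVINFER_CONFIG, LABELS, SPEC)
for problem in problems:
    print("MISMATCH:", problem)
```

With the values posted here, the class count and the offsets agree, and the script reports only the difference between infer-dims (544x960) and the training resolution (704x1260).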

The custom parser was obtained following this git
https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps

and I’m running TLT and Deepstream on these docker containers.
nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3
nvcr.io/nvidia/deepstream:5.0.1-20.09-devel

I have tried more than one custom RCNN parser and haven’t been able to get good results. The apps run with no errors, but detection is noticeably worse in deepstream-app than in TLT, so I need help figuring this out.

Thank you and sorry for the long post.

Morganh · April 15, 2021, 2:09am

As we synced in the topic "Little to no detection using TLT Faster-RCNN trained model on Deepstream-App":

Please follow the GitHub repository NVIDIA-AI-IOT/deepstream_tlt_apps (sample apps that demonstrate how to deploy models trained with TLT on DeepStream), and run it with the command below.

For detection model:
Usage: ds-tlt -c pgie_config_file -i <H264 or JPEG filename> [-b BATCH] [-d]

ai12 · April 15, 2021, 2:55am

Hi @Morganh, thank you for your time.

I will do it first thing tomorrow (here in Chile it is almost midnight).
I’m currently training a yolo_v4 net and will report the results for that model too.

In case it is helpful, this is my trained network

and this is a video sample I expect to detect on

Could you tell me, if it is not too much to ask, whether you can run this model on deepstream-app and get the expected results?
Sorry for not sharing the etlt and key with you, but I’m not allowed to do so.
Thanks again for all your help and support.

Morganh · April 15, 2021, 3:05am

No worries. Take your time.
The TLT user guide recommends the GitHub repository below for inference.
See FasterRCNN — Transfer Learning Toolkit 3.0 documentation

A DeepStream sample with documentation on how to run inference using the trained FasterRCNN models from TLT is provided on GitHub here.

ai12 · April 19, 2021, 1:43pm

Hello @Morganh
I found out what I was missing.
In the Metropolis docs, there is a final step in the TensorRT OSS install that asks you to replace the original libnvinfer_plugin.so with the one generated at the end of the TensorRT OSS build.
After doing that I got the expected results.

sudo mv /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7.x.y ${HOME}/libnvinfer_plugin.so.7.x.y.bak   # backup original libnvinfer_plugin.so.x.y
sudo cp `pwd`/out/libnvinfer_plugin.so.7.m.n  /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7.x.y
sudo ldconfig
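
To confirm which file the loader will pick up after the copy, it can help to list the libnvinfer_plugin.so* entries and the real files they resolve to. A minimal sketch: the helper name `list_plugin_libraries` is hypothetical, and the x86_64 directory is an assumption for GPU (non-Jetson) machines that is not stated in this thread, so adjust the paths to your install.

```python
import glob
import os


def list_plugin_libraries(lib_dir):
    """Map each libnvinfer_plugin.so* entry in lib_dir to the real file it resolves to."""
    pattern = os.path.join(lib_dir, "libnvinfer_plugin.so*")
    return {path: os.path.realpath(path) for path in sorted(glob.glob(pattern))}


if __name__ == "__main__":
    # Assumed locations: the commands above use the Jetson (aarch64) directory;
    # on an x86 machine the libraries usually live under x86_64-linux-gnu instead.
    for lib_dir in ("/usr/lib/aarch64-linux-gnu", "/usr/lib/x86_64-linux-gnu"):
        for link, target in list_plugin_libraries(lib_dir).items():
            # Skip dangling symlinks so os.path.getsize does not raise.
            if os.path.exists(target):
                print(f"{link} -> {target} ({os.path.getsize(target)} bytes)")
            else:
                print(f"{link} -> {target} (missing)")
```

Comparing the reported size against the file under `out/` from the build is one way to check that the replacement actually took effect.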

Thank you again for all your help.
I got Faster RCNN and YOLO_V4 running properly in my DeepStream app.

Morganh · April 19, 2021, 2:41pm

Thanks for the info. Great job!
