• Hardware Platform (Jetson / GPU)
nvidia GPU
• DeepStream Version
5.0.1
• JetPack Version (valid for Jetson only)
• TensorRT Version
7.0.0
• NVIDIA GPU Driver Version (valid for GPU only)
460.39
• Issue Type( questions, new requirements, bugs)
question
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
Follow the steps described below.
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)
Hello everyone.
I’m posting this as a follow-up to my previous post:
https://forums.developer.nvidia.com/t/little-to-no-detection-using-tlt-faster-rcnn-trained-model-on-deepstream-app/174802/13
This time I trained a custom Faster R-CNN in TLT using my own 500-image dataset.
The final goal is to run this model with deepstream-app.
I followed almost the same steps as the faster_rcnn example in TLT to train the model, and got one with good metrics and detections:
==========================================================================================
Class          AP       precision  recall   RPN_recall
------------------------------------------------------------------------------------------
person         0.8522   0.1018     0.9583   0.9583
------------------------------------------------------------------------------------------
person-helmet  0.9124   0.0946     0.9545   0.9091
------------------------------------------------------------------------------------------
truck-front    1.0000   0.1260     1.0000   1.0000
------------------------------------------------------------------------------------------
truck-tank     1.0000   0.3946     1.0000   1.0000
------------------------------------------------------------------------------------------
truck-tanker   1.0000   0.1442     1.0000   1.0000
------------------------------------------------------------------------------------------
mAP@0.5 = 0.9529
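The reported mAP@0.5 is consistent with an unweighted mean of the five per-class AP values in the table. A quick check, using only the numbers copied from the table (the dictionary name is just for illustration):

```python
# Per-class AP values copied from the TLT evaluation table above.
per_class_ap = {
    "person": 0.8522,
    "person-helmet": 0.9124,
    "truck-front": 1.0000,
    "truck-tank": 1.0000,
    "truck-tanker": 1.0000,
}

# Unweighted mean over classes (each class counts equally,
# regardless of how many ground-truth boxes it has).
map_50 = sum(per_class_ap.values()) / len(per_class_ap)
print(f"mAP@0.5 = {map_50:.4f}")  # prints "mAP@0.5 = 0.9529"
```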
When I run TLT inference on the model, I get these results.
They are exactly what I expected (note that not detecting the occluded trucks in the background is intended).
When I export the model to DeepStream, it’s another story: there are almost no detections, neither visible in the output stream nor dumped to the KITTI output files.
This is my spec file:
(I haven’t pruned and retrained yet, so I’m only posting this spec file.)
random_seed: 42
enc_key: <key>
verbose: True
model_config {
  input_image_config {
    image_type: RGB
    image_channel_order: 'bgr'
    size_height_width {
      height: 704
      width: 1260
    }
    image_channel_mean {
      key: 'b'
      value: 103.939
    }
    image_channel_mean {
      key: 'g'
      value: 116.779
    }
    image_channel_mean {
      key: 'r'
      value: 123.68
    }
    image_scaling_factor: 1.0
    max_objects_num_per_image: 100
  }
  arch: "resnet:18"
  anchor_box_config {
    scale: 64.0
    scale: 128.0
    scale: 256.0
    ratio: 1.0
    ratio: 0.5
    ratio: 2.0
  }
  freeze_bn: True
  freeze_blocks: 0
  freeze_blocks: 1
  #roi_mini_batch: 256
  roi_mini_batch: 32
  rpn_stride: 16
  use_bias: False
  roi_pooling_config {
    pool_size: 7
    pool_size_2x: False
  }
  all_projections: True
  use_pooling: False
}
dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tlt-experiments/faster_rcnn-pfuenzalida/tfrecords/kitti_trainval/kitti_trainval*"
    image_directory_path: "/workspace/tlt-experiments/data/training"
  }
  image_extension: 'png'
  target_class_mapping {
    key: 'car'
    value: 'car'
  }
  target_class_mapping {
    key: 'van'
    value: 'car'
  }
  target_class_mapping {
    key: 'pedestrian'
    value: 'person'
  }
  target_class_mapping {
    key: 'person_sitting'
    value: 'person'
  }
  target_class_mapping {
    key: 'cyclist'
    value: 'cyclist'
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 1260
    output_image_height: 704
    output_image_channel: 3
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 0
    translate_max_y: 0
  }
  color_augmentation {
    hue_rotation_max: 0.0
    saturation_shift_max: 0.0
    contrast_scale_max: 0.0
    contrast_center: 0.5
  }
}
training_config {
  enable_augmentation: True
  enable_qat: False
  #batch_size_per_gpu: 8
  batch_size_per_gpu: 2
  num_epochs: 12
  retrain_pruned_model: "/workspace/tlt-experiments/faster_rcnn-pfuenzalida/data/faster_rcnn/model_1_pruned.tlt"
  output_model: "/workspace/tlt-experiments/faster_rcnn-pfuenzalida/data/faster_rcnn/frcnn_kitti_resnet18_retrain.tlt"
  rpn_min_overlap: 0.3
  rpn_max_overlap: 0.7
  classifier_min_overlap: 0.0
  classifier_max_overlap: 0.5
  gt_as_roi: False
  std_scaling: 1.0
  classifier_regr_std {
    key: 'x'
    value: 10.0
  }
  classifier_regr_std {
    key: 'y'
    value: 10.0
  }
  classifier_regr_std {
    key: 'w'
    value: 5.0
  }
  classifier_regr_std {
    key: 'h'
    value: 5.0
  }
  #rpn_mini_batch: 256
  rpn_mini_batch: 32
  rpn_pre_nms_top_N: 12000
  rpn_nms_max_boxes: 2000
  rpn_nms_overlap_threshold: 0.7
  regularizer {
    type: L2
    weight: 1e-4
  }
  optimizer {
    sgd {
      lr: 0.02
      momentum: 0.9
      decay: 0.0
      nesterov: False
    }
  }
  learning_rate {
    soft_start {
      base_lr: 0.02
      start_lr: 0.002
      soft_start: 0.1
      annealing_points: 0.8
      annealing_points: 0.9
      annealing_divider: 10.0
    }
  }
  lambda_rpn_regr: 1.0
  lambda_rpn_class: 1.0
  lambda_cls_regr: 1.0
  lambda_cls_class: 1.0
}
inference_config {
  images_dir: '/workspace/tlt-experiments/data/testing/image_2'
  #images_dir: '/workspace/tlt-experiments/data/customVal'
  model: '/workspace/tlt-experiments/faster_rcnn-pfuenzalida/data/faster_rcnn/frcnn_kitti_resnet18_retrain.epoch12.tlt'
  batch_size: 1
  detection_image_output_dir: '/workspace/tlt-experiments/faster_rcnn-pfuenzalida/data/faster_rcnn/inference_results_imgs_retrain'
  labels_dump_dir: '/workspace/tlt-experiments/faster_rcnn-pfuenzalida/data/faster_rcnn/inference_dump_labels_retrain'
  #detection_image_output_dir: '/workspace/tlt-experiments/data/customValResults/inference_results_imgs_retrain'
  #labels_dump_dir: '/workspace/tlt-experiments/data/customValResults/inference_dump_labels_retrain'
  rpn_pre_nms_top_N: 6000
  rpn_nms_max_boxes: 300
  rpn_nms_overlap_threshold: 0.7
  object_confidence_thres: 0.0001
  bbox_visualize_threshold: 0.6
  classifier_nms_max_boxes: 100
  classifier_nms_overlap_threshold: 0.3
  #trt_inference {
    #trt_engine: '/workspace/tlt-experiments/faster_rcnn-pfuenzalida/data/faster_rcnn/trt.int8.engine'
    #trt_data_type: 'int8'
    #max_workspace_size_MB: 2000
  #}
}
evaluation_config {
  model: '/workspace/tlt-experiments/faster_rcnn-pfuenzalida/data/faster_rcnn/frcnn_kitti_resnet18_retrain.epoch12.tlt'
  batch_size: 1
  validation_period_during_training: 1
  labels_dump_dir: '/workspace/tlt-experiments/faster_rcnn-pfuenzalida/data/faster_rcnn/test_dump_labels_retrain'
  rpn_pre_nms_top_N: 6000
  rpn_nms_max_boxes: 300
  rpn_nms_overlap_threshold: 0.7
  classifier_nms_max_boxes: 100
  classifier_nms_overlap_threshold: 0.3
  object_confidence_thres: 0.0001
  use_voc07_11point_metric: False
  #trt_evaluation {
    #trt_engine: '/workspace/tlt-experiments/faster_rcnn-pfuenzalida/data/faster_rcnn/trt.int8.engine'
    #trt_data_type: 'int8'
    #max_workspace_size_MB: 2000
  #}
  gt_matching_iou_threshold: 0.5
}
To use the model with DeepStream, I exported the last epoch of the trained model to .etlt in FP32.
This is how I run deepstream-app:
deepstream-app -c deepstream_app_config_fasterRCNN.txt
My deepstream_app_config_fasterRCNN.txt file is the following:
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=1
gie-kitti-output-dir=./
[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=3
num-sources=1
uri=file:/home/user/dev/nvidia/samples/streams/sample_1080p_h264.mp4
gpu-id=0
cudadec-memtype=0
[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming
type=4
#1=h264 2=h265
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
sync=1
bitrate=3000000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
profile=0
# set below properties in case of RTSPStreaming
rtsp-port=8555
udp-port=5400
[osd]
enable=1
gpu-id=0
border-width=3
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0
[primary-gie]
enable=1
gpu-id=0
batch-size=1
gie-unique-id=1
interval=1
config-file=config_infer_primary_frcnn_infer_2_ds-app.txt
nvbuf-memory-type=0
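In the [primary-gie] group above, `interval=1` tells the inference plugin how many consecutive batches to skip between inference calls. The nvinfer file below sets `interval=0`; this sketch assumes the deepstream-app value is the one that takes effect. With `batch-size=1` and no [tracker] group in the posted config, skipped frames would carry no detections at all. The helper below is hypothetical and only enumerates which frames would be inferred under that reading:

```python
from typing import List


def inferred_frames(num_frames: int, interval: int, batch_size: int = 1) -> List[int]:
    """Return the frame indices that would reach inference, assuming the
    first batch is inferred and then `interval` batches are skipped,
    repeating, with each batch made of `batch_size` consecutive frames."""
    inferred = []
    for frame in range(num_frames):
        batch_index = frame // batch_size
        if batch_index % (interval + 1) == 0:
            inferred.append(frame)
    return inferred


frames = inferred_frames(num_frames=10, interval=1)
print(frames)  # [0, 2, 4, 6, 8]
print(f"{len(frames)}/10 frames inferred")  # 5/10 frames inferred
```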
My config_infer_primary_frcnn_infer_2_ds-app.txt file is:
[property]
gpu-id=0
net-scale-factor=1.0
offsets=103.939;116.779;123.68
model-color-format=1
labelfile-path=frcnn_labels.txt
tlt-encoded-model=../models/frcnn_kitti_resnet18.etlt
tlt-model-key=<key>
model-engine-file=../models/frcnn_kitti_resnet18.etlt_b1_gpu0_fp32.engine
infer-dims=3;544;960
uff-input-order=0
uff-input-blob-name=input_image
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
num-detected-classes=5
interval=0
gie-unique-id=1
is-classifier=0
#network-type=0
output-blob-names=NMS
cluster-mode=2
parse-bbox-func-name=NvDsInferParseCustomNMSTLT
custom-lib-path=/home/user/dev/nvidia/proyectos/ejemplos/deepstream_tlt_apps/post_processor/libnvds_infercustomparser_tlt.so
[class-attrs-all]
pre-cluster-threshold=0.01
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0
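As documented for nvinfer, input preprocessing is y = net-scale-factor * (x - offset) per channel, with the channel order set by `model-color-format` (0=RGB, 1=BGR). The sketch below compares that against the TLT spec's mean subtraction and scaling for one arbitrary pixel. It assumes TLT subtracts the means before applying `image_scaling_factor`; with a factor of 1.0 the order does not matter here. All function names are hypothetical.

```python
# Values copied from the TLT spec (model_config.input_image_config).
TLT_CHANNEL_ORDER = "bgr"
TLT_MEANS = {"b": 103.939, "g": 116.779, "r": 123.68}
TLT_SCALE = 1.0

# Values copied from the nvinfer [property] section.
NET_SCALE_FACTOR = 1.0
OFFSETS = [103.939, 116.779, 123.68]  # applied in model-color-format order
MODEL_COLOR_FORMAT = 1                # 0=RGB, 1=BGR


def tlt_preprocess(rgb):
    """Reorder to the spec's channel order, subtract the per-channel
    means, then scale (assumed order of operations)."""
    by_name = dict(zip("rgb", rgb))
    return [(by_name[c] - TLT_MEANS[c]) * TLT_SCALE for c in TLT_CHANNEL_ORDER]


def nvinfer_preprocess(rgb):
    """y = net-scale-factor * (x - offset), channels in model-color-format order."""
    channels = list(rgb) if MODEL_COLOR_FORMAT == 0 else list(reversed(rgb))
    return [NET_SCALE_FACTOR * (x - off) for x, off in zip(channels, OFFSETS)]


pixel = (200.0, 150.0, 100.0)  # an arbitrary RGB pixel
tlt_out = nvinfer_out = None
tlt_out = tlt_preprocess(pixel)
nvinfer_out = nvinfer_preprocess(pixel)
print(all(abs(x - y) < 1e-6 for x, y in zip(tlt_out, nvinfer_out)))  # True
```

A `True` here only says that the mean, scale, and channel-order keys agree with the spec for this pixel; it says nothing about the other nvinfer settings.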
And my frcnn_labels.txt file is:
person
person-helmet
truck-front
truck-tank
truck-tanker
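One way to cross-check the files above against each other is a small script that reads the nvinfer [property] section with Python's `configparser`, compares `infer-dims` (C;H;W) with the training input size from the TLT spec (3 x 704 x 1260), and compares `num-detected-classes` with the number of lines in the labels file. The excerpts are pasted in as strings so the sketch is self-contained; the `check` function is hypothetical and only reports differences, it does not judge which value is correct for this model.

```python
import configparser

# Excerpt of config_infer_primary_frcnn_infer_2_ds-app.txt (relevant keys only).
NVINFER_CONFIG = """
[property]
net-scale-factor=1.0
offsets=103.939;116.779;123.68
model-color-format=1
infer-dims=3;544;960
num-detected-classes=5
"""

# Contents of frcnn_labels.txt.
LABELS = """person
person-helmet
truck-front
truck-tank
truck-tanker
"""

# Input size from the TLT spec (size_height_width / preprocessing).
TRAIN_C, TRAIN_H, TRAIN_W = 3, 704, 1260


def check(config_text: str, labels_text: str) -> list:
    """Return a list of mismatches between the nvinfer config, the labels
    file, and the TLT training input size."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    prop = parser["property"]
    problems = []

    # infer-dims is channels;height;width.
    c, h, w = (int(v) for v in prop["infer-dims"].split(";"))
    if (c, h, w) != (TRAIN_C, TRAIN_H, TRAIN_W):
        problems.append(
            f"infer-dims={c};{h};{w} but the TLT spec uses "
            f"{TRAIN_C};{TRAIN_H};{TRAIN_W}"
        )

    labels = [line.strip() for line in labels_text.splitlines() if line.strip()]
    n_classes = prop.getint("num-detected-classes")
    if n_classes != len(labels):
        problems.append(
            f"num-detected-classes={n_classes} but the labels file has {len(labels)} entries"
        )
    return problems


for problem in check(NVINFER_CONFIG, LABELS):
    print("MISMATCH:", problem)
```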
I built the custom parser following this repository:
https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps
I’m running TLT and DeepStream in these Docker containers:
nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3
nvcr.io/nvidia/deepstream:5.0.1-20.09-devel
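For reference, this is roughly what an NMS-output parser such as `NvDsInferParseCustomNMSTLT` has to do, restated in Python for illustration only (it is not the repository's code). It assumes each detection in the `NMS` output is 7 floats laid out as image_id, class_id, confidence, x_min, y_min, x_max, y_max, with coordinates normalized to [0, 1], and that boxes are scaled to the network input size given by `infer-dims` before nvinfer maps them back to the source frame. The records below are made-up values, not real model output.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Detection:
    class_id: int
    confidence: float
    left: float
    top: float
    width: float
    height: float


def clip(value: float, low: float, high: float) -> float:
    return max(low, min(value, high))


def parse_nms(flat: Sequence[float], keep_count: int, net_w: int, net_h: int,
              threshold: float) -> List[Detection]:
    """Decode keep_count records of 7 floats each (assumed layout above)."""
    detections = []
    for i in range(keep_count):
        _, cls, conf, x1, y1, x2, y2 = flat[i * 7:(i + 1) * 7]
        if conf < threshold:  # plays the role of pre-cluster-threshold
            continue
        detections.append(Detection(
            class_id=int(cls),
            confidence=conf,
            # Scale normalized coordinates to the network input resolution.
            left=clip(x1 * net_w, 0, net_w - 1),
            top=clip(y1 * net_h, 0, net_h - 1),
            width=clip((x2 - x1) * net_w, 0, net_w - 1),
            height=clip((y2 - y1) * net_h, 0, net_h - 1),
        ))
    return detections


# Two made-up records: one above and one below pre-cluster-threshold=0.01.
fake_output = [0, 2, 0.90, 0.10, 0.20, 0.50, 0.60,
               0, 0, 0.005, 0.30, 0.30, 0.40, 0.40]
for det in parse_nms(fake_output, keep_count=2, net_w=960, net_h=544, threshold=0.01):
    print(det)
```

Because the pixel coordinates come from `net_w` and `net_h`, the same normalized box lands at different pixel positions for different `infer-dims` values.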
I have tried more than one custom Faster R-CNN parser and haven’t been able to get good results. The apps run without errors, but detection is noticeably worse in deepstream-app than in TLT, and I need help figuring out why.
Thank you and sorry for the long post.