I am trying to improve the performance of the PeopleNet model using around 1,300 labeled 1920x1080 PNG images.
I used the following command:
tlt-train detectnet_v2 -k tlt_encode -r /workspace/tlt-experiments/ -e train.txt
My train.txt file is:
random_seed: 42
model_config {
  num_layers: 18
  pretrained_model_file: "/workspace/tlt-experiments/resnet34_peoplenet.tlt"
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
  all_projections: true
}
# Bbox rasterizer config for the single "person" class
bbox_rasterizer_config {
  target_class_config {
    key: "person"
    value: {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.4
      cov_radius_y: 0.4
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.67
}
postprocessing_config {
  target_class_config {
    key: "person"
    value: {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 20
      }
    }
  }
}
cost_function_config {
  target_classes {
    name: "person"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: True
  max_objective_weight: 0.9999
  min_objective_weight: 0.0001
}
training_config {
  batch_size_per_gpu: 8
  num_epochs: 80
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-6
      max_learning_rate: 5e-4
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-9
  }
  optimizer {
    adam {
      epsilon: 1e-08
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    enabled: False
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
}
# Augmentation config
augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    output_image_channel: 3
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    color_shift_stddev: 0.0
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}
evaluation_config {
  average_precision_mode: INTEGRATE
  validation_period_during_training: 10
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.5
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 4
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
}
dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tlt-experiments/tf_records/*"
    image_directory_path: "/workspace/tlt-experiments/"
  }
  image_extension: "png"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  validation_fold: 0
}
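For completeness: the tfrecords referenced in dataset_config were generated from KITTI-format labels with tlt-dataset-convert (the standard route, as far as I know). The conversion spec looked roughly like the sketch below; the directory names, split and shard counts here are placeholders rather than my exact values:
kitti_config {
  root_directory_path: "/workspace/tlt-experiments/data"
  image_dir_name: "images"
  label_dir_name: "labels"
  image_extension: ".png"
  partition_mode: "random"
  num_partitions: 2
  val_split: 10
  num_shards: 10
}
image_directory_path: "/workspace/tlt-experiments/data"
and it was run with something like:
tlt-dataset-convert -d tfrecords_spec.txt -o /workspace/tlt-experiments/tf_records/tfrecords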
The results of training at 80 epochs are:
Epoch 80/80
=========================
Validation cost: 0.000043
Mean average_precision (in %): 98.6850
class name      average precision (in %)
------------    --------------------------
person          98.685
Median Inference Time: 0.013576
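For reference, I believe the same number can be reproduced outside the training loop with the standalone evaluate, using the same spec, model and key:
tlt-evaluate detectnet_v2 -e train.txt -m /workspace/tlt-experiments/weights/model.tlt -k tlt_encode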
I understand that for deployment I would prune the model, but I just wanted to check accuracy on the site camera first, so I used the command below to export the model for DeepStream 5.0 DP:
tlt-export detectnet_v2 -m /workspace/tlt-experiments/weights/model.tlt -o /workspace/tlt-experiments/weights/peoplenet_detector_unpruned.etlt -k tlt_encode
I get the following:
Using TensorFlow backend.
NOTE: UFF has been tested with TensorFlow 1.14.0.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
DEBUG [/usr/lib/python2.7/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['output_cov/Sigmoid', 'output_bbox/BiasAdd'] as outputs
[TensorRT] INFO: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[TensorRT] INFO: Detected 1 inputs and 2 output network tensors.
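For the record, I am letting DeepStream build the TensorRT engine directly from the .etlt (see the config below). I believe the engine could also be built explicitly on the target with tlt-converter, along the lines of the sketch below; the FP16 choice and max batch size simply mirror my nvinfer settings, and the converter binary location depends on the platform:
tlt-converter -k tlt_encode \
              -d 3,544,960 \
              -o output_cov/Sigmoid,output_bbox/BiasAdd \
              -t fp16 \
              -m 1 \
              -e /workspace/tlt-experiments/weights/peoplenet_detector_unpruned_fp16.engine \
              /workspace/tlt-experiments/weights/peoplenet_detector_unpruned.etlt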
Then I run the same DeepStream app that was running the standard PeopleNet model, but with the nvinfer config changed as shown below:
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
tlt-model-key=tlt_encode
tlt-encoded-model=/home/ddi/Social%20Distancing/CampsieRSL/deepstream/dev/local-testing/peoplenet_detector_unpruned.etlt
#tlt-encoded-model=/opt/nvidia/deepstream/deepstream-5.0/samples/models/tlt_pretrained_models/peoplenet/resnet34_peoplenet_pruned.etlt
labelfile-path=labels_peoplenet.txt
#model-engine-file=/opt/nvidia/deepstream/deepstream-5.0/samples/models/tlt_pretrained_models/peoplenet/resnet34_peoplenet_pruned.etlt_b1_gpu0_fp16.engine
input-dims=3;544;960;0
uff-input-blob-name=input_1
batch-size=1
process-mode=1
model-color-format=0
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=1
cluster-mode=1
interval=0
gie-unique-id=1
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid
[class-attrs-all]
pre-cluster-threshold=0.2
## Set eps=0.7 and minBoxes for cluster-mode=1 (DBSCAN)
eps=0.7
minBoxes=1
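One note on the commented-out model-engine-file line: since nvinfer builds the engine from the .etlt on the first run, I assume the resulting serialized engine (named following the *_b1_gpu0_fp16.engine pattern above) can be pointed to afterwards to skip the rebuild, e.g.:
model-engine-file=<path-to>/peoplenet_detector_unpruned.etlt_b1_gpu0_fp16.engine
(the path here is a placeholder for wherever DeepStream writes the engine out).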
I have changed the number of classes to just 1 (person), and the label file now contains only "person". When I run the DeepStream app it does not detect a single person unless I set pre-cluster-threshold below 0.1, which mostly produces false positives.
Have I missed something? Does it matter that I am only using one class? Do the training images need to be 960x544 rather than 1920x1080?