Details on cost_function_config for PeopleNet

Hi everyone,

The TLT documentation page "Integrating TAO Models into DeepStream" (TAO Toolkit 3.22.05 documentation) suggests leaving the cost function parameters unchanged. However, those parameters appear to be specific to the car, cyclist, and pedestrian classes.

I am training a PeopleNet model on my custom dataset with only person and face.

  1. I counted the occurrences of each class across my training and validation data, and the person class has almost three times as many instances as the face class. Is it correct to set class_weight to 3 for face and 1 for person?

  2. How should coverage_foreground_weight and the initial_weight and weight_target of each objective (cov and bbox) be tuned? What do initial_weight and weight_target mean? I am getting 70+ precision on face but poor precision, in the low 20s, on person.

random_seed: 42
model_config {
  pretrained_model_file: "/workspace/Script/Pretrained_Weights/resnet18_peoplenet.tlt"
  arch: "resnet"
  num_layers: 18
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
}

dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/Script/TFRecords/*"
    image_directory_path: "/workspace/Script/Data/"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  target_class_mapping {
    key: "bag"
    value: "bag"
  }
  target_class_mapping {
    key: "face"
    value: "face"
  }
  validation_fold: 0
}

training_config {
  batch_size_per_gpu: 16
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 0.0005
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-09
  }
  optimizer {
    adam {
      epsilon: 9.9e-09
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 5
}

bbox_rasterizer_config {
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.4
      cov_radius_y: 0.4
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "bag"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "face"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}

augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    crop_right: 960
    crop_bottom: 544
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.5
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}

postprocessing_config {
  target_class_config {
    key: "person"
    value: {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.001
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: "bag"
    value: {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: "face"
    value: {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 2
      }
    }
  }
}

evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 20
  minimum_detection_ground_truth_overlap {
    key: "bag"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "face"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.5
  }
  evaluation_box_config {
    key: "bag"
    value {
      minimum_height: 40
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "face"
    value {
      minimum_height: 2
      maximum_height: 9999
      minimum_width: 2
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 40
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
}

cost_function_config {
  target_classes {
    name: "person"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "face"
    class_weight: 3.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 1.0
    }
  }
  target_classes {
    name: "bag"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}

Refer to the FAQ:
https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/text/faqs.html

Distribute the dataset class: How do I balance the weight between classes if the dataset has significantly higher samples for one class versus another?

To account for imbalance, increase the class_weight for classes with fewer samples. You can also try disabling enable_autoweighting; in this case initial_weight is used to control cov/regression weighting. It is important to keep the number of samples of different classes balanced, which helps improve mAP.
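
As a rough illustration of the class_weight advice above, here is a minimal Python sketch that turns per-class object counts into inverse-frequency weights. The helper name inverse_frequency_weights and the counts are hypothetical, and inverse frequency is just one common heuristic, not something the FAQ prescribes, so treat the result as a starting value to tune rather than a recommended setting.

```python
def inverse_frequency_weights(counts):
    """Return max_count / class_count per class, so the most frequent class gets 1.0."""
    if not counts:
        raise ValueError("counts must not be empty")
    if any(n <= 0 for n in counts.values()):
        raise ValueError("every class needs a positive count")
    largest = max(counts.values())
    return {name: largest / n for name, n in counts.items()}


# Hypothetical object counts, chosen to mirror a roughly 3:1 person-to-face ratio.
example_counts = {"person": 3000, "face": 1000}
weights = inverse_frequency_weights(example_counts)

# Print the values in a form that can be copied into cost_function_config.
for name, weight in weights.items():
    print(f"{name}: class_weight {weight:.2f}")
```

With a 3:1 ratio this yields 1.0 for person and 3.0 for face, the same values used in the spec above. Whether those weights actually help should be checked against per-class precision after retraining.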