PeopleNet precision low for person class

Hi all,

My results for the PeopleNet model are:

Validation cost: 0.000908
Mean average_precision (in %): 32.6038

class name    average precision (in %)
bag           0
face          69.894
person        27.9174

How can I increase the precision further, especially on the person class?

Would retraining the output model on its poor predictions (images with false positives) be a good way to significantly increase precision?
If so, should I freeze blocks and reduce the max learning rate during retraining?
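
To make the question concrete, this is roughly the change I have in mind; it is only a sketch, with the freeze_blocks and learning-rate fields taken from the specs later in this thread and an illustrative reduced value:

model_config {
  pretrained_model_file: "/workspace/Script/Pretrained_Weights/resnet18_peoplenet.tlt"
  # freeze the early ResNet blocks so only the later layers are updated
  freeze_blocks: 0
  freeze_blocks: 1
  ...
}
training_config {
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-05  # example: 10x lower than my first run's 0.0005
      ...
    }
  }
}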

Also, would adding training images containing only background (no person or face), each labelled with a single ‘bag’ box covering the whole image, be helpful?
Can the label file for the background images instead be an empty .txt file with no labels inside?
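
(For context, my labels are KITTI-format text files, which is what detectnet_v2 expects, one object per line; the coordinates in this example are made up:

person 0.00 0 0.00 100.00 120.00 180.00 400.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

so a background-only image would have a label file containing no such lines.)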

random_seed: 42
model_config {
  pretrained_model_file: "/workspace/Script/Pretrained_Weights/resnet18_peoplenet.tlt"
  arch: "resnet"
  num_layers: 18
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
}

dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/Script/TFRecords/*"
    image_directory_path: "/workspace/Script/Data/"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  target_class_mapping {
    key: "bag"
    value: "bag"
  }
  target_class_mapping {
    key: "face"
    value: "face"
  }
  validation_fold: 0
}

training_config {
  batch_size_per_gpu: 16
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 0.0005
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-09
  }
  optimizer {
    adam {
      epsilon: 9.9e-09
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 5
}

bbox_rasterizer_config {
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.4
      cov_radius_y: 0.4
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "bag"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "face"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}

augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    crop_right: 960
    crop_bottom: 544
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.5
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}

postprocessing_config {
  target_class_config {
    key: "person"
    value {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.001
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: "bag"
    value {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: "face"
    value {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 2
      }
    }
  }
}

evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 20
  minimum_detection_ground_truth_overlap {
    key: "bag"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "face"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.5
  }
  evaluation_box_config {
    key: "bag"
    value {
      minimum_height: 40
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "face"
    value {
      minimum_height: 2
      maximum_height: 9999
      minimum_width: 2
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 40
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
}

cost_function_config {
  target_classes {
    name: "person"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "face"
    class_weight: 3.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "bag"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}

Firstly, please run tlt-evaluate against your val dataset with the NGC pretrained model (resnet18_peoplenet.tlt) you have downloaded, to check what the mAP result is.
Please refer to the spec in PeopleNet v1.0 unpruned model shows very bad results on COCO dataset - #11 by Morganh (in particular, pay attention to load_graph, which should be True; and for a resnet34 model, num_layers should be 34) and some comments in People Net - - #4 by ishan
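
For example, with the standard tlt-evaluate arguments (the spec path here is a placeholder; $KEY is your encryption key):

tlt-evaluate detectnet_v2 -e /workspace/Script/specs/eval_spec.txt \
                          -m /workspace/Script/Pretrained_Weights/resnet18_peoplenet.tlt \
                          -k $KEY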

Is load_graph always supposed to be set to true when training the NGC pretrained model on my custom dataset?

For detectnet_v2, it is important to set the load_graph under model_config to true to import the pruned graph.

But what I mentioned above is just to let you run tlt-evaluate directly with the NGC pretrained model.
With the spec in PeopleNet v1.0 unpruned model shows very bad results on COCO dataset - #11 by Morganh, it will directly load the NGC pretrained model and run a quick validation against your own dataset.

I want to know: with the default NGC model, what is the mAP?
Then you can proceed with the transfer learning.
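
For example (the result directory and model name are placeholders):

tlt-train detectnet_v2 -e /workspace/Script/specs/train_spec.txt \
                       -r /workspace/Script/results \
                       -k $KEY \
                       -n resnet18_peoplenet_custom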

Hi, if I do not want to train the pruned graph, I can leave it as False, right? My plan is to train the unpruned version until the mAP is satisfactory, then prune it for deployment.

I have evaluated the pretrained model from NGC. These are the results on my whole dataset:

Validation cost: 0.005483
Mean average_precision (in %): 7.3567

class name    average precision (in %)
bag           0
face          0.133323
person        21.9367

Your evaluation result on your own dataset is poor. Can you attach the spec you used when evaluating the pretrained model from NGC?

model_config {
  pretrained_model_file: "/workspace/Script/Pretrained_Weights/resnet18_peoplenet.tlt"
  freeze_blocks: 0
  freeze_blocks: 1
  freeze_blocks: 2
  freeze_blocks: 3
  arch: "resnet"
  num_layers: 18
  load_graph: True
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
}

dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/Script/TFRecords/*"
    image_directory_path: "/workspace/Script/Data/"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  target_class_mapping {
    key: "bag"
    value: "bag"
  }
  target_class_mapping {
    key: "face"
    value: "face"
  }
  validation_fold: 0
}

training_config {
  batch_size_per_gpu: 24
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 0.0005
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-09
  }
  optimizer {
    adam {
      epsilon: 9.9e-09
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}

bbox_rasterizer_config {
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "bag"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "face"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}

augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    crop_right: 960
    crop_bottom: 544
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}

postprocessing_config {
  target_class_config {
    key: "person"
    value {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.001
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: "bag"
    value {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: "face"
    value {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 2
      }
    }
  }
}

evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "bag"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "face"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.5
  }
  evaluation_box_config {
    key: "bag"
    value {
      minimum_height: 40
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "face"
    value {
      minimum_height: 2
      maximum_height: 9999
      minimum_width: 2
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 40
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
}

cost_function_config {
  target_classes {
    name: "person"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "face"
    class_weight: 8.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 1.0
    }
  }
  target_classes {
    name: "bag"
    class_weight: 8.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 1.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}

Please modify to:
min_learning_rate: 10e-10
max_learning_rate: 10e-10
Refer to PeopleNet v1.0 unpruned model shows very bad results on COCO dataset - #11 by Morganh

That spec will load the NGC pretrained model directly and run evaluation.

It is now as follows:

Validation cost: 0.007174
Mean average_precision (in %): nan

class name    average precision (in %)
bag           nan
face          0.028404
person        31.2815

random_seed: 42
model_config {
  pretrained_model_file: "/workspace/Script/Pretrained_Weights/resnet18_peoplenet.tlt"
  arch: "resnet"
  num_layers: 18
  load_graph: True
  use_batch_norm: False
  activation {
    activation_type: "relu"
  }
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
}

dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/Script/TFRecords/*"
    image_directory_path: "/workspace/Script/Data/"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  target_class_mapping {
    key: "bag"
    value: "bag"
  }
  target_class_mapping {
    key: "face"
    value: "face"
  }
  validation_fold: 0
}

training_config {
  batch_size_per_gpu: 16
  num_epochs: 10
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 10e-10
      max_learning_rate: 10e-10
      soft_start: 0.0
      annealing: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 3e-09
  }
  optimizer {
    adam {
      epsilon: 9.9e-09
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}

bbox_rasterizer_config {
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.4
      cov_radius_y: 0.4
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "bag"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "face"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}

augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    crop_right: 960
    crop_bottom: 544
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}

postprocessing_config {
  target_class_config {
    key: "person"
    value {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.2
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: "bag"
    value {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: "face"
    value {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
}

evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 20
  minimum_detection_ground_truth_overlap {
    key: "bag"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "face"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.7
  }
  evaluation_box_config {
    key: "bag"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "face"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}

cost_function_config {
  target_classes {
    name: "person"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "face"
    class_weight: 2.6
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "bag"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}

So it seems the baseline is not good. But I suggest you double-check your data and spec.

  1. Is your dataset resized to 960x544, with the labels resized accordingly? (See the resizing sketch after this list.)
  2. Forgot to mention: can you set the value below to uppercase and rerun tlt-evaluate? Note that all the names in the spec should be changed too.
    target_class_mapping {
      key: "person"
      value: "Person"
    }
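
For item 1, here is a minimal offline resizing sketch. It is my own helper, not part of TLT, and it assumes KITTI-format labels where fields 5-8 are xmin, ymin, xmax, ymax:

# Hypothetical helper, not part of TLT: resize an image to the network input
# size and scale the bbox fields of its KITTI label file to match.
from PIL import Image

TARGET_W, TARGET_H = 960, 544  # must match output_image_width/height in the spec

def resize_pair(image_in, label_in, image_out, label_out):
    img = Image.open(image_in)
    sx = TARGET_W / img.width
    sy = TARGET_H / img.height
    img.resize((TARGET_W, TARGET_H), Image.BILINEAR).save(image_out)

    scaled = []
    with open(label_in) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 8:
                continue  # skip blank or malformed lines
            # KITTI bbox fields xmin, ymin, xmax, ymax sit at indices 4..7
            for i, s in zip((4, 5, 6, 7), (sx, sy, sx, sy)):
                parts[i] = "{:.2f}".format(float(parts[i]) * s)
            scaled.append(" ".join(parts))
    with open(label_out, "w") as f:
        f.write("\n".join(scaled) + ("\n" if scaled else ""))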

Also, can you share a typical image from your own data? Are the bags or faces too small?

I am only training on 2 classes: person and face.

Do you mean there is no bag class in your dataset?

Yes, no bag class at all.
My dataset consists of close-up views of people wearing masks, some full-body and upper-body shots of people walking on the street, and people's upper bodies.

Validation cost: 0.001650
Mean average_precision (in %): nan

class name    average precision (in %)
Bag           nan
Face          0.028404
Person        31.2815

After re-evaluating with the uppercase class names, the results are still the same.
random_seed: 42
model_config {
  pretrained_model_file: "/workspace/Script/Pretrained_Weights/resnet18_peoplenet.tlt"
  arch: "resnet"
  num_layers: 18
  load_graph: True
  use_batch_norm: False
  activation {
    activation_type: "relu"
  }
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
}

dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/Script/TFRecords/*"
    image_directory_path: "/workspace/Script/Data/"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "Person"
  }
  target_class_mapping {
    key: "bag"
    value: "Bag"
  }
  target_class_mapping {
    key: "face"
    value: "Face"
  }
  validation_fold: 0
}

training_config {
  batch_size_per_gpu: 16
  num_epochs: 10
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 10e-10
      max_learning_rate: 10e-10
      soft_start: 0.0
      annealing: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 3e-09
  }
  optimizer {
    adam {
      epsilon: 9.9e-09
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}

bbox_rasterizer_config {
  target_class_config {
    key: "Person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.4
      cov_radius_y: 0.4
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "Bag"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "Face"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}

augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    crop_right: 960
    crop_bottom: 544
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}

postprocessing_config {
  target_class_config {
    key: "Person"
    value {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.2
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: "Bag"
    value {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: "Face"
    value {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
}

evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 20
  minimum_detection_ground_truth_overlap {
    key: "Bag"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "Face"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "Person"
    value: 0.7
  }
  evaluation_box_config {
    key: "Bag"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "Face"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "Person"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}

cost_function_config {
  target_classes {
    name: "Person"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "Face"
    class_weight: 2.6
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "Bag"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}

If your dataset only contains 2 classes (person and face), please delete all the lines about bag in the spec.
After modifying your original training spec, start training.
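
For example, the dataset_config keeps only two mappings (remove the bag entries under bbox_rasterizer_config, postprocessing_config, evaluation_config and cost_function_config in the same way):

dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/Script/TFRecords/*"
    image_directory_path: "/workspace/Script/Data/"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  target_class_mapping {
    key: "face"
    value: "face"
  }
  validation_fold: 0
}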

Will adding some images with no person or face, and creating label .txt files with no labels inside, be a viable form of negative sampling?

It is not needed.

Thank you for your help. I will update the results again after training. Is 120 epochs generally too low for training?

It is hard to say. It depends on your image quantity, batch size, etc.

Also, what benefit does removing the bag class from the spec file bring? Will training be faster, and will precision also increase?

If the bag class is not included in your dataset, it is wrong to add it to the training spec.