I tried to improve the “face” class of the PeopleNet trainable model v2.6, which already detects persons, faces and bags, because its recall for “face” was noticeably low. To do so, I retrained the model with additional “face” images. The retrained model now detects only faces and no longer detects persons, even though person detection was pretty good (high recall) before the retraining.
I retained all 3 existing classes in the spec, but the dataset I used for retraining contains “face” class images only. Is this because no layers are frozen (I don’t see anything in the training spec related to freezing layers), so that what I did was effectively a new training from scratch rather than transfer learning? Or is freezing layers done automatically in TAO training? If the latter, what might have caused this issue?
In a nutshell, retraining the model with additional images for only one class (face) has produced a model that detects only that class, even though another class (person) had good recall before.
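To make the class imbalance in the retraining data concrete, here is a minimal sketch that counts objects per class in a directory of KITTI-format label files (one object per line, class name as the first field). The `label_2` directory name is an assumption based on the usual KITTI layout and is not taken from my setup; adjust it to wherever the labels live.

```python
from collections import Counter
from pathlib import Path


def count_kitti_classes(label_dir):
    """Count objects per class across KITTI-format .txt label files."""
    counts = Counter()
    for label_file in sorted(Path(label_dir).glob("*.txt")):
        for line in label_file.read_text().splitlines():
            fields = line.split()
            if fields:  # skip blank lines
                counts[fields[0]] += 1  # first field is the class name
    return counts


if __name__ == "__main__":
    # Hypothetical path: assumes labels sit in a "label_2" folder
    # next to the images, as in the standard KITTI layout.
    label_dir = "/workspace/tao-experiments/data/training/label_2"
    for name, n in count_kitti_classes(label_dir).most_common():
        print(f"{name}: {n}")
```

On a face-only dataset like the one I used, this would be expected to list “face” and nothing for “person” or “bag”.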
• Hardware (T4)
• Network Type (Detectnet_v2)
• TLT Version (format_version: 2.0, toolkit_version: 3.22.05, published_date: 05/25/2022)
• Training spec file
random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tao-experiments/data/training"
  }
  image_extension: "png"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  target_class_mapping {
    key: "bag"
    value: "bag"
  }
  target_class_mapping {
    key: "face"
    value: "face"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 1248
    output_image_height: 384
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "person"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 20
      }
    }
  }
  target_class_config {
    key: "bag"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.15000000596
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 20
      }
    }
  }
  target_class_config {
    key: "face"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00749999983236
        dbscan_eps: 0.230000004172
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 20
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/tao-experiments/detectnet_v2/pretrained_peoplenet/peoplenet_vtrainable_v2.6/resnet34_peoplenet.tlt"
  num_layers: 18
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 30
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.699999988079
  }
  minimum_detection_ground_truth_overlap {
    key: "bag"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "face"
    value: 0.5
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "bag"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "face"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "person"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "bag"
    class_weight: 8.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 1.0
    }
  }
  target_classes {
    name: "face"
    class_weight: 4.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.10000000149
      annealing: 0.699999988079
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  visualizer {
    enabled: true
    num_images: 3
    scalar_logging_frequency: 50
    infrequent_logging_frequency: 5
    target_class_config {
      key: "person"
      value: {
        coverage_threshold: 0.005
      }
    }
    target_class_config {
      key: "bag"
      value: {
        coverage_threshold: 0.005
      }
    }
    target_class_config {
      key: "face"
      value: {
        coverage_threshold: 0.005
      }
    }
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "bag"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "face"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}
• How to reproduce the issue?
Add more images to the “face” class and retrain the PeopleNet trainable model v2.6, following the Jupyter notebook (DetectNet v2 sample).