PeopleNet v1.0 unpruned model shows very bad results on COCO dataset

xzhka1229 · June 22, 2020, 4:30am

Hi, I downloaded the NVIDIA PeopleNet unpruned model from https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplenet, installed the NVIDIA docker container v2.0_dp_py, and used DetectNet V2 to run person-detection inference on the COCO dataset. On 1800 COCO images (resized to 960x544), with an aggregate_cov confidence threshold of 0.9 and dbscan_eps = 0.3, I only get 0.7 precision and 0.486 recall. When I look into the bad cases, I find that some horses, elephants, vegetables, etc. are predicted as persons, and quite a lot of people in the images are missed. Are these results reasonable, given that PeopleNet was trained on more than 5 million person objects? Are they due to a mismatch between the training data and COCO (for example, the millions of training images contain no animals)?
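For readers comparing numbers: precision and recall figures like the ones above depend on how detections are matched to ground truth. The following is a minimal sketch, assuming greedy one-to-one matching at an IoU threshold of 0.5. The post does not state the matching rule or the threshold, and `iou`, `match_image`, and `precision_recall` are hypothetical helper names, not part of TLT.

```python
from typing import List, Tuple

# A box is (left, top, right, bottom) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_image(preds: List[Box], gts: List[Box], iou_thr: float = 0.5) -> Tuple[int, int, int]:
    """Greedily match predictions to ground truth; return (tp, fp, fn).

    Assumption: each ground-truth box can be matched at most once, and a
    match requires IoU >= iou_thr.
    """
    unmatched = list(gts)
    tp = 0
    for pred in preds:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_thr:
            unmatched.remove(best)
            tp += 1
    return tp, len(preds) - tp, len(unmatched)


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = tp / (tp + fp), recall = tp / (tp + fn)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# One correct detection, one false positive, one missed person.
preds = [(0, 0, 10, 10), (50, 50, 60, 60)]
gts = [(1, 1, 10, 10), (100, 100, 120, 120)]
tp, fp, fn = match_image(preds, gts)
print(precision_recall(tp, fp, fn))
```

Summing `tp`, `fp`, and `fn` over all images before calling `precision_recall` gives dataset-level figures. A stricter IoU threshold lowers both numbers, so the threshold should be reported alongside the results.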

Morganh · June 22, 2020, 8:01am

Could you please share your inference spec file?

For training data of peoplenet, yes, there are no animals.

xzhka1229 · June 22, 2020, 8:24am

Thanks for the reply. Here is my spec file. I only care about the person class, so I only tuned the settings for it.

inferencer_config{
  # defining target class names for the experiment.
  # Note: This must be mentioned in order of the networks classes.
  target_classes: "person"
  target_classes: "bag"
  target_classes: "face"
  # Inference dimensions.
  image_width: 960
  image_height: 544
  # Must match what the model was trained for.
  image_channels: 3
  batch_size: 16
  gpu_index: 0
  # model handler config
  tlt_config{
    model: "/workspace/data/resnet34_peoplenet.tlt"
  }
}
bbox_handler_config{
  kitti_dump: true
  disable_overlay: false
  overlay_linewidth: 2
  classwise_bbox_handler_config{
    key:"person"
    value: {
      confidence_model: "aggregate_cov"
      output_map: "person"
      confidence_threshold: 0.9
      bbox_color{
        R: 0
        G: 255
        B: 0
      }
      clustering_config{
        coverage_threshold: 0.005
        dbscan_eps: 0.3
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  classwise_bbox_handler_config{
    key:"default"
    value: {
      confidence_model: "mean_cov"
      output_map: "default"
      confidence_threshold: 0.5
      bbox_color{
        R: 255
        G: 0
        B: 0
      }
      clustering_config{
        coverage_threshold: 0.00
        dbscan_eps: 0.4
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
}
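The two `confidence_model` values in this spec change what `confidence_threshold` is compared against. Here is a rough sketch of the difference, assuming that `aggregate_cov` sums the coverage values of the candidate boxes in a cluster while `mean_cov` averages them, and that a cluster is kept when its confidence is at least the threshold. Both points are my reading of the option names, not something confirmed in this thread, and `cluster_confidence` and `keep_cluster` are hypothetical names.

```python
from typing import List


def cluster_confidence(coverages: List[float], confidence_model: str) -> float:
    """Confidence of one clustered box from its candidates' coverage values.

    Assumption: aggregate_cov is the sum and mean_cov is the mean.
    """
    if confidence_model == "aggregate_cov":
        return sum(coverages)
    if confidence_model == "mean_cov":
        return sum(coverages) / len(coverages)
    raise ValueError(f"unknown confidence_model: {confidence_model}")


def keep_cluster(coverages: List[float], confidence_model: str,
                 confidence_threshold: float) -> bool:
    """Assumed filter: keep the cluster if confidence >= threshold."""
    return cluster_confidence(coverages, confidence_model) >= confidence_threshold


# Three weak candidates that cluster into one box.
coverages = [0.4, 0.35, 0.3]
print(keep_cluster(coverages, "aggregate_cov", 0.9))  # the sum is about 1.05
print(keep_cluster(coverages, "mean_cov", 0.5))       # the mean is about 0.35
```

Under these assumptions, a threshold of 0.9 with `aggregate_cov` is not a probability cutoff: a cluster of several low-coverage candidates can pass it, while the same cluster fails a 0.5 cutoff under `mean_cov`. Sweeping the threshold is one way to see how precision trades against recall on a new dataset.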

Morganh · June 23, 2020, 1:33am

Hi xzhka1229,
This is caused by the difference between datasets: PeopleNet was not trained on the COCO dataset. For COCO, you can use the unpruned PeopleNet tlt model as a pretrained model and then run training.
That is also the reason why TLT provides the unpruned PeopleNet model: users can do transfer learning on their own data.

xzhka1229 · June 23, 2020, 2:09am

Thanks for your time,

Could you please share your test data for me to try? If not, can you recommend a similar dataset (are the KITTI left color images closer to your dataset)? I want to reproduce the good performance of PeopleNet.

Thanks a lot.

Morganh · June 23, 2020, 2:14am

I ran tlt-infer against part of the NVIDIA internal data only, and it works fine.
Some of the images contain people with bags or luggage in an open parking lot.
You can search for similar images on Google. Sorry for the inconvenience.

Morganh · June 23, 2020, 6:22am

For example, you can download the image from https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplenet

$ wget https://developer.nvidia.com/sites/default/files/akamai/NGC_Images/models/peoplenet/input_11ft45deg_000070.jpg

Then run tlt-infer against it. You will get the same result as shown in https://developer.nvidia.com/sites/default/files/akamai/NGC_Images/models/peoplenet/output_11ft45deg_000070.jpg

xzhka1229 · June 24, 2020, 4:14am

Thanks, I downloaded the image and tested it, and I get the same result as the demo. But a single image is not enough to make the accuracy convincing. I am at the learning stage of people detection, so I want to compare different models on different datasets and use PeopleNet as a benchmark.

xzhka1229 · June 24, 2020, 4:48am

Sorry to bother you again. Can tlt-evaluate automatically evaluate the KITTI dataset? tlt-infer gives me labels but not accuracy/precision/recall. If yes, which spec file or command should I use?

Thanks

Morganh · June 24, 2020, 6:07am

To evaluate the KITTI dataset with tlt-evaluate, please refer to the Jupyter notebook inside the docker.
To evaluate PeopleNet with tlt-evaluate, refer to People Net - - #5 by Morganh.
Here is also one spec for reference.

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "your tfrecord"
    image_directory_path: "your own image"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "Person"
  }
  target_class_mapping {
    key: "Person"
    value: "Person"
  }
  target_class_mapping {
    key: "rider"
    value: "Person"
  }
  target_class_mapping {
    key: "Rider"
    value: "Person"
  }
  target_class_mapping {
    key: "personal_bag"
    value: "Bag"
  }
  target_class_mapping {
    key: "rolling_bag"
    value: "Bag"
  }
  target_class_mapping {
    key: "face"
    value: "Face"
  }

  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    crop_right: 960
    crop_bottom: 544
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "Person"
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: "Bag"
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.15000000596
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: "Face"
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.15000000596
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 4
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/its/peoplenet/resnet34_peoplenet.tlt"
  num_layers: 34
  load_graph: True
  use_batch_norm: False
  activation {
    activation_type: "relu"
  }
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 1
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "Person"
    value: 0.699999988079
  }
  minimum_detection_ground_truth_overlap {
    key: "Bag"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "Face"
    value: 0.5
  }
  evaluation_box_config {
    key: "Person"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "Bag"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "Face"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "Person"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "Bag"
    class_weight: 8.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 1.0
    }
  }
  target_classes {
    name: "Face"
    class_weight: 4.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 16
  num_epochs: 10
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 10e-10
      max_learning_rate: 10e-10
      soft_start: 0.0
      annealing: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
  target_class_config {
    key: "Person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "Bag"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "Face"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}
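The `target_class_mapping` entries in this spec decide which label names in the dataset count as Person, Bag, or Face during evaluation. Below is a small sketch of that lookup, assuming that labels whose name has no entry in the mapping are simply ignored (my understanding of the option, not stated in this thread). `map_labels` is a hypothetical helper, and the sample label names are made up for illustration.

```python
from collections import Counter
from typing import Dict, List

# The mapping from the spec above: dataset label name -> target class.
TARGET_CLASS_MAPPING: Dict[str, str] = {
    "person": "Person",
    "Person": "Person",
    "rider": "Person",
    "Rider": "Person",
    "personal_bag": "Bag",
    "rolling_bag": "Bag",
    "face": "Face",
}


def map_labels(source_labels: List[str], mapping: Dict[str, str]) -> List[str]:
    """Translate dataset label names to target classes.

    Assumption: names without a mapping entry are dropped.
    """
    return [mapping[name] for name in source_labels if name in mapping]


# Hypothetical label names read from one image's label file.
labels = ["rider", "rolling_bag", "car", "face"]
mapped = map_labels(labels, TARGET_CLASS_MAPPING)
print(Counter(mapped))

# If none of the label names appear in the mapping, nothing is left to score.
print(map_labels(["car", "van"], TARGET_CLASS_MAPPING))
```

Before running an evaluation on a new dataset, it is worth listing the distinct label names it contains and checking that each one you care about has a key in the mapping, with the exact spelling used in the label files.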

xzhka1229 · June 25, 2020, 4:11am

Another thing: when I use tlt-infer, the output labels (kitti dump) contain no scores. Could you please explain how to get the score of each box?

Btw, I used tlt-evaluate to evaluate the KITTI training dataset with the unpruned PeopleNet and got 0%. If I run tlt-infer first and evaluate the results myself, I get 53% precision.

Here is my spec file:
random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/data/kitti/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/data/kitti/training"
  }
  image_extension: "png"
  target_class_mapping {
    key: "car"
    value: "car"
  }
  target_class_mapping {
    key: "cyclist"
    value: "cyclist"
  }
  target_class_mapping {
    key: "pedestrian"
    value: "person"
  }
  target_class_mapping {
    key: "person_sitting"
    value: "person"
  }
  target_class_mapping {
    key: "van"
    value: "car"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "car"
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 20
      }
    }
  }
  target_class_config {
    key: "cyclist"
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.15000000596
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 20
      }
    }
  }
  target_class_config {
    key: "person"
    value {
      clustering_config {
        coverage_threshold: 0.00749999983236
        dbscan_eps: 0.230000004172
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 20
      }
    }
  }
}
model_config {
  #pretrained_model_file: "/workspace/tlt-experiments/detectnet_v2/experiment_dir_pruned
  #/resnet18_nopool_bn_detectnet_v2_pruned.tlt"
  pretrained_model_file: "/workspace/data/resnet34_peoplenet.tlt"
  num_layers: 34
  use_batch_norm: false
  load_graph: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 30
  minimum_detection_ground_truth_overlap {
    key: "car"
    value: 0.699999988079
  }
  minimum_detection_ground_truth_overlap {
    key: "cyclist"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.5
  }
  evaluation_box_config {
    key: "car"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "cyclist"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "car"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "cyclist"
    class_weight: 8.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 1.0
    }
  }
  target_classes {
    name: "person"
    class_weight: 4.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}

Morganh · June 25, 2020, 6:04pm

For detectnet_v2, tlt-infer does not yet support showing the score of each bbox.
If you want to run tlt-evaluate against the pruned PeopleNet model, please refer to my spec and the command in People Net - - #5 by Morganh, and prepare some images whose labels are People, Bag, or Face.
The KITTI dataset does not contain the same classes, so you get mAP = 0.
Please paste the command and log from your own evaluation (tlt-infer first, then the 53% precision).
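For anyone preparing labeled images from a COCO-style dataset: the DetectNet_v2 dataset converter reads KITTI-format label files, so COCO boxes need to be converted and rescaled to the 960x544 input size. Here is a minimal sketch, assuming the images are stretched to 960x544 without preserving aspect ratio and that only the 2D box fields matter, so the remaining KITTI fields are written as zeros. `coco_bbox_to_kitti_line` is a hypothetical helper, and the sample box is made up.

```python
from typing import Sequence, Tuple


def coco_bbox_to_kitti_line(
    bbox: Sequence[float],
    src_size: Tuple[int, int],
    dst_size: Tuple[int, int] = (960, 544),
    class_name: str = "person",
) -> str:
    """Convert one COCO box [x, y, width, height] to a KITTI label line.

    src_size and dst_size are (width, height). The box is scaled by the same
    factors as the image (assumption: plain resize, no letterboxing).
    """
    x, y, w, h = bbox
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    left, top = x * sx, y * sy
    right, bottom = (x + w) * sx, (y + h) * sy
    # KITTI fields: type, truncated, occluded, alpha, bbox (left, top, right,
    # bottom), dimensions (3), location (3), rotation_y. Unused ones are zero.
    return (
        f"{class_name} 0.00 0 0.00 "
        f"{left:.2f} {top:.2f} {right:.2f} {bottom:.2f} "
        "0.00 0.00 0.00 0.00 0.00 0.00 0.00"
    )


# A made-up person box in a 640x480 image.
line = coco_bbox_to_kitti_line([64, 48, 128, 96], src_size=(640, 480))
print(line)
```

The class name written here has to match a key in `target_class_mapping` for the object to be counted. With the reference spec above, `person` maps to `Person`.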
