Retraining peoplenet model with own images

I’m trying to retrain the PeopleNet model (tlt_peoplenet_unpruned_v2.0) using my own images.
Training runs fine, with the following final accuracy:

Validation cost: 0.000122
Mean average_precision (in %): 99.9602

class name      average precision (in %)
------------  --------------------------
person                           99.9602

I trained for 15 epochs, with 34,000 images in the training set.
The following training config file was used: detectnet_v2_train_resnet34_kitti.txt (3.1 KB)

The training command is:
tlt-train detectnet_v2 -e /workspace/tlt-experiments/ObjectDetectionData/pedestrians/detectnet_v2_train_resnet34_kitti.txt -r /workspace/tlt-experiments/ObjectDetectionData/pedestrians/hor_0-50_ver_0-75_overlapping/trained_models_34resnet -k tlt_encode -n peoplenet_resnet34

The final training accuracy is good, but on test images there are almost no detections. Is this an overfitting issue?

How about the tlt-infer result?

Yes, I get the same results using tlt-infer. My label file is as follows.

person 0.00 0 0.00 576 10 669 510 0.00 0.00 0.00 0.00 0.00 0.00 0.00
person 0.00 0 0.00 674 10 778 414 0.00 0.00 0.00 0.00 0.00 0.00 0.00
person 0.00 0 0.00 783 10 871 431 0.00 0.00 0.00 0.00 0.00 0.00 0.00
person 0.00 0 0.00 876 10 971 408 0.00 0.00 0.00 0.00 0.00 0.00 0.00
person 0.00 0 0.00 976 10 1064 412 0.00 0.00 0.00 0.00 0.00 0.00 0.00

What could be wrong?
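For reference, each label line above follows the 15-field KITTI layout: class name, truncation, occlusion, alpha, then the 2D box as xmin, ymin, xmax, ymax, followed by seven 3D fields that are left at zero here. A minimal sketch for sanity-checking such lines is shown below. The names KittiBox and parse_kitti_line are made up for illustration and are not part of TLT.

```python
from dataclasses import dataclass


@dataclass
class KittiBox:
    class_name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def parse_kitti_line(line: str) -> KittiBox:
    """Parse one 15-field KITTI label line, keeping only the class and 2D box."""
    fields = line.split()
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}: {line!r}")
    # Fields 4..7 (0-based) hold xmin, ymin, xmax, ymax in pixels.
    xmin, ymin, xmax, ymax = (float(v) for v in fields[4:8])
    if xmax <= xmin or ymax <= ymin:
        raise ValueError(f"degenerate box in line: {line!r}")
    return KittiBox(fields[0], xmin, ymin, xmax, ymax)


if __name__ == "__main__":
    sample = "person 0.00 0 0.00 576 10 669 510 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
    box = parse_kitti_line(sample)
    print(box.class_name, box.xmax - box.xmin, box.ymax - box.ymin)
```

Running this over every label file is a cheap way to rule out malformed lines or inverted boxes before looking at the model itself.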

Could you please narrow this down with the experiment below?
Run tlt-infer against all of your training images or the validation images.

Does it matter whether the label is person or Person? I have now changed it to Person and am retraining.

I’ll test what you mentioned after that.

In the training spec, please set the class name to lowercase “person”, because a label “Person” is written as “person” during tfrecord generation.
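A quick way to catch this kind of mismatch before training is to compare the class names found in the label files against the keys of target_class_mapping, after lowercasing the label names as described above. This is only a sketch. The function name unmapped_label_classes is hypothetical, and it assumes the only transformation applied to label names is lowercasing.

```python
def unmapped_label_classes(label_classes, mapping_keys):
    """Return label class names whose lowercase form is not a mapping key.

    label_classes: class names as they appear in the KITTI label files.
    mapping_keys: the `key` values of target_class_mapping in the training spec.
    """
    keys = set(mapping_keys)
    return sorted({name for name in label_classes if name.lower() not in keys})


if __name__ == "__main__":
    # A lowercase key matches both spellings in the labels.
    print(unmapped_label_classes(["Person", "person"], ["person"]))
    # A capitalized key matches neither, because labels are lowercased first.
    print(unmapped_label_classes(["Person", "person"], ["Person"]))
```

An empty result means every label class has a matching key in the spec.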

Thanks, I am in the middle of retraining, so I’ll test with the training and validation data after training finishes.

Yes, detection accuracy is very good when testing with the training images, but there are no detections on the test images. So is it overfitting? How can I overcome this?

Are the training images very different from the test images?
I suggest running the experiments below.

  1. First, run a new training against only part (for example, 80%) of your test dataset, and set “validation_fold: 0” so that the other 20% of your test dataset is used for validation. Then check the mAP result.
  2. If (1) still gives a good mAP, I suggest adding those tfrecords to your original training.
    Spec:
    dataset_config {
      data_sources {
        tfrecords_path: "/workspace/tlt-experiments/ObjectDetectionData/pedestrians/hor_0-50_ver_0-75_overlapping/tfrecords/*"
        image_directory_path: "/workspace/tlt-experiments/ObjectDetectionData/pedestrians/hor_0-50_ver_0-75_overlapping/train/"
      }
      data_sources {
        tfrecords_path: <the tfrecords corresponding to 80% of your test dataset>
        image_directory_path: <80% of your test dataset>
      }
      image_extension: "jpeg"
      target_class_mapping {
        key: "person"
        value: "person"
      }
      #validation_fold: 0
      validation_data_source: {
        tfrecords_path: <the tfrecords corresponding to 20% of your test dataset>
        image_directory_path: <20% of your test dataset>
      }
    }
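The 80%/20% split in the experiment above can be prepared with a few lines of Python before generating the two sets of tfrecords. This is a rough sketch, not a TLT utility. The function name split_dataset, the seed, and the file names in the example are all made up for illustration.

```python
import random


def split_dataset(names, train_fraction=0.8, seed=0):
    """Shuffle names reproducibly and split them into (train, val) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # Sort first so the result does not depend on directory listing order.
    shuffled = sorted(names)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    images = [f"img_{i:03d}.jpeg" for i in range(10)]
    train, val = split_dataset(images)
    print(len(train), "training images,", len(val), "validation images")
```

Each image and its label file would then be copied into the matching directory, so that the two tfrecord sets never share an image.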

So your suggestion is to train together with 80% of the test data set and use the remaining 20% for validation? My test data set has no labels yet.

My training set is similar to the test set. What I did was:
(1) Crop each human body individually.
(2) Then rearrange the crops on a different background image, with different overlap percentages horizontally and vertically. That produces images with different crowd sizes.

Then I train on those images.
I would like to detect people in really crowded images, which is why I generate images with different crowd sizes in that way. Does that make sense?
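The horizontal layout in step (2) can be sketched roughly as follows. This is only an illustration: the actual compositing script is not shown in this thread, and the function name place_with_overlap and its parameters are assumptions. Here overlap is the fraction of each crop's width that the next crop covers.

```python
def place_with_overlap(widths, overlap, x_start=0):
    """Return the left x coordinate of each crop, laid out left to right.

    widths: crop widths in pixels.
    overlap: fraction (0.0 <= overlap < 1.0) of a crop's width covered by the next crop.
    x_start: x coordinate of the first crop.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0.0, 1.0)")
    positions = []
    x = x_start
    for width in widths:
        positions.append(x)
        # Advance by the part of this crop that stays visible.
        x += round(width * (1.0 - overlap))
    return positions


if __name__ == "__main__":
    # Three 100-pixel-wide crops with 50% horizontal overlap.
    print(place_with_overlap([100, 100, 100], 0.5))
```

The same idea applies vertically, and the KITTI box for each pasted crop follows directly from its position and size.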

Your attached image is a test image. Could you attach a training image too?

No, that is a training image. The test image is as follows.

Let me share one experiment with you.
On your test image above, I ran the NGC resnet34 unpruned tlt model directly.
It gives the expected result, so the pretrained model is fine.

Step:

  1. Start the tlt 2.0_py3 docker container
  2. Download tlt pretrained model
    $ wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_peoplenet/versions/unpruned_v2.0/files/resnet34_peoplenet.tlt
  3. Download your test image
    $ wget https://aws1.discourse-cdn.com/nvidia/original/3X/9/9/9937f27d49d4adf2432969222cb12fc004b5f75e.jpeg
  4. Create inference spec
    $ vim detectnet_v2_inference_kitti_tlt.txt
inferencer_config {
  target_classes: "Person"
  target_classes: "Bag"
  target_classes: "Face"

  image_width: 960
  image_height: 544

  image_channels: 3
  batch_size: 16
  gpu_index: 0

  tlt_config {
    model: "/workspace/resnet34_peoplenet.tlt"
  }
}
bbox_handler_config {
  kitti_dump: true
  disable_overlay: false
  overlay_linewidth: 2
  classwise_bbox_handler_config {
    key: "Person"
    value: {
      confidence_model: "aggregate_cov"
      output_map: "Person"
      confidence_threshold: 0.9
      bbox_color {
        R: 0
        G: 255
        B: 0
      }
      clustering_config {
        coverage_threshold: 0.00
        dbscan_eps: 0.3
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  classwise_bbox_handler_config {
    key: "Bag"
    value: {
      confidence_model: "aggregate_cov"
      output_map: "Bag"
      confidence_threshold: 0.9
      bbox_color {
        R: 0
        G: 255
        B: 255
      }
      clustering_config {
        coverage_threshold: 0.00
        dbscan_eps: 0.3
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  classwise_bbox_handler_config {
    key: "Face"
    value: {
      confidence_model: "aggregate_cov"
      output_map: "Face"
      confidence_threshold: 0.9
      bbox_color {
        R: 255
        G: 0
        B: 0
      }
      clustering_config {
        coverage_threshold: 0.00
        dbscan_eps: 0.3
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  classwise_bbox_handler_config {
    key: "default"
    value: {
      confidence_model: "aggregate_cov"
      confidence_threshold: 0.9
      bbox_color {
        R: 255
        G: 0
        B: 0
      }
      clustering_config {
        coverage_threshold: 0.00
        dbscan_eps: 0.3
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
}

  5. Run inference
    $ tlt-infer detectnet_v2 -e detectnet_v2_inference_kitti_tlt.txt -o output -i 9937f27d49d4adf2432969222cb12fc004b5f75e.jpeg -k tlt_encode

  6. Get the result below

How did you run tlt-infer? There are two ways to run it:

  1. with a tlt file
  2. with a trt engine

If you were using a trt engine previously, try running tlt-infer again with your trained tlt file instead of the trt engine.

I use the tlt file.

Yes, it works using the pretrained model from Nvidia. I retrained it hoping for better results, but after retraining there are no detections.

Can you run tlt-infer against your unpruned tlt file?

Yes, tlt-infer runs, but there are no detections.

May I confirm the following?

  1. For your test dataset, are there no detections at all, or only a few?
  2. For your training and validation datasets, do you get the expected detections?

Can you attach both tlt-infer spec files too?

The test dataset has no detections at all.
The training and validation datasets have the expected detections.
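One way to put numbers on this is to count the lines in the KITTI label files that tlt-infer writes when kitti_dump is true, once for the test images and once for the training images. The sketch below assumes those label files are plain .txt files collected in a single directory. The exact output subdirectory is not confirmed in this thread, and the name count_detections is made up for illustration.

```python
from pathlib import Path


def count_detections(label_dir):
    """Map each .txt label file name in label_dir to its number of non-empty lines."""
    counts = {}
    for path in sorted(Path(label_dir).glob("*.txt")):
        with path.open() as f:
            # Each non-empty line in a KITTI label file is one detection.
            counts[path.name] = sum(1 for line in f if line.strip())
    return counts


if __name__ == "__main__":
    import sys

    # Pass the directory holding the dumped label files as the first argument.
    label_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, n in count_detections(label_dir).items():
        print(f"{name}: {n} detections")
```

Comparing the per-image counts for the two datasets shows whether the test set really produces zero boxes or only a few that fall below the confidence threshold in the inference spec.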

I changed the number of epochs from 15 to 10 and then to 5. The results are not much different: poor detection or no detection at all.

The training and inference specs are as follows.
detectnet_v2_inference_kitti_tlt.txt (1.0 KB) detectnet_v2_train_resnet34_kitti.txt (3.1 KB)