I’m trying to retrain peoplenet models (tlt_peoplenet_unpruned_v2.0) using my own images.
Training goes fine, with the final accuracy:
Validation cost: 0.000122
Mean average_precision (in %): 99.9602
class name     average precision (in %)
------------   --------------------------
person         99.9602
Are the training images much different from the test images?
I suggest you run the experiments below.
(1) First, run a new training against only part (for example, 80%) of your test dataset, and set "validation_fold: 0" in order to validate on the other 20% of your test dataset. Then check the mAP result.
(2) If (1) still gives a good mAP, I suggest adding those tfrecords into your original training.
Spec:
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/ObjectDetectionData/pedestrians/hor_0-50_ver_0-75_overlapping/tfrecords/*"
    image_directory_path: "/workspace/tlt-experiments/ObjectDetectionData/pedestrians/hor_0-50_ver_0-75_overlapping/train/"
  }
  data_sources {
    tfrecords_path: <the tfrecords corresponding to 80% of your test dataset>
    image_directory_path: <80% of your test dataset>
  }
  image_extension: "jpeg"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  # validation_fold: 0
  validation_data_source: {
    tfrecords_path: <the tfrecords corresponding to 20% of your test dataset>
    image_directory_path: <20% of your test dataset>
  }
}
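The 80/20 split referenced in the spec can be prepared with a short helper before converting to tfrecords. This is just a minimal sketch, assuming the test images are listed by filename; split_dataset, the seed, and the sample filenames are illustrative names, not part of the TLT tooling:

```python
import random

def split_dataset(filenames, train_frac=0.8, seed=42):
    """Deterministically shuffle and split image filenames into train/val lists."""
    files = sorted(filenames)
    random.Random(seed).shuffle(files)
    n_train = int(len(files) * train_frac)
    return files[:n_train], files[n_train:]

# Example: 100 hypothetical test images -> 80 for training, 20 for validation
all_images = ["img_%03d.jpeg" % i for i in range(100)]
train_files, val_files = split_dataset(all_images)
```

The two resulting lists would then feed the second data_sources block and the validation_data_source block respectively.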
So your suggestion is to train together with 80% of the test data set, while the remaining 20% of the test data set is used for validation. My test data set has no labels yet.
My training set is similar to the test set. What I did was:
(1) crop each human body individually,
(2) then rearrange the crops on a different background image with varying horizontal and vertical overlap percentages, which produces images with different crowd sizes.
Then I train. Does that make sense?
I want to detect very crowded scenes, so images with different crowd sizes are augmented in that way.
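The rearrangement step above can be sketched as pure coordinate math: a larger overlap fraction means a smaller stride between pasted crops, and therefore a denser crowd. Everything here is hypothetical (crowd_positions and its parameters are not from the actual augmentation script); the overlap ranges are assumed to match the folder name hor_0-50_ver_0-75_overlapping, i.e. 0-50% horizontal and 0-75% vertical overlap:

```python
def crowd_positions(crop_w, crop_h, bg_w, bg_h, h_overlap, v_overlap):
    """Compute top-left paste coordinates for person crops laid out on a
    background, given horizontal/vertical overlap fractions in [0, 1).
    Higher overlap -> smaller stride -> denser (more crowded) layout."""
    stride_x = max(1, int(crop_w * (1.0 - h_overlap)))
    stride_y = max(1, int(crop_h * (1.0 - v_overlap)))
    positions = []
    for y in range(0, bg_h - crop_h + 1, stride_y):
        for x in range(0, bg_w - crop_w + 1, stride_x):
            positions.append((x, y))
    return positions

# 50x100 crops on a 200x200 background:
dense = crowd_positions(50, 100, 200, 200, h_overlap=0.5, v_overlap=0.75)
sparse = crowd_positions(50, 100, 200, 200, h_overlap=0.0, v_overlap=0.0)
```

Each (x, y) would then be used to paste one person crop (e.g. with Pillow's Image.paste), and the same coordinates plus crop size give the KITTI bounding-box label for that person.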
Let me share one experiment with you.
For your test image above, I just ran the NGC resnet34 unpruned TLT model directly, and it got the expected result. So the pretrained model is fine.