Training accuracy is lower than expected: PeopleNet on custom dataset

Hello, I’m trying to train PeopleNet on a custom dataset (WiderPerson).

The original WiderPerson annotations are in YOLO format, so I first converted them to KITTI format.
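
For readers following along, here is a minimal sketch of such a conversion for a single label line. It assumes the YOLO line is `class_id x_center y_center width height` with values normalized to the image size, and that the class ids map to the class names that appear later in this thread; the `CLASS_NAMES` mapping and the function name are hypothetical, not taken from the poster's script. The KITTI fields that a 2D detector does not need are filled with zeros.

```python
# Hypothetical id-to-name mapping; adjust it to your own label files.
CLASS_NAMES = {
    0: "pedestrian",
    1: "riders",
    2: "partially-visible-persons",
    3: "ignore-regions",
    4: "crowd",
}


def yolo_to_kitti(line, img_w, img_h, class_names=CLASS_NAMES):
    """Convert one normalized YOLO label line to one KITTI label line."""
    class_id, xc, yc, w, h = line.split()
    # De-normalize the box center and size to pixels.
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    # Convert center/size to corner coordinates, clipped to the image.
    x1 = max(0.0, xc - w / 2)
    y1 = max(0.0, yc - h / 2)
    x2 = min(float(img_w), xc + w / 2)
    y2 = min(float(img_h), yc + h / 2)
    name = class_names[int(class_id)]
    # KITTI layout (15 fields): type, truncated, occluded, alpha,
    # bbox (x1 y1 x2 y2), dimensions (3), location (3), rotation_y.
    return (
        f"{name} 0.00 0 0.00 "
        f"{x1:.2f} {y1:.2f} {x2:.2f} {y2:.2f} "
        "0.00 0.00 0.00 0.00 0.00 0.00 0.00"
    )
```

A quick sanity check after converting is to confirm that every output line has 15 fields and that x1 < x2 and y1 < y2.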

Then I converted the dataset into TFRecords with the spec files below.
peoplenet_tfrecords_wp_train.txt (302 Bytes)
peoplenet_tfrecords_wp_val.txt (298 Bytes)

I trained with the spec file below.
experiment_spec.txt (8.8 KB)

But somehow the results look very bad. Moreover, class 2 is passed over in the ‘Matching predictions to ground truth’ step.

Could you please let me know if I missed something in the process?

Thanks in advance.

One thought: should I not have added new classes to PeopleNet? Maybe I should have kept PeopleNet’s own classes and mapped the WiderPerson classes onto them. Does this sound right to you?

How about the resolution of all the training images? Are they all 640x640?

I don’t think so. Does auto_size_enable: True help?

One more question, related to the augmentation config:

In yolo_v4, should the size of the training images be the same as output_width and output_height?

Yes, you need to set this if your images are not all the same resolution.
See DetectNet_v2 - NVIDIA Docs
The train tool does not support training on images of multiple resolutions. However, the dataloader does support resizing images to the input resolution defined in the specification file. This can be enabled by setting the enable_auto_resize parameter to true in the augmentation_config module of the spec file.
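
Before deciding whether the flag is needed, it can help to check how many distinct resolutions the dataset actually contains. The sketch below assumes you have already collected a list of (width, height) pairs, for example with an image library of your choice; the function names are hypothetical and nothing here is part of the TAO tooling.

```python
from collections import Counter


def resolution_census(sizes):
    """Count how many images have each (width, height)."""
    return Counter(sizes)


def needs_auto_resize(sizes, target):
    """Return True if any image differs from the target (width, height)."""
    return any(size != target for size in sizes)


# Example with made-up sizes.
sizes = [(640, 640), (1280, 720), (640, 640)]
census = resolution_census(sizes)
mixed = needs_auto_resize(sizes, (640, 640))
```

If the census has more than one entry, the images either need resizing offline or the auto-resize option described above.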

No.
For yolov4, see YOLOv4 - NVIDIA Docs

  • Input size: C * W * H (where C = 1 or 3, W >= 128, H >= 128, W, H are multiples of 32)
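
The constraint above can be written as a small check. This is only a sketch of the rule as quoted (C is 1 or 3, W and H at least 128 and multiples of 32); the function names are hypothetical.

```python
def is_valid_yolov4_input(channels, width, height):
    """Check the quoted rule: C in {1, 3}, W and H >= 128 and multiples of 32."""
    return (
        channels in (1, 3)
        and width >= 128
        and height >= 128
        and width % 32 == 0
        and height % 32 == 0
    )


def round_up_to_multiple(value, base=32, minimum=128):
    """Round a dimension up to the next multiple of `base`, at least `minimum`."""
    rounded = ((value + base - 1) // base) * base
    return max(rounded, minimum)
```

For example, a 650-pixel side is not valid as is, and `round_up_to_multiple(650)` suggests the next usable value.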

Thanks. I tried with enable_auto_resize: True, but it still shows low accuracy after 300 epochs.
experiment_spec.txt (8.9 KB)

Validation cost: 0.000490
Mean average_precision (in %): 17.0623

class name                   average precision (in %)
-------------------------  --------------------------
crowd                                        0.143917
ignore-regions                               0
partially-visible-persons                   11.4278
pedestrian                                  73.7398
riders                                       0
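
The reported mean appears to be the unweighted average of the five per-class values, so the three near-zero classes pull it down heavily. A short check of that arithmetic, using only the numbers in the table above:

```python
# Per-class average precision (in %) copied from the evaluation table.
per_class_ap = {
    "crowd": 0.143917,
    "ignore-regions": 0.0,
    "partially-visible-persons": 11.4278,
    "pedestrian": 73.7398,
    "riders": 0.0,
}

# Unweighted mean over all classes.
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)
print(f"{mean_ap:.4f}")  # 17.0623
```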

Other possible causes I suspect:

  1. The classes are highly imbalanced. I set the class weight to 1 for all classes, but in my experience this alone would not make such a dramatic difference.
  2. Is the pretrained PeopleNet weak on new classes? PeopleNet is good at person-related classes (pedestrian, partially visible person) but not at other classes. Should I avoid using PeopleNet for non-people-related classes?

I’m curious to hear your opinions.

I see. Thanks for the comment.

For an imbalanced dataset, refer to Frequently Asked Questions - NVIDIA Docs:

Distribute the dataset class: How do I balance the weight between classes if the dataset has significantly higher samples for one class versus another?

To account for imbalance, increase the class_weight for classes with fewer samples. You can also try disabling enable_autoweighting; in this case initial_weight is used to control cov/regression weighting. It is important to keep the number of samples of different classes balanced, which helps improve mAP.
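
One possible way to pick starting values for class_weight is to count the objects per class in the KITTI labels and weight each class inversely to its frequency. This is a common heuristic, not something the FAQ prescribes, and the function names below are hypothetical. Very large weights may need to be capped by hand.

```python
from collections import Counter


def count_classes(kitti_lines):
    """Count objects per class; the class name is the first KITTI field."""
    return Counter(line.split()[0] for line in kitti_lines if line.strip())


def inverse_frequency_weights(counts):
    """Give the most frequent class weight 1.0 and rarer classes more."""
    most_common = max(counts.values())
    return {name: most_common / n for name, n in counts.items()}


# Example with made-up labels: four pedestrians and one rider.
labels = [
    "pedestrian 0.00 0 0.00 10.00 10.00 50.00 120.00 0 0 0 0 0 0 0",
    "pedestrian 0.00 0 0.00 60.00 12.00 95.00 118.00 0 0 0 0 0 0 0",
    "pedestrian 0.00 0 0.00 20.00 30.00 70.00 150.00 0 0 0 0 0 0 0",
    "pedestrian 0.00 0 0.00 80.00 40.00 99.00 160.00 0 0 0 0 0 0 0",
    "riders 0.00 0 0.00 100.00 20.00 160.00 140.00 0 0 0 0 0 0 0",
]
weights = inverse_frequency_weights(count_classes(labels))
```

With these made-up labels the rarer class ends up with four times the weight of the common one.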

For the 5 classes you have trained, which classes are not person related? DetectNet_v2 can train both people and non-people-related classes. But since the PeopleNet pretrained model is used, it is easier to get the people classes to converge.

Also, are the non-people-class objects small? If so, please refer to Frequently Asked Questions - NVIDIA Docs and adjust some parameters:

In DetectNet_V2, are there any parameters that can help improve AP (average precision) on training small objects?

Following parameters can help you improve AP on smaller objects:

  • Increase num_layers of resnet
  • class_weight for small objects
  • Increase the coverage_radius_x and coverage_radius_y parameters of the bbox_rasterizer_config section for the small objects class
  • Decrease minimum_detection_ground_truth_overlap
  • Lower minimum_height to cover more small objects for evaluation.
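
To judge whether objects are small enough for the last point to matter, one option is to measure box heights directly from the KITTI labels. The sketch below assumes the standard KITTI field order (y1 is field 5 and y2 is field 7, counting from 0); the function names and the threshold in the example are hypothetical.

```python
def box_heights(kitti_lines):
    """Return the pixel height (y2 - y1) of every box in the KITTI lines."""
    heights = []
    for line in kitti_lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        y1, y2 = float(fields[5]), float(fields[7])
        heights.append(y2 - y1)
    return heights


def fraction_below(heights, min_height):
    """Fraction of boxes shorter than min_height pixels."""
    if not heights:
        return 0.0
    return sum(h < min_height for h in heights) / len(heights)


# Example with made-up labels of height 10, 50, 100 and 30 pixels.
labels = [
    "crowd 0.00 0 0.00 0.00 0.00 20.00 10.00 0 0 0 0 0 0 0",
    "crowd 0.00 0 0.00 0.00 0.00 20.00 50.00 0 0 0 0 0 0 0",
    "riders 0.00 0 0.00 0.00 0.00 20.00 100.00 0 0 0 0 0 0 0",
    "riders 0.00 0 0.00 0.00 10.00 20.00 40.00 0 0 0 0 0 0 0",
]
share_small = fraction_below(box_heights(labels), 40)
```

A large share of boxes under the evaluation threshold would suggest that many ground-truth objects are being excluded from the AP computation.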

Thanks for the comment.

I inspected the images. All classes are person-related, but ignore-regions and crowd are extremely hard classes, so I don’t expect good accuracy on them.

The objects are large enough. Thanks for your consideration.

One last question: I’m not sure whether this is public information, but was WiderPerson used when PeopleNet was trained?

No, it is trained with a proprietary dataset.
More info can be found in
PeopleNet - NVIDIA Docs and
PeopleNet | NVIDIA NGC

Does this mean that I should resize the whole training dataset to a fixed size and then train the model?

And I guess the anchor sizes should also be computed after resizing the whole dataset.

Is that right?

No, it is not needed.

Please run kmeans to get the anchor shapes. There is no need to resize the whole dataset.
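
To illustrate the idea, here is a minimal sketch of IoU-based k-means over box sizes, the approach commonly used for YOLO anchors. It is not the TAO kmeans tool and its details may differ from what that tool does. The helper `scale_to_input` shows one reason resizing files on disk is unnecessary: box sizes can be rescaled arithmetically to the network input size. All function names here are hypothetical.

```python
import random


def scale_to_input(box_w, box_h, img_w, img_h, in_w, in_h):
    """Rescale a box size from its image resolution to the network input size."""
    return box_w * in_w / img_w, box_h * in_h / img_h


def iou_wh(box, centroid):
    """IoU of two (w, h) boxes aligned at a shared corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster (w, h) pairs into k anchors, assigning by highest IoU."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                # New centroid is the mean width and height of the cluster.
                new_centroids.append((
                    sum(w for w, _ in members) / len(members),
                    sum(h for _, h in members) / len(members),
                ))
            else:
                # Keep the old centroid if the cluster is empty.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    # Sort anchors by area, smallest first.
    return sorted(centroids, key=lambda c: c[0] * c[1])


# Example with made-up box sizes forming two obvious groups.
boxes = [(10, 20), (12, 22), (11, 21), (100, 200), (110, 210), (105, 205)]
anchors = kmeans_anchors(boxes, k=2)
```

In practice the input would be every ground-truth box in the training set, each passed through `scale_to_input` first, with k set to the number of anchors the spec file expects.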
