One thought on my mind: shouldn't I have added the classes on top of PeopleNet's? I think I should have used the same classes as PeopleNet, and it would have been better to map the WiderPerson classes onto PeopleNet's classes. Does this sound right to you?
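A minimal sketch of that idea, assuming the target_class_mapping entries in the DetectNet_v2 spec's dataset_config are used to fold the WiderPerson labels into PeopleNet's "person" class. The dictionary below is hypothetical: which WiderPerson classes to keep, and whether to drop ignore-regions and crowd, is a judgment call. How unmapped classes are handled during training should be checked in the DetectNet_v2 docs.

```python
from typing import Dict, Optional

# Hypothetical mapping from the WiderPerson class names used in this thread to
# PeopleNet's "person" class. Classes mapped to None get no entry at all.
WIDERPERSON_TO_PEOPLENET: Dict[str, Optional[str]] = {
    "pedestrian": "person",
    "riders": "person",
    "partially-visible-persons": "person",
    "ignore-regions": None,
    "crowd": None,
}


def target_class_mapping_entries(mapping: Dict[str, Optional[str]]) -> str:
    """Render the mapping as DetectNet_v2 target_class_mapping text-format entries."""
    lines = []
    for source, target in mapping.items():
        if target is None:
            continue  # skip classes we chose not to map
        lines.append("target_class_mapping {")
        lines.append(f'  key: "{source}"')
        lines.append(f'  value: "{target}"')
        lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # The output is meant to go inside dataset_config { ... } in the spec file.
    print(target_class_mapping_entries(WIDERPERSON_TO_PEOPLENET))
```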
Yes, you need to set this if your images are not all the same resolution.
See DetectNet_v2 - NVIDIA Docs
The train tool does not support training on images of multiple resolutions. However, the dataloader does support resizing images to the input resolution defined in the specification file. This can be enabled by setting the enable_auto_resize parameter to true in the augmentation_config module of the spec file.
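As an illustration of what resizing implies for the labels (not TAO's actual implementation), the sketch below checks whether a set of image sizes is mixed, and scales a KITTI-style box (x1, y1, x2, y2) onto a target input resolution. It assumes a plain stretch where width and height are scaled independently; the 960x544 target in the usage lines is only an example resolution.

```python
from typing import Iterable, Tuple

Size = Tuple[int, int]  # (width, height) in pixels
Box = Tuple[float, float, float, float]  # KITTI-style (x1, y1, x2, y2) in pixels


def has_mixed_resolutions(sizes: Iterable[Size]) -> bool:
    """True if the images do not all share one resolution."""
    return len(set(sizes)) > 1


def scale_box(box: Box, src: Size, dst: Size) -> Box:
    """Map a box from a src-sized image onto the same image stretched to dst."""
    src_w, src_h = src
    dst_w, dst_h = dst
    x1, y1, x2, y2 = box
    # Width and height are scaled independently, so the aspect ratio may change.
    return (
        x1 * dst_w / src_w,
        y1 * dst_h / src_h,
        x2 * dst_w / src_w,
        y2 * dst_h / src_h,
    )


if __name__ == "__main__":
    print(has_mixed_resolutions([(1000, 500), (1920, 1080), (1000, 500)]))  # True
    print(scale_box((100, 50, 300, 250), (1000, 500), (960, 544)))  # (96.0, 54.4, 288.0, 272.0)
```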
Thanks. I tried with enable_auto_resize: True, but accuracy is still low after 300 epochs. Attached: experiment_spec.txt
Validation cost: 0.000490
Mean average_precision (in %): 17.0623

class name                   average precision (in %)
-------------------------    ------------------------
crowd                        0.143917
ignore-regions               0
partially-visible-persons    11.4278
pedestrian                   73.7398
riders                       0
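A quick arithmetic check on the numbers above: the reported mAP matches the plain average of the five per-class APs, so the three classes at or near 0 pull it down heavily. The second figure, which leaves out crowd and ignore-regions, is only an illustration and not a metric the tool reports.

```python
from statistics import mean

# Per-class AP values (in %) copied from the evaluation output above.
ap = {
    "crowd": 0.143917,
    "ignore-regions": 0.0,
    "partially-visible-persons": 11.4278,
    "pedestrian": 73.7398,
    "riders": 0.0,
}

overall = mean(ap.values())
# Illustrative only: average over the classes other than crowd and ignore-regions.
subset = mean(v for k, v in ap.items() if k not in {"crowd", "ignore-regions"})

print(f"mAP over all 5 classes: {overall:.4f}")  # 17.0623
print(f"mAP without crowd/ignore-regions: {subset:.4f}")  # 28.3892
```

Even without the two hard classes, riders at 0 keeps the average well below the pedestrian AP.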
Other possible causes I suspect:
First, the classes are highly imbalanced. I set class_weight to 1 for all classes, although in my experience adjusting it has not made a dramatic difference.
Second, is the pretrained PeopleNet weak at new classes? PeopleNet does well on the person-related classes (pedestrian, partially-visible-persons) but poorly on the others. Should I avoid using PeopleNet for those non-people-related classes?
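Before tuning class_weight, it may help to measure how imbalanced the classes actually are. A minimal sketch, assuming the labels are in KITTI format (one object per line, class name in the first field) and stored as *.txt files in a single directory; the directory layout and the command-line usage are assumptions.

```python
import sys
from collections import Counter
from pathlib import Path
from typing import Iterable


def count_classes(label_lines: Iterable[str]) -> Counter:
    """Count objects per class; KITTI puts the class name in the first field."""
    counts: Counter = Counter()
    for line in label_lines:
        fields = line.split()
        if fields:  # ignore blank lines
            counts[fields[0]] += 1
    return counts


def count_classes_in_dir(label_dir: str) -> Counter:
    """Sum per-class counts over all *.txt label files directly inside label_dir."""
    total: Counter = Counter()
    for path in sorted(Path(label_dir).glob("*.txt")):
        total.update(count_classes(path.read_text().splitlines()))
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_classes.py /path/to/labels
    for name, n in count_classes_in_dir(sys.argv[1]).most_common():
        print(f"{name:<28}{n}")
```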
Distribute the dataset class: How do I balance the weight between classes if the dataset has significantly more samples for one class than another?
To account for imbalance, increase the class_weight for classes with fewer samples. You can also try disabling enable_autoweighting; in this case initial_weight is used to control cov/regression weighting. It is important to keep the number of samples of different classes balanced, which helps improve mAP.
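One common heuristic for "increase the class_weight for classes with fewer samples" is inverse-frequency weighting: each class gets max_count / count, capped so that very rare classes do not dominate. This is a sketch, not an NVIDIA recommendation. The cap of 10 and the counts in the usage lines are hypothetical, and the emitted target_classes blocks contain only name and class_weight, so the existing coverage_foreground_weight and objectives fields from the spec still have to be kept.

```python
from typing import Dict


def inverse_frequency_weights(counts: Dict[str, int], cap: float = 10.0) -> Dict[str, float]:
    """class_weight = max_count / count, capped; the most frequent class gets 1.0."""
    most = max(counts.values())
    return {name: min(most / n, cap) for name, n in counts.items() if n > 0}


def target_classes_entries(weights: Dict[str, float]) -> str:
    """Render name/class_weight pairs as cost_function_config target_classes blocks."""
    blocks = []
    for name, w in weights.items():
        blocks.append(
            "target_classes {\n"
            f'  name: "{name}"\n'
            f"  class_weight: {w:.2f}\n"
            "  # keep coverage_foreground_weight and objectives from the existing spec\n"
            "}"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the WiderPerson dataset.
    counts = {"pedestrian": 4000, "partially-visible-persons": 2000, "riders": 100}
    weights = inverse_frequency_weights(counts)
    print(weights)  # {'pedestrian': 1.0, 'partially-visible-persons': 2.0, 'riders': 10.0}
    print(target_classes_entries(weights))
```

Without the cap, riders would get 40.0 here; a cap keeps one rare class from swamping the loss, at the cost of under-correcting extreme imbalance.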
Of the 5 classes you have trained, which ones are not person-related? DetectNet_v2 can be trained on people or non-people classes, but since the PeopleNet pretrained model is used, the person classes converge more easily.
I inspected the images: all classes are person-related, but ignore-regions and crowd are extremely hard classes, so I don't expect good accuracy on them.
The objects are large enough. Thanks for your consideration.
One last question: I'm not sure whether this is public information, but was WiderPerson used when PeopleNet was trained?