I have a training dataset with a lopsided distribution of annotated images — some classes have thousands of examples while others have only a few hundred:
scissors: 6000 (42% of total)
hammer: 8000 (56% of total)
wrench: 400 (3% of total)
My suspicion is that such an uneven distribution of training examples leads to suboptimal training. I'm certainly seeing lackluster inference results, but I'm not sure whether the imbalance is actually the cause.
Is there a way to configure the SSD and/or DetectNet models to account for an uneven distribution of training examples? I don't see anything about this in the documentation for the specification files, etc.
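For context, the crude workaround I've been considering is to oversample the underrepresented class when assembling the training file list, before the dataset conversion step. A minimal sketch in Python (the `oversample` helper and the file names are hypothetical, just to illustrate the idea):

```python
import random
from collections import Counter

def oversample(samples, seed=0):
    """Duplicate minority-class entries so every class reaches the
    count of the largest class. `samples` is a list of
    (image_path, label) pairs."""
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    rng = random.Random(seed)
    balanced = list(samples)
    for label, n in counts.items():
        pool = [s for s in samples if s[1] == label]
        # append (target - n) extra copies drawn from this class's pool
        balanced.extend(rng.choice(pool) for _ in range(target - n))
    rng.shuffle(balanced)
    return balanced

# Roughly my distribution: 6000 / 8000 / 400
data = ([("scissors_%d.jpg" % i, "scissors") for i in range(6000)]
        + [("hammer_%d.jpg" % i, "hammer") for i in range(8000)]
        + [("wrench_%d.jpg" % i, "wrench") for i in range(400)])

balanced = oversample(data)
# every class now appears 8000 times in the training list
```

This just repeats images rather than adding information, so I'd rather use a built-in mechanism (class weighting in the loss, for instance) if the spec files support one.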