How to account for lopsided training datasets?

I have a training dataset with a lopsided distribution of annotated images: some classes have 6000 or more images while others have only 400. For example:

scissors: 6000 (42% of total)
hammer: 8000 (56% of total)
wrench: 400 (3% of total)

My suspicion is that such an uneven distribution of training examples may lead to suboptimal training, and I’m certainly seeing lackluster inference results, but I’m not sure if this is the issue.

Is there a way to configure the SSD and/or DetectNet models to account for an uneven distribution of training examples? I don't see anything about this in the documentation for the specification files.

For detectnet_v2, you can tweak the weight of each class in the spec file, so that under-represented classes such as wrench count more in the loss.
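As a starting point for choosing those weights, here is a minimal sketch that derives inverse-frequency weights from the image counts in the question. The helper name inverse_frequency_weights and its power parameter are made up for illustration and are not part of TLT. In the detectnet_v2 specs I have seen, the per-class value is the class_weight field of each target_classes entry under cost_function_config, but treat that field path as an assumption and check it against the documentation for your version.

```python
def inverse_frequency_weights(counts, power=1.0):
    """Return a weight per class proportional to (1 / count) ** power.

    Weights are scaled so the most frequent class gets 1.0.
    power=1.0 is plain inverse frequency; power=0.5 dampens the
    correction, which may be gentler when the imbalance is extreme.
    (Hypothetical helper for illustration, not a TLT API.)
    """
    if not counts or any(n <= 0 for n in counts.values()):
        raise ValueError("every class needs a positive image count")
    largest = max(counts.values())
    return {name: (largest / n) ** power for name, n in counts.items()}


# Image counts taken from the question above.
counts = {"scissors": 6000, "hammer": 8000, "wrench": 400}

weights = inverse_frequency_weights(counts)
for name, weight in weights.items():
    print(f"{name}: {weight:.2f}")
# scissors: 1.33
# hammer: 1.00
# wrench: 20.00
```

A weight of 20 for wrench may well be too aggressive in practice, so these numbers are only an initial guess to tune against validation results. Calling inverse_frequency_weights(counts, power=0.5) gives a milder correction (about 4.47 for wrench).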
Some pointers are in