I have been using NVIDIA's TLT (Transfer Learning Toolkit):
DetectNet with GoogLeNet
I am training on 6 classes. Below are the number of instances of each class across the 10,770 training images and the precision for each class after training:
- person [instances: 18,004] precision after training: 80.1%
- gun [instances: 5,056] precision after training: 52.92%
- mask [instances: 4,121] precision after training: 74.28%
- face [instances: 9,053] precision after training: 86.22%
- helmet [instances: 2,164] precision after training: 89.308%
- knife [instances: 1,206] precision after training: 48.36%
Mean average_precision: 71.872%
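To make the numbers above easier to compare, here is a minimal Python sketch that tabulates them. The variable names are my own, and it assumes the reported mean is a plain unweighted average of the per-class values, which may not be exactly how TLT computes it. Averaging the six listed values gives roughly 71.86%, which is close to the reported figure.

```python
# Per-class numbers copied from the list above: (instances, precision in %).
stats = {
    "person": (18004, 80.1),
    "gun": (5056, 52.92),
    "mask": (4121, 74.28),
    "face": (9053, 86.22),
    "helmet": (2164, 89.308),
    "knife": (1206, 48.36),
}

total_instances = sum(n for n, _ in stats.values())
largest = max(n for n, _ in stats.values())

# Assumption: the mean is an unweighted average over the listed classes.
mean_ap = sum(p for _, p in stats.values()) / len(stats)

print(f"{'class':<8}{'instances':>10}{'share':>8}{'max/n':>8}{'prec %':>9}")
by_count = sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True)
for name, (n, p) in by_count:
    share = n / total_instances   # fraction of all labelled instances
    ratio = largest / n           # how much larger the biggest class is
    print(f"{name:<8}{n:>10}{share:>8.1%}{ratio:>8.1f}{p:>9.2f}")

print(f"unweighted mean over {len(stats)} classes: {mean_ap:.3f}%")
```

Run as a plain script, this shows that person has about 15 times as many instances as knife, and that gun and knife are the two lowest-scoring classes. Instance count alone does not explain the spread, since helmet has the second-fewest instances but the highest precision.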
Initially I could only get an overall precision of 68%. After improving the annotations on the images, the mean average precision rose only to 71.87% overall.
How do I resolve this issue? Please provide in-depth suggestions.