I am struggling with training a new object detection model.
I want to detect mouths (and eyes, if possible).
I captured 100 to 200 images using the built-in camera on a Jetson TX2.
(All training images are very similar…)
I followed the instructions at https://github.com/dusty-nv/jetson-inference to train the model.
In DIGITS, the loss appeared to decrease steadily; however, the model did not perform well on the validation data, which is very similar to the training data.
What should I do to train a model that works? Specifically:
- Roughly how many images do I need?
- Should I change the network itself in addition to the DIGITS parameters (e.g., the strides)?
- Should I take the sizes of the bounding boxes into account?
- Is there any useful advice for training with a small dataset?