Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc)
• Network Type Yolo_v4 - Resnet 18
• TLT Version (latest)
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
As a first-timer training a YOLOv4 object detector (I’ve trained DetectNet before), I’m a bit confused about the train loss, validation loss and mAP metrics. Let me elaborate. During training the train loss declines, as expected. After the 10th epoch it runs the first eval, and the validation loss is lower than the train loss. After about 50 or 60 epochs the validation loss flatlines (as expected), but it is still lower than the train loss. Over the last 20 epochs the validation loss starts to climb, which indicates overfitting.
In my experience, the validation loss climbs when the model is overfitting, and by that point the train loss is normally already lower than the validation loss. That is not the case in my situation: the validation loss is rising while still sitting below the train loss.
The datasets are very large and well distributed, with no unusual train/val ratio. Could the regularization or the optimizer explain the gap?
Which leads me to my question: when selecting an epoch to use for inference or retraining, should I choose the one with the lowest validation loss or the one with the highest mAP?
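To make the question concrete, here is a minimal Python sketch of the two selection rules I’m weighing. The numbers are made-up placeholders, not my real logs, and the names (`EVALS`, `epoch_with_lowest_val_loss`, `epoch_with_highest_map`) are just for illustration; nothing here is a TLT API. I’m only assuming one record per evaluation, every 10 epochs.

```python
from typing import Dict, List

# Hypothetical evaluation records (placeholder values, NOT my actual training log).
# One entry per evaluation run, assumed to happen every 10 epochs.
EVALS: List[Dict[str, float]] = [
    {"epoch": 50, "val_loss": 21.0, "map": 0.78},
    {"epoch": 60, "val_loss": 20.5, "map": 0.80},
    {"epoch": 70, "val_loss": 20.8, "map": 0.82},
    {"epoch": 80, "val_loss": 21.5, "map": 0.83},
    {"epoch": 90, "val_loss": 22.4, "map": 0.81},
]


def epoch_with_lowest_val_loss(evals: List[Dict[str, float]]) -> int:
    """Rule A: pick the evaluated epoch whose validation loss is smallest."""
    return int(min(evals, key=lambda e: e["val_loss"])["epoch"])


def epoch_with_highest_map(evals: List[Dict[str, float]]) -> int:
    """Rule B: pick the evaluated epoch whose mAP is largest."""
    return int(max(evals, key=lambda e: e["map"])["epoch"])


if __name__ == "__main__":
    print("lowest val loss at epoch", epoch_with_lowest_val_loss(EVALS))
    print("highest mAP at epoch", epoch_with_highest_map(EVALS))
```

With placeholder values like these the two rules point at different epochs, which is exactly the situation I’m unsure how to resolve.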