Question about validation loss, mAP and over/underfitting (Yolo v4)

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
• Network Type Yolo_v4 - Resnet 18
• TLT Version (latest)
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

As a first-timer training a Yolo v4 object detector (I’ve trained Detectnet before), I’m a bit confused about the train-loss, validation-loss and mAP metrics. Let me elaborate: during training the train-loss declines, as expected. After the 10th epoch it runs the first evaluation. The validation-loss is lower than the train-loss. After about 50 or 60 epochs, the validation-loss flatlines (as expected), but is still lower than the train-loss. Over the last 20 epochs the validation-loss starts to climb, which indicates overfitting.

In my experience, one should see the validation-loss climb when the model is overfitting. Normally overfitting occurs when train-loss is lower than validation-loss. This is not the case in my situation.
The datasets are very large and well distributed, with no unusual train/val ratio. Maybe it’s the regularization or the optimizer?

Which leads me to my question: when selecting an epoch to use for inference or retraining, should I choose the epoch with the lowest validation-loss, or the one with the highest mAP?
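
To make the two options concrete, here is a small Python sketch that picks an epoch by each criterion. It is only an illustration: the `records` list, its field names and all the numbers are made up by hand and are not taken from a TLT log or the TLT API. In practice you would fill the list from whatever your evaluation runs report.

```python
# Hypothetical per-epoch evaluation results (illustrative values only,
# not real training output). One entry per evaluated checkpoint.
records = [
    {"epoch": 10, "val_loss": 14.0, "mAP": 0.55},
    {"epoch": 20, "val_loss": 11.0, "mAP": 0.68},
    {"epoch": 30, "val_loss": 9.5, "mAP": 0.74},
    {"epoch": 40, "val_loss": 9.1, "mAP": 0.77},
    {"epoch": 50, "val_loss": 9.3, "mAP": 0.79},
    {"epoch": 60, "val_loss": 9.9, "mAP": 0.78},
]


def best_by_val_loss(records):
    """Return the record with the lowest validation loss."""
    return min(records, key=lambda r: r["val_loss"])


def best_by_map(records):
    """Return the record with the highest mAP."""
    return max(records, key=lambda r: r["mAP"])


if __name__ == "__main__":
    print("lowest validation loss:", best_by_val_loss(records)["epoch"])
    print("highest mAP:", best_by_map(records)["epoch"])
```

With these made-up numbers the two criteria disagree (epoch 40 versus epoch 50), which is exactly the situation I am asking about.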

Yes. Please use the best one.

As in the one with the highest mAP?


