I followed the video below to train the cat_dog model.
Jetson AI Fundamentals - S3E2 - Image Classification Inference
Although I successfully trained the model (model_best.pth.tar was saved with an updated timestamp), the accuracy on dog pictures is lower than 20% (using the photos in the “test” folder), while it is almost 90% for cat photos. I trained the model for 70 epochs and got an Acc@1 of 79%.
I removed the .engine file, trained the model again (1 epoch), and converted the new model to ONNX again.
However, this time I successfully identified 90 dog photos out of 100 but only 10 cat photos out of 100.
I can see the new .engine file being regenerated when I run the “imagenet” command. I cannot reproduce the result you show in the video, but overall the accuracy is around 50%.
I then downloaded the model and the ONNX file from the link you provided for the 100-epoch training and ran the test again. This time the overall accuracy is around 80%, for both cats and dogs. I think I need to train the model for more epochs to get a more accurate result.
Just out of curiosity: if I train the model for 100 epochs and model_best.pth.tar is saved at around 80% accuracy, and then I train the model again for 1 epoch, does the result of the 1-epoch training overwrite the result of the 100 epochs?
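To make the arithmetic explicit, here is a minimal Python sketch that turns the counts above (90 of 100 dog photos, 10 of 100 cat photos) into per-class and overall accuracy. The function name `summarize` and the dictionary layout are only illustrative assumptions; they are not part of jetson-inference.

```python
def summarize(results):
    """Compute per-class and overall accuracy.

    `results` maps a class name to a (correct, total) pair.
    This helper is hypothetical and only illustrates the arithmetic.
    """
    per_class = {}
    total_correct = 0
    total_images = 0
    for name, (correct, total) in results.items():
        if total <= 0:
            raise ValueError(f"class {name!r} has no test images")
        per_class[name] = correct / total
        total_correct += correct
        total_images += total
    return per_class, total_correct / total_images


# Counts taken from the 1-epoch run described above.
per_class, overall = summarize({"dog": (90, 100), "cat": (10, 100)})
for name, acc in per_class.items():
    print(f"{name}: {acc:.0%}")
print(f"overall: {overall:.0%}")
```

An overall figure near 50% on two balanced classes can hide the fact that the model mostly predicts one class, so it seems worth looking at the per-class numbers as well.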
Yes, unless you ran train.py with the --resume=<CHECKPOINT> flag, the result of the 1-epoch run would overwrite the model_best.pth.tar of your previous 100-epoch run. That is because, by default, training starts fresh from the first epoch (unless you resume training with the --resume and --start-epoch flags).
It’s probably a good idea to make a backup copy of your model_best.pth.tar after a long training run to prevent inadvertent overwriting.
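One way to do that backup is sketched below in Python. It copies the checkpoint to a timestamped file in a separate folder. The function name `backup_checkpoint` and the example paths in the comments are assumptions for illustration, so adjust them to wherever your training run saved its files.

```python
import shutil
import time
from pathlib import Path


def backup_checkpoint(checkpoint, backup_dir):
    """Copy a checkpoint file to backup_dir under a timestamped name.

    Hypothetical helper, not part of train.py. Returns the backup path.
    """
    src = Path(checkpoint)
    if not src.is_file():
        raise FileNotFoundError(f"checkpoint not found: {src}")

    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Timestamp the copy so a later backup does not replace this one
    # (two backups made within the same second would share a name).
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.name}.{stamp}.bak"

    # copy2 also preserves the file's modification time.
    shutil.copy2(src, dest)
    return dest


# Example call (paths are assumptions, adjust to your setup):
# backup_checkpoint("models/cat_dog/model_best.pth.tar", "models/cat_dog/backups")
```

The original file is left in place, so you can still pass it to --resume afterwards.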