I followed the video below to train the cat_dog model.
Jetson AI Fundamentals - S3E2 - Image Classification Inference
Although I successfully trained the model (model_best.pth.tar was saved and its timestamp was updated), the accuracy on dog pictures is lower than 20% (using the photos in the “test” folder), while it’s almost 90% for cat photos. I trained the model for 70 epochs and got an Acc@1 of 79%.
Could you please tell me what I should do next?
P.S. I also converted the model to ONNX successfully, and its timestamp was updated as well (although a segmentation fault also occurred during the conversion).
Hi
I’d like to reproduce the result from the S3E2 video, which is to get more than 70% accuracy for both the dog and cat classes using the photos in the test folder.
By accuracy I mean correctly identifying 70 out of 100 dog photos; at the moment I only get 20.
Hi @ryan_sg, can you try deleting the .engine file in your model’s folder? TensorRT will then re-generate the engine from your ONNX the next time you run the imagenet program.
It’s possible that you are using an old engine if you had exported the ONNX previously, and it needs to be regenerated against your latest ONNX model.
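A minimal sketch of that cleanup step, assuming the cached engine files sit in the same folder as the ONNX model and end in .engine. The helper remove_stale_engines and the models/cat_dog path below are hypothetical, not part of jetson-inference:

```python
from pathlib import Path


def remove_stale_engines(model_dir):
    """Delete cached TensorRT *.engine files in model_dir and return their names.

    Assumption: the engine is cached as a '*.engine' file next to the ONNX model.
    """
    folder = Path(model_dir)
    if not folder.is_dir():
        return []  # nothing to clean up
    removed = []
    for engine in folder.glob("*.engine"):
        engine.unlink()  # the engine is rebuilt from the ONNX on the next imagenet run
        removed.append(engine.name)
    return sorted(removed)
```

For example, call remove_stale_engines("models/cat_dog") before running imagenet again. The first run afterwards will likely take noticeably longer while TensorRT rebuilds the engine.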
Hi @dusty_nv
I removed the .engine file, trained the model again (1 epoch), and converted the new model to ONNX again.
However, this time I correctly identified 90 out of 100 dog photos but only 10 out of 100 cat photos.
I can see that a new .engine file was regenerated when running the “imagenet” command. Although I can’t get the same result as shown in the video, the overall accuracy is around 50%.
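These counts show why a single overall figure can hide a one-sided model: a classifier that labels almost everything as one class still scores about 50% on a balanced test set. A small sketch of the arithmetic, using the counts reported in this post (class_accuracies is a hypothetical helper):

```python
def class_accuracies(correct, total):
    """Return per-class accuracy and overall accuracy from counts of correct predictions."""
    per_class = {name: correct[name] / total[name] for name in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_class, overall


# Counts from the 1-epoch retrain above (100 test photos per class)
per_class, overall = class_accuracies({"cat": 10, "dog": 90}, {"cat": 100, "dog": 100})
print(per_class)  # {'cat': 0.1, 'dog': 0.9}
print(overall)    # 0.5
```

Checking each class separately, as in the original question, is what exposes the imbalance that the overall 50% hides.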
I downloaded the model and the ONNX file from the link you provided (trained for 100 epochs) and ran the test again. This time the overall accuracy is around 80%, for both cat and dog. I think I need to train the model for more epochs to get a more accurate result.
Just out of curiosity: if I trained the model for 100 epochs and the final model_best.pth.tar was saved with around 80% accuracy, and then I trained the model again for 1 epoch, does the result of the 1-epoch run overwrite the result of the 100-epoch run?
Yes, unless you ran train.py with the --resume=<CHECKPOINT> flag, the result of the 1-epoch run would overwrite the model_best.pth.tar from your previous 100-epoch run. That is because by default, training starts fresh from the first epoch (unless you are resuming it with the --resume and --start-epoch flags).
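The overwrite behaviour can be sketched as follows, assuming train.py tracks its best checkpoint the way the PyTorch ImageNet example does: a running best_acc1 starts at 0 for a fresh run (or is restored from the checkpoint with --resume), and model_best.pth.tar is rewritten whenever an epoch beats it. best_file_after_run is a hypothetical helper that only simulates this bookkeeping; no training happens:

```python
def best_file_after_run(on_disk_acc1, epoch_acc1s, resume=False):
    """Simulate which Acc@1 ends up in model_best.pth.tar after a training run.

    Assumption: best_acc1 starts at 0 for a fresh run, is restored from the
    checkpoint when resuming, and the best file is rewritten on every new best.
    """
    best_acc1 = on_disk_acc1 if resume else 0.0
    stored = on_disk_acc1  # Acc@1 of the weights currently in model_best.pth.tar
    for acc1 in epoch_acc1s:
        if acc1 > best_acc1:  # this epoch counts as "best" and overwrites the file
            stored = acc1
        best_acc1 = max(best_acc1, acc1)
    return stored


# 80.0 stands for the ~80% run mentioned above; 45.0 is a hypothetical 1-epoch Acc@1.
print(best_file_after_run(80.0, [45.0]))               # 45.0 (fresh run overwrites)
print(best_file_after_run(80.0, [45.0], resume=True))  # 80.0 (resumed run keeps it)
```

In a fresh run the first epoch always beats the initial 0, so it overwrites the file regardless of how good the earlier model was.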
It’s probably a good idea to make a backup copy of your model_best.pth.tar after a long training run to prevent inadvertent overwriting.
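A minimal sketch of such a backup, assuming the checkpoint lives at <model_dir>/model_best.pth.tar. The helper backup_best_checkpoint and its label argument are hypothetical:

```python
import shutil
import time
from pathlib import Path


def backup_best_checkpoint(model_dir, label="backup", name="model_best.pth.tar"):
    """Copy <model_dir>/<name> to a timestamped file in the same folder and return its path."""
    src = Path(model_dir) / name
    if not src.is_file():
        raise FileNotFoundError(f"no checkpoint at {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{label}_{stamp}_{name}")  # e.g. 100ep_<timestamp>_model_best.pth.tar
    shutil.copy2(src, dst)  # copy2 also preserves the file's modification time
    return dst
```

For example, backup_best_checkpoint("models/cat_dog", label="100ep") right after a long run finishes. Because the copy has a distinct name, a later run that writes model_best.pth.tar cannot overwrite it.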