Jetson AI Fundamentals - S3E2 - Image Classification Inference - low accuracy after 70 epochs by using imagenet

I followed the video below to train the cat_dog model.

Jetson AI Fundamentals - S3E2 - Image Classification Inference

Although I trained the model successfully (model_best.pth.tar is saved and its timestamp is updated), the accuracy on dog pictures is lower than 20% (using the photos in the “test” folder), while it is almost 90% for cat photos. I trained the model for 70 epochs and got an Acc@1 of 79%.

I get a “Segmentation fault (core dumped)” after the model is trained, but according to this thread (Segmentation fault (core dumped) on jetson nano when training resnet-18 on my small dataset of just 60 images using transfer learning!), that is fine because the segfault happens after training is done.

Would you please tell me what to do next?
P.S. I also converted the model to ONNX successfully, and its timestamp is updated (a segmentation fault also occurred during the conversion).

Hi,

May I know whether your next step is to improve the accuracy or to run inference on the Nano?

Thanks.

Hi
I’d like to reproduce the result from the S3E2 video, which is an accuracy of more than 70% for both the dog and cat classes, using the photos in the test folder.

The accuracy I’m referring to here means correctly identifying 70 dog photos out of 100; at the moment I only get 20.

Hi @lzlallen1980, can you try deleting the .engine file in your model’s folder? TensorRT will then regenerate the engine from your ONNX the next time you run the imagenet program.

It’s possible that you are using an old engine if you had exported the ONNX previously, and it needs to be regenerated against your latest ONNX model.
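As a rough illustration of the cleanup step, here is a minimal Python sketch that removes cached engine files from a model folder. The helper name `remove_stale_engines` and the example folder path are hypothetical; the only assumption about naming is that the cached engine ends in `.engine`, as mentioned above.

```python
from pathlib import Path


def remove_stale_engines(model_dir):
    """Delete cached *.engine files in model_dir and return the deleted paths.

    The ONNX model, labels, and checkpoints are left untouched, so the
    engine can be rebuilt from the current ONNX on the next run.
    """
    removed = []
    for engine in sorted(Path(model_dir).glob("*.engine")):
        engine.unlink()
        removed.append(engine)
    return removed


# Example call (hypothetical folder name, adjust to your own model directory):
# remove_stale_engines("models/cat_dog")
```

Only files directly inside the given folder are matched; subfolders are not searched.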

Hi @dusty_nv
I removed the .engine file, trained the model again (1 epoch), and converted it to ONNX again.
However, this time I successfully identified 90 dog photos out of 100 but only 10 cat photos out of 100.

I can see a new .engine file being regenerated when I run the “imagenet” command. I cannot get the same result as you show in the video, though: overall, the accuracy is around 50%.
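To make the numbers above concrete, here is a small Python sketch that computes per-class and overall accuracy from the counts reported in this post (90 of 100 dog photos and 10 of 100 cat photos). The function name `accuracy_report` is only illustrative. It shows how an overall figure of around 50% can hide a strong bias toward one class.

```python
def accuracy_report(correct_by_class, total_by_class):
    """Return (per-class accuracy dict, overall accuracy) from raw counts."""
    per_class = {
        name: correct_by_class[name] / total_by_class[name]
        for name in total_by_class
    }
    overall = sum(correct_by_class.values()) / sum(total_by_class.values())
    return per_class, overall


# Counts taken from the test run described above.
per_class, overall = accuracy_report(
    {"dog": 90, "cat": 10},
    {"dog": 100, "cat": 100},
)
print(per_class)  # {'dog': 0.9, 'cat': 0.1}
print(overall)    # 0.5
```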

I downloaded the model and the ONNX file from the link you provided for the 100-epoch training and ran the test again. This time the overall accuracy is around 80%, for both cats and dogs. I think I need to train the model for more epochs to get a more accurate result.

Just out of curiosity: if I train the model for 100 epochs and model_best.pth.tar is saved with around 80% accuracy, and I then train the model again for 1 epoch, does the result of the 1-epoch training overwrite the result of the 100 epochs?

Yes, unless you ran train.py with the --resume=<CHECKPOINT> flag, the result of 1 epoch would overwrite the model_best.pth.tar of your previous 100 epoch run. That is because by default, the training starts fresh from the first epoch (unless you are resuming the training with the --resume and --start-epoch flags).

It’s probably a good idea to make a backup copy of your model_best.pth.tar after a long training run to prevent inadvertent overwriting.
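One way to make that backup is a short Python helper like the sketch below. The function name `backup_checkpoint`, the timestamped file naming, and the example paths are my own assumptions and not part of train.py; copying the file by hand works just as well.

```python
import shutil
import time
from pathlib import Path


def backup_checkpoint(checkpoint, backup_dir):
    """Copy a checkpoint into backup_dir under a timestamped name.

    Returns the path of the copy. The original file is left in place.
    """
    src = Path(checkpoint)
    dst_dir = Path(backup_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = dst_dir / f"{stamp}-{src.name}"
    shutil.copy2(src, dst)  # copy2 also preserves the file's modification time
    return dst


# Example call (hypothetical paths, adjust to your own model directory):
# backup_checkpoint("models/cat_dog/model_best.pth.tar", "models/cat_dog/backups")
```

Running this after a long training run keeps a separate copy that a later short run cannot overwrite.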
