Training of Object Detection models on Jetson Nano!

Hello everyone! I'm going to train an object detection model, specifically SSD-Mobilenet, on my own dataset of 600 images belonging to a single class, following the tutorial linked below. Is training on 600 images by transfer learning feasible on the Jetson Nano? And what changes, if any, do I need to make to that tutorial?
Lastly, all 600 images were captured with my Android phone's 48 MP camera, but during inference I will be doing real-time detection from the live feed of a Logitech C920 USB webcam. Does it make any difference that the dataset was captured with one device and inference is done with another?
Kindly help me, as I am a beginner in this field. Thanks!

Hi @farjadhaider3253, 600 images might be a bit small for a detection dataset, so if you find the accuracy isn’t very good after training the model, you might want to collect more data. For proof-of-concept, it may be just fine - you will have to test it and see.

Although it's ideal to collect the data with a camera similar to the one you'll use for inferencing, it should be OK - after all, the publicly available datasets online were collected with many different cameras. The source resolution shouldn't matter all that much either, because the images get downsized to 300x300 pixels before being fed into SSD-Mobilenet anyway.

Regarding performance, I found that the Nano can train SSD-Mobilenet with PyTorch at ~5 images per second. For 600 images, that works out to roughly ~2 minutes per epoch - about 1 hour for 30 epochs, or ~3.3 hours for 100 epochs. Although if the source images are ~48 MP, loading those large images might slow it down.
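If data loading does become the bottleneck, one option is to downscale the photos once, up front, rather than decoding 48 MP JPEGs every epoch. A minimal sketch using Pillow - the folder paths and the 640 px target are assumptions, adjust them for your dataset layout:

```python
from pathlib import Path
from PIL import Image

def resize_dataset(src_dir, dst_dir, max_side=640):
    """Downscale every JPEG in src_dir so its longest side is max_side px.

    640 px is well above SSD-Mobilenet's 300x300 input, so no useful
    detail is lost, but files decode much faster during training.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for img_path in sorted(src.glob("*.jpg")):
        with Image.open(img_path) as im:
            im.thumbnail((max_side, max_side))  # resizes in place, keeps aspect ratio
            im.save(dst / img_path.name, quality=95)

# e.g. resize_dataset("data/raw", "data/resized")  # hypothetical paths
```

Just remember to resize the images before labelling them, or the pixel coordinates in your annotations won't match.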

I have resized some of the images to 1080x1080 and some to 416x416, and then labelled them in YOLO format. Is that format fine?

The pytorch-ssd training code that I use supports the Pascal VOC and Open Images formats. However, I found this tool that converts from YOLO format to Pascal VOC format:

So hopefully that works ok for you.
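If the tool doesn't fit your setup, the conversion itself is simple enough to script: YOLO stores each box as a normalized `class x_center y_center width height` line, while Pascal VOC wants pixel corners in XML. A hedged sketch - the function name and the `class_names` lookup are my own, not part of pytorch-ssd:

```python
import xml.etree.ElementTree as ET

def yolo_to_voc(yolo_lines, img_w, img_h, filename, class_names):
    """Convert one image's YOLO annotation lines into a Pascal VOC XML string."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(img_w)
    ET.SubElement(size, "height").text = str(img_h)
    ET.SubElement(size, "depth").text = "3"
    for line in yolo_lines:
        cls_id, xc, yc, w, h = line.split()
        # YOLO values are normalized to [0, 1]; scale back to pixels
        xc, yc = float(xc) * img_w, float(yc) * img_h
        w, h = float(w) * img_w, float(h) * img_h
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = class_names[int(cls_id)]
        box = ET.SubElement(obj, "bndbox")
        # VOC uses corner coordinates, YOLO uses centre + size
        ET.SubElement(box, "xmin").text = str(int(xc - w / 2))
        ET.SubElement(box, "ymin").text = str(int(yc - h / 2))
        ET.SubElement(box, "xmax").text = str(int(xc + w / 2))
        ET.SubElement(box, "ymax").text = str(int(yc + h / 2))
    return ET.tostring(root, encoding="unicode")
```

With a single class, `class_names` would just be a one-element list, e.g. `["my_class"]`.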

Does it matter that some of my images are 416x416 and some are 1080x1080? The images captured from my mobile phone were 4000x2256, and I then resized some to 416x416 and some to 1080x1080.

It shouldn't matter, because the images get downscaled to 300x300 anyway before being fed into SSD-Mobilenet.

The datasets used in my examples also contain images of different sizes.
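To illustrate the point, here is a minimal stand-in for the network's preprocessing step - not the actual pytorch-ssd transform, just a sketch showing that every source size ends up at the same fixed input size:

```python
from PIL import Image

def preprocess(im, input_size=300):
    """Stand-in for SSD-Mobilenet's input transform: any source size
    is mapped to the same fixed square before entering the network."""
    return im.resize((input_size, input_size))

# Mixed dataset sizes all land at 300x300
for w, h in [(416, 416), (1080, 1080), (4000, 2256)]:
    assert preprocess(Image.new("RGB", (w, h))).size == (300, 300)
```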

Thanks a lot @dusty_nv. I will follow your tutorial and get back to you if I run into any errors.