Delete models from jetson-inference/python/training/detection/ssd

Hi, I have a bunch of bad/failed models after running train_ssd.py. The folders are locked but take up a big chunk of my memory card, and I want to remove them before training new ones.

I am trying to create a good model for detecting boats, any tips?

I have tried 25000 pictures, 10000, and 2500, and mostly the training gets killed partway through.
I have tried adjusting batch-size and workers, but it keeps happening.
I have also tried mounting swap memory, but I am unsure if I did it correctly.
I am on a Jetson Orin Nano DK 8GB.

Hi,

Since the Orin Nano is relatively memory-limited, is it possible to run the training in a desktop environment and copy the model to the Jetson for deployment?
If not, please try to reduce the batch size for training.

Thanks.

I have tried workers=0 and batch-size=1 and 2, but the same thing happens.

How many pictures are recommended for a model designed to recognize boats? I download the pictures with the open_images_downloader.py script.

And for my first question, is it possible to delete the models that turned out bad?

I would be open to training on a desktop as well, but I am having a hard time getting the jetson-inference Docker container to work on it.

Hi,

Yes, the model is a file so you can remove it directly.
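For example, if the training ran inside the jetson-inference Docker container, the model folders were likely created as root, which would explain why they appear locked. A minimal sketch (the `boats-test` folder name is a placeholder, substitute your own):

```shell
# Example path only -- substitute the name of your failed model folder.
# If the files were created inside the Docker container they may be owned
# by root; in that case, prefix the command with sudo.
rm -rf jetson-inference/python/training/detection/ssd/models/boats-test
```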

Please also check the memory status when running the training.
If the memory usage is close to full, it’s recommended to train it on other platforms first.

$ sudo tegrastats

Thanks.

Hi @viccivic98, regarding the ‘killed’ errors (out of memory): if you haven’t already, see these steps to mount swap, disable ZRAM, and disable the desktop UI: https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-transfer-learning.md#mounting-swap
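For reference, the steps on that page boil down to roughly the following (the 4GB size and /mnt path follow the tutorial; adjust as needed):

```shell
# Disable ZRAM (compressed swap in RAM), which competes for physical memory
sudo systemctl disable nvzramconfig

# Allocate and mount a 4GB swap file on disk
sudo fallocate -l 4G /mnt/4GB.swap
sudo mkswap /mnt/4GB.swap
sudo swapon /mnt/4GB.swap

# Optionally disable the desktop UI until the next reboot to free more RAM
sudo init 3
```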

The number of images you need in the dataset depends on how accurate you need the model to be. I would recommend starting with a small test set of ~500-1000 images first, just to make sure you have the process working, before leaving it to train on the larger dataset for a while over more epochs.

Also, you can directly clone the pytorch-ssd submodule from jetson-inference to your x86 PC and run the training there, provided you have CUDA/PyTorch/etc. installed. Or I use the NGC PyTorch container on x86, and that works well too.
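A rough outline of that x86 workflow, assuming a CUDA-capable desktop (the class name and data/model paths below are examples; check each script's --help for the exact options in your checkout):

```shell
# Clone the pytorch-ssd training code used as a submodule by jetson-inference
git clone https://github.com/dusty-nv/pytorch-ssd
cd pytorch-ssd
pip3 install -v -r requirements.txt

# Download a boat dataset from Open Images (example class and path)
python3 open_images_downloader.py --class-names "Boat" --data=data/boats

# Train -- with a desktop GPU you can raise the batch size well beyond
# what the Orin Nano could handle
python3 train_ssd.py --dataset-type=open_images --data=data/boats \
    --model-dir=models/boats --batch-size=32 --epochs=100
```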

We are creating a script for detecting boats on a dock. I am now running a training on my desktop with 26000 pictures collected with open_images_downloader.py.

I have it set to 100 epochs, batch-size 4, and workers 2, and the training is still going.

If the resulting model is not good, would fewer images and even more epochs be better?

PS: I tried mounting swap but I'm not sure I did it correctly; in any case, I have moved the training work over to the desktop now.

That is a good call to have moved your training over to your desktop for datasets that size. The training tutorials I make for Jetson are mostly for educational purposes, so people can learn how to get started fine-tuning their own models before transitioning to bigger systems (although on Jetson AGX, transfer learning on CNNs is decently fast).

Optimizing the model accuracy based on dataset size, curation, and training hyperparameters can be model-specific and will require experimentation on your part. Now that you have it training on a faster system, you can more easily train multiple models and iterate to attain your desired performance.

Can 26000 pictures be too big a dataset for only one object? Would 10k maybe be better for boat detection?

@viccivic98, with that many pictures of just one object, my instinct is that the dataset will be unbalanced and the model biased towards that object. Hence the training scripts and dataset downloaders have options for balancing the distribution of class labels.
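As an illustration of those options (verify the exact flag names with each script's --help; --balance-data comes from the upstream pytorch-ssd project and --max-images is a downloader option, both assumed here):

```shell
# Cap the download size rather than pulling every boat image available
python3 open_images_downloader.py --class-names "Boat" \
    --data=data/boats --max-images=10000

# Down-sample over-represented labels during training
python3 train_ssd.py --dataset-type=open_images --data=data/boats \
    --model-dir=models/boats --balance-data --batch-size=32 --epochs=100
```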
