CVAT dataset split for train_ssd.py

I have been using CVAT to annotate data for train_ssd.py. I was wondering if I should worry about the train/test/val data sets. I don’t see a way to differentiate or split them in CVAT, so I assume I am getting one data set for all three.

Should I attempt to split the dataset with torch.utils.data.random_split ?

Hi @gdefender, if you are going for a production-quality model, then yes, it may be prudent to have independent train/val/test sets. Typical use-cases for train_ssd.py are testing & development (and educational learning), so it’s not a huge deal to just re-use the training set for those. Folks may find the TAO Toolkit good for training production-quality models.

Anyways, in Pascal VOC-style datasets all of the data is intermixed, and the train/val/test splits are dictated by the text files under ImageSets/. So you could make a little Python script that randomly generates these text files from the master ‘train.txt’. Or as you suggested, you could attempt to modify the train_ssd.py source to use torch.utils.data.random_split. I would probably go the first way, to remain consistent with how Pascal VOC datasets are laid out and to avoid further debugging.
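
For the first approach, here is a rough sketch of what such a script might look like. It assumes the master list has one image ID per line; the function names, the 80/10/10 ratios, and the output filenames train.txt / val.txt / test.txt are illustrative choices of mine, so check which filenames the train_ssd.py dataset loader actually looks for before relying on them.

```python
import random
from pathlib import Path


def split_ids(ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle image IDs and split them into train/val/test lists."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n_val = int(len(ids) * val_frac)
    n_test = int(len(ids) * test_frac)
    return {
        "val": ids[:n_val],
        "test": ids[n_val:n_val + n_test],
        "train": ids[n_val + n_test:],  # everything left over
    }


def write_image_sets(image_sets_dir, master="train.txt", **kwargs):
    """Read the master list and write train.txt, val.txt and test.txt beside it.

    Note: if the master file is itself named train.txt, it gets overwritten,
    so keep a copy of the original list first.
    """
    image_sets_dir = Path(image_sets_dir)
    lines = (image_sets_dir / master).read_text().splitlines()
    ids = [line.strip() for line in lines if line.strip()]
    splits = split_ids(ids, **kwargs)
    for name, subset in splits.items():
        text = "".join(f"{image_id}\n" for image_id in subset)
        (image_sets_dir / f"{name}.txt").write_text(text)
    return splits


# Example call (hypothetical path, adjust to your dataset):
# write_image_sets("data/my_dataset/ImageSets/Main", master="train.txt")
```

Every image lands in exactly one split, and re-running with the same seed gives the same files, which makes it easier to compare training runs.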

Hope that helps!

Thanks for the fast response. I am an adult mentor working with a FIRST Robotics Competition high-school team. We are attempting to use jetson-inference to identify red and blue balls for our robot to pick up with a Jetson Nano. Our most powerful NVIDIA hardware is a Jetson TX2, so TAO is out for training at this time.
I am hopeful we will be able to get a reliable enough model with train_ssd.py for our application. We are having mixed success, but we’re making progress!

OK gotcha - in your situation, to be honest I would just put all of the data that you have in the train set and not worry about the splits. This will let you use all of your custom-annotated data for training the model. CVAT will output a default.txt for the ImageSet, and train_ssd.py knows to use this file for train/val/test.

Wish you and your team the best of luck this season in FIRST!

