Transfer learning (train_ssd.py) with combined dataset: "Train dataset size: 0"?

We have built an application for our Xavier NX to do live object recognition using the SSD-Mobilenet-V2 model. Our application needs to recognize persons, cars, trucks and buses; all other detections are ignored.

But now we would like to add a class that does not (yet) exist in the COCO classes used by detectnet in the jetson-inference Python library. So we started collecting our own dataset for this new class of objects and trained ssd-mobilenet with the excellent scripts and tools in the jetson-inference library. We made a Pascal VOC dataset with CVAT, which train_ssd.py accepts happily.

This works! Our application can now perfectly detect the new objects. But it no longer detects the original ones; it only detects our newly learnt class. I've researched this a bit, and apparently that's by design: all classes/objects have to be trained in one go.

So I thought I'd just use the image downloader for the Open Images dataset to download a bunch of persons, cars, buses and trucks. I found a tool, oidv6-to-voc, that converts an Open Images dataset into a VOC dataset (or at least the XML annotations needed), so in theory I could merge it with our own dataset and start training. It works fine. Or at least it seems to.

I have added the images to the JPEGImages folder in the VOC folder structure (which worked previously) and the annotations to the Annotations folder. I have added the filenames (without extension) to the appropriate ImageSets file (train.txt, val.txt or test.txt) and the appropriate labels to labels.txt.
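For reference, the layout I'm working with looks roughly like the sketch below (paths are from my setup; yours may differ). The ImageSets files are just lists of image basenames, one per line, so they can be regenerated from whatever is in JPEGImages:

```shell
# Assumed VOC layout (illustrative):
#   voc/
#     Annotations/       one .xml annotation per image
#     JPEGImages/        the image files
#     ImageSets/Main/    train.txt, val.txt, test.txt
#     labels.txt         one class name per line
#
# Each ImageSets file holds one image basename (no extension) per line.
# Regenerate a list from the images on disk:
cd voc
ls JPEGImages | sed 's/\.[^.]*$//' | sort > ImageSets/Main/train.txt
```

(In practice you'd split the basenames across train.txt, val.txt and test.txt rather than putting everything in train.txt; the one-liner just shows the expected file format.)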

But then when I start train_ssd.py I get this:

$ python3 train_ssd.py --data=data/OID_and_deepdoors2/voc --model-dir=models/doors_v2 --batch-size=4 --dataset-type=voc
2021-06-04 09:10:03 - Using CUDA...
2021-06-04 09:10:03 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=4, checkpoint_folder='models/doors_v2', dataset_type='voc', datasets=['data/OID_and_deepdoors2/voc'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=30, num_workers=2, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, weight_decay=0.0005)
2021-06-04 09:10:03 - Prepare training datasets.
2021-06-04 09:10:03 - VOC Labels read from file: ('BACKGROUND', 'open door', 'Car', 'Truck', 'Bus', 'Person', 'Door')
2021-06-04 09:10:03 - Stored labels into file models/doors_v2/labels.txt.
2021-06-04 09:10:03 - Train dataset size: 0
Traceback (most recent call last):
  File "train_ssd.py", line 237, in <module>
    shuffle=True)
  File "/home/willem/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 224, in __init__
    sampler = RandomSampler(dataset, generator=generator)
  File "/home/willem/.local/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 96, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

Apparently it is unable to read the training set from train.txt, but I'm not sure why.

I decided to run the script in my debugger, and it turns out it reads the training set from trainval.txt instead of train.txt. My trainval.txt was empty because I assumed it had to be. My bad.

I added the entire content of both train.txt and val.txt to trainval.txt and now it seems to work!
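For anyone hitting the same "Train dataset size: 0" error, the fix boils down to populating trainval.txt with the union of train.txt and val.txt (the data path below is from my run; adjust it to your dataset):

```shell
# train_ssd.py's VOC dataset loader builds the training set from
# ImageSets/Main/trainval.txt (not train.txt), so it must not be empty.
cd data/OID_and_deepdoors2/voc/ImageSets/Main
cat train.txt val.txt > trainval.txt
```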

Good to know this!

Thanks for the update.