Can't train the SSD Mobilenet using Jetson Nano and Custom Dataset

Background:
I’m trying to train SSD-Mobilenet on a custom dataset. I captured the images with my mobile phone, then used Roboflow to resize them to 300x300 and to apply some augmentation (noise, blur, brightness variation). After that I wrote a Python script to rename the images, since Roboflow produces very long file names containing dots, etc. I did the annotation in CVAT and arranged the dataset in the Pascal VOC format, following some posts I saw on the forum.

Question:
I’m now getting the error below. I don’t know how to debug it, and I don’t know which source file it is looking for. Thanks.

2021-11-22 12:12:17 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=4, checkpoint_folder='models/', dataset_type='voc', datasets=['data'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=30, num_workers=2, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, weight_decay=0.0005)
2021-11-22 12:12:17 - Prepare training datasets.
2021-11-22 12:12:17 - VOC Labels read from file: ('BACKGROUND', '# label:color_rgb:parts:actions', 'background:0,0,0::', 'tool:128,0,0::', 'bin:0,128,0::')
2021-11-22 12:12:17 - Stored labels into file models/labels.txt.
2021-11-22 12:12:17 - Train dataset size: 55
2021-11-22 12:12:17 - Prepare Validation datasets.
2021-11-22 12:12:17 - VOC Labels read from file: ('BACKGROUND', '# label:color_rgb:parts:actions', 'background:0,0,0::', 'tool:128,0,0::', 'bin:0,128,0::')
2021-11-22 12:12:17 - Validation dataset size: 10
2021-11-22 12:12:17 - Build network.
2021-11-22 12:12:18 - Init from pretrained ssd models/mobilenet-v1-ssd-mp-0_675.pth
2021-11-22 12:12:18 - Took 0.24 seconds to load the model.
2021-11-22 12:12:18 - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2021-11-22 12:12:18 - Uses CosineAnnealingLR scheduler.
2021-11-22 12:12:18 - Start training from epoch 0.
/home/mpumzi/.local/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Traceback (most recent call last):
  File "train_ssd.py", line 343, in <module>
    device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 113, in train
    for i, data in enumerate(loader):
  File "/home/mpumzi/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/mpumzi/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/home/mpumzi/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/home/mpumzi/.local/lib/python3.6/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
NotADirectoryError: Caught NotADirectoryError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/mpumzi/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/mpumzi/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/mpumzi/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/mpumzi/.local/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 308, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "/home/mpumzi/Desktop/vision_system/object_detection/pytorch-ssd-master/vision/datasets/voc_dataset.py", line 78, in __getitem__
    boxes, labels, is_difficult = self._get_annotation(image_id)
  File "/home/mpumzi/Desktop/vision_system/object_detection/pytorch-ssd-master/vision/datasets/voc_dataset.py", line 136, in _get_annotation
    objects = ET.parse(annotation_file).findall("object")
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 1196, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 586, in parse
    source = open(source, "rb")
NotADirectoryError: [Errno 20] Not a directory: 'data/Annotations/IMG_20211118_122533_jpg.xml/home/mpumzi/Desktop/vision_system/object_detection/pytorch-ssd-master/dataset/ImageSets/Main/default.txt'

Hi @Mpura, you need to make your own labels.txt instead of using the one that CVAT outputs (your log shows its header and color fields, e.g. 'tool:128,0,0::', being read as class names). The labels.txt should contain one class name per line, and these names must match the names referenced in your XML files.
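If it helps, here is a minimal sketch of one way to build that labels.txt from the annotations themselves, so the names are guaranteed to match the XML files. The function names `collect_class_names` and `write_labels` are made up for illustration and are not part of pytorch-ssd. It assumes standard Pascal VOC XML (`<object><name>...</name></object>`) and leaves out BACKGROUND, since the log above suggests the loader prepends that entry on its own.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def collect_class_names(annotations_dir):
    """Return the sorted, de-duplicated <object><name> values from VOC XML files."""
    names = set()
    for xml_path in Path(annotations_dir).glob("*.xml"):
        root = ET.parse(str(xml_path)).getroot()
        for obj in root.findall("object"):
            name = obj.findtext("name")
            if name and name.strip():
                names.add(name.strip())
    return sorted(names)


def write_labels(annotations_dir, labels_path):
    """Write one class name per line (hypothetical helper, BACKGROUND not included)."""
    names = collect_class_names(annotations_dir)
    Path(labels_path).write_text("\n".join(names) + "\n")
    return names
```

With the layout from the log this might be called as `write_labels("data/Annotations", "data/labels.txt")`, assuming the loader reads labels.txt from the dataset root. Check the printed "VOC Labels read from file" line on the next run to confirm it now lists only your class names.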

OK, two ideas here. First, can you confirm that data/Annotations/IMG_20211118_122533_jpg.xml exists?

Second, I am not sure why /home/mpumzi/Desktop/vision_system/object_detection/pytorch-ssd-master/dataset/ImageSets/Main/default.txt is showing up in the error message. Are you sure this path isn’t erroneously in your default.txt somewhere? Or maybe it is just getting printed out for some reason unrelated to the exception.
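Both checks can be run in one pass over the image set file. The following is a rough sketch, not code from pytorch-ssd: `check_image_set` is a hypothetical helper, and it assumes each line of ImageSets/Main/default.txt is meant to be a bare image ID whose annotation lives at Annotations/<id>.xml under the dataset root.

```python
from pathlib import Path


def check_image_set(dataset_root, image_set="ImageSets/Main/default.txt"):
    """Return (suspicious_ids, missing_annotations) for a VOC-style dataset.

    suspicious_ids: entries that look like paths or contain whitespace.
    missing_annotations: clean IDs with no Annotations/<id>.xml file.
    """
    root = Path(dataset_root)
    suspicious, missing = [], []
    for line in (root / image_set).read_text().splitlines():
        image_id = line.strip()
        if not image_id:
            continue  # ignore blank lines
        # Assumption: an ID is a bare file stem, so separators or spaces are a red flag.
        if "/" in image_id or "\\" in image_id or any(c.isspace() for c in image_id):
            suspicious.append(image_id)
            continue
        if not (root / "Annotations" / (image_id + ".xml")).is_file():
            missing.append(image_id)
    return suspicious, missing
```

For the dataset in the log, `check_image_set("data")` should return two empty lists if default.txt is clean. Anything in the first list is an entry that looks like a path rather than an ID, and anything in the second list is an ID without a matching XML file.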
