Can't train the SSD Mobilenet using Jetson Nano and Custom Dataset

Mpura · December 2, 2021, 10:51am

Background:
I’m trying to train SSD-Mobilenet on a custom dataset. I captured the images with my mobile phone, resized them to 300x300 in Roboflow, and applied some augmentation to generate noise, blur, and brightness variation. After that I wrote a Python script to rename the images, since Roboflow produces very long file names containing dots, etc. I did the annotation in CVAT and organized the dataset in the Pascal VOC format, as I saw in some of the posts on this forum.

Question:
I’m getting this error now. I don’t know how to debug it. I don’t know what source it’s looking for. Thanks.

2021-11-22 12:12:17 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=4, checkpoint_folder='models/', dataset_type='voc', datasets=['data'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=30, num_workers=2, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, weight_decay=0.0005)
2021-11-22 12:12:17 - Prepare training datasets.
2021-11-22 12:12:17 - VOC Labels read from file: ('BACKGROUND', '# label:color_rgb:parts:actions', 'background:0,0,0::', 'tool:128,0,0::', 'bin:0,128,0::')
2021-11-22 12:12:17 - Stored labels into file models/labels.txt.
2021-11-22 12:12:17 - Train dataset size: 55
2021-11-22 12:12:17 - Prepare Validation datasets.
2021-11-22 12:12:17 - VOC Labels read from file: ('BACKGROUND', '# label:color_rgb:parts:actions', 'background:0,0,0::', 'tool:128,0,0::', 'bin:0,128,0::')
2021-11-22 12:12:17 - Validation dataset size: 10
2021-11-22 12:12:17 - Build network.
2021-11-22 12:12:18 - Init from pretrained ssd models/mobilenet-v1-ssd-mp-0_675.pth
2021-11-22 12:12:18 - Took 0.24 seconds to load the model.
2021-11-22 12:12:18 - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2021-11-22 12:12:18 - Uses CosineAnnealingLR scheduler.
2021-11-22 12:12:18 - Start training from epoch 0.
/home/mpumzi/.local/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Traceback (most recent call last):
  File "train_ssd.py", line 343, in <module>
    device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 113, in train
    for i, data in enumerate(loader):
  File "/home/mpumzi/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/mpumzi/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/home/mpumzi/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/home/mpumzi/.local/lib/python3.6/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
NotADirectoryError: Caught NotADirectoryError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/mpumzi/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/mpumzi/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/mpumzi/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/mpumzi/.local/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 308, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "/home/mpumzi/Desktop/vision_system/object_detection/pytorch-ssd-master/vision/datasets/voc_dataset.py", line 78, in __getitem__
    boxes, labels, is_difficult = self._get_annotation(image_id)
  File "/home/mpumzi/Desktop/vision_system/object_detection/pytorch-ssd-master/vision/datasets/voc_dataset.py", line 136, in _get_annotation
    objects = ET.parse(annotation_file).findall("object")
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 1196, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 586, in parse
    source = open(source, "rb")
NotADirectoryError: [Errno 20] Not a directory: 'data/Annotations/IMG_20211118_122533_jpg.xml/home/mpumzi/Desktop/vision_system/object_detection/pytorch-ssd-master/dataset/ImageSets/Main/default.txt'

dusty_nv · April 5, 2022, 5:37pm

Hi @Mpura, you need to make your own labels.txt as opposed to using the one that CVAT outputs. The labels.txt should be one class name per line, and these class names should be the same as the names that are referenced in your XML files.
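As a rough illustration of that advice: the log above shows the loader treating CVAT's labelmap lines (for example 'tool:128,0,0::') as class names. The sketch below collects the names that actually appear in the `<object><name>` tags of the annotation XML files and writes them one per line. The helper names `collect_class_names` and `write_labels` are hypothetical and not part of train_ssd.py.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def collect_class_names(annotations_dir):
    """Return the sorted, de-duplicated <object><name> values found in VOC XML files."""
    names = set()
    for xml_path in sorted(Path(annotations_dir).glob("*.xml")):
        root = ET.parse(str(xml_path)).getroot()
        for obj in root.findall("object"):
            name = obj.findtext("name")
            if name and name.strip():
                names.add(name.strip())
    return sorted(names)


def write_labels(class_names, labels_path):
    """Write one class name per line, the format described in the reply above."""
    Path(labels_path).write_text("".join(name + "\n" for name in class_names))
```

A possible call, assuming the annotations live in data/Annotations and that the labels file belongs in the dataset root (the location is an assumption here), would be `write_labels(collect_class_names("data/Annotations"), "data/labels.txt")`.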

OK, two ideas here. First, can you confirm that data/Annotations/IMG_20211118_122533_jpg.xml exists?

Second, I am not sure why /home/mpumzi/Desktop/vision_system/object_detection/pytorch-ssd-master/dataset/ImageSets/Main/default.txt is showing up at the end of that path - are you sure this isn’t erroneously in your default.txt somewhere? Or maybe this is just getting printed out for some reason unrelated to the exception.
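Both checks can be run in one pass. The following is a minimal sketch, assuming the layout implied by the traceback (a dataset root containing Annotations/ and an image-set file such as ImageSets/Main/default.txt with one image ID per line). The function name `check_image_set` is hypothetical and only for illustration.

```python
from pathlib import Path


def check_image_set(dataset_root, image_set_file):
    """Return (line_number, entry, reason) tuples for suspicious image-set entries."""
    root = Path(dataset_root)
    problems = []
    with open(str(image_set_file)) as f:
        for line_number, line in enumerate(f, start=1):
            entry = line.strip()
            if not entry:
                continue  # blank lines are skipped by this check
            # An image ID should be a bare name, not a file path.
            if "/" in entry or "\\" in entry:
                problems.append((line_number, entry, "looks like a path, not an image ID"))
                continue
            # Each ID is expected to have Annotations/<ID>.xml next to it.
            if not (root / "Annotations" / (entry + ".xml")).is_file():
                problems.append((line_number, entry, "no matching annotation XML"))
    return problems
```

For the dataset in this thread, something like `check_image_set("data", "data/ImageSets/Main/default.txt")` would list any entry that is a stray path or has no XML file; an empty list means neither problem was found.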

system · April 27, 2022, 6:12am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
