Train_ssd.py indices error

I’m trying to train a model on images I’ve labeled. It was working before, but after adding a few hundred more images I now get an error when running the command. Here is the command.

python3 train_ssd.py --dataset-type=voc --data=/mnt/cifs/NAS/ImageTraining/TrainingData/ --model-dir=models/Home --batch-size=4 --workers=2 --epochs=30

And the error

Traceback (most recent call last):
  File "train_ssd.py", line 344, in <module>
    device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 113, in train
    for i, data in enumerate(loader):
  File "/home/keith/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/keith/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1179, in _next_data
    return self._process_data(data)
  File "/home/keith/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/home/keith/.local/lib/python3.6/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/keith/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/keith/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/keith/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/keith/.local/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 219, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "/home/keith/jetson-inference/python/training/detection/ssd/vision/datasets/voc_dataset.py", line 81, in __getitem__
    image, boxes, labels = self.transform(image, boxes, labels)
  File "/home/keith/jetson-inference/python/training/detection/ssd/vision/ssd/data_preprocessing.py", line 34, in __call__
    return self.augment(img, boxes, labels)
  File "/home/keith/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 55, in __call__
    img, boxes, labels = t(img, boxes, labels)
  File "/home/keith/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 357, in __call__
    boxes[:, 0::2] = width - boxes[:, 2::-2]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

I’ve seen this error reported in various places, but none of the suggested fixes work for me. I’m not really sure where to start with this. Any suggestions?

I’ve tried narrowing it down to a specific XML file, and I believe I’ve found it, but nothing stands out in it. It looks just like the other files.

Hi,

Which JetPack version are you using?
If it is not the latest (v4.6), please remember to check out the corresponding branch.

Ex. on JetPack 4.5

$ git clone -b L4T-R32.5.0 https://github.com/dusty-nv/jetson-inference.git
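
If you are not sure which release is installed, here is a minimal sketch for reading it. It assumes the L4T release string lives in /etc/nv_tegra_release and looks like "# R32 (release), REVISION: 5.0, ...", and that branch names follow the L4T-R32.5.0 pattern shown above. The helper name l4t_branch_guess is hypothetical, and a branch may not exist for every release, so check the result against the branches actually present in the repository.

```python
import re
from pathlib import Path


def l4t_branch_guess(release_text):
    """Map '# R32 (release), REVISION: 5.0, ...' to a name like 'L4T-R32.5.0'.

    Returns None when the text does not match the assumed format.
    """
    match = re.search(r"R(\d+) \(release\), REVISION: ([\d.]+)", release_text)
    if match is None:
        return None
    major, revision = match.groups()
    return f"L4T-R{major}.{revision}"


if __name__ == "__main__":
    release_file = Path("/etc/nv_tegra_release")  # assumed location on a Jetson
    if release_file.exists():
        print(l4t_branch_guess(release_file.read_text()))
    else:
        print("No /etc/nv_tegra_release found on this machine.")
```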

Thanks.

I have the same problem when training on my own Pascal VOC dataset. I don’t know how to fix it because I’m new to coding and machine learning. I’m just following the Hello AI World example to implement object detection on a Jetson Nano. I created my own dataset with CVAT, exported it in Pascal VOC 1.1 format, and added the label.txt to the folder. Could you please tell me how to fix it? Thank you very much.

python3 train_ssd.py --dataset-type=voc --data=data/dog-leash-resize --model-dir=models/dog-leash-resize --batch-size=4 --workers=2 --epochs=10
2021-11-15 11:01:46 - Using CUDA...
2021-11-15 11:01:46 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=4, checkpoint_folder='models/dog-leash-resize', dataset_type='voc', datasets=['data/dog-leash-resize'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=10, num_workers=2, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, weight_decay=0.0005)
2021-11-15 11:01:46 - Prepare training datasets.
2021-11-15 11:01:46 - VOC Labels read from file: ('BACKGROUND', '# label:color_rgb:parts:actions', 'background:0,0,0::', 'dog leash:128,0,0::')
2021-11-15 11:01:46 - Stored labels into file models/dog-leash-resize/labels.txt.
2021-11-15 11:01:46 - Train dataset size: 187
2021-11-15 11:01:46 - Prepare Validation datasets.
2021-11-15 11:01:46 - VOC Labels read from file: ('BACKGROUND', '# label:color_rgb:parts:actions', 'background:0,0,0::', 'dog leash:128,0,0::')
2021-11-15 11:01:46 - Validation dataset size: 187
2021-11-15 11:01:46 - Build network.
2021-11-15 11:01:47 - Init from pretrained ssd models/mobilenet-v1-ssd-mp-0_675.pth
2021-11-15 11:01:47 - Took 0.52 seconds to load the model.
2021-11-15 11:02:01 - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2021-11-15 11:02:01 - Uses CosineAnnealingLR scheduler.
2021-11-15 11:02:01 - Start training from epoch 0.
/usr/local/lib/python3.6/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
warning - image 161 has object with unknown class 'dog leash'
warning - image 60 has object with unknown class 'dog leash'
warning - image 141 has object with unknown class 'dog leash'
warning - image 76 has object with unknown class 'dog leash'
Traceback (most recent call last):
  File "train_ssd.py", line 343, in <module>
    device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 113, in train
    for i, data in enumerate(loader):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 425, in reraise
warning - image 12 has object with unknown class 'dog leash'
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataset.py", line 257, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "/jetson-inference/python/training/detection/ssd/vision/datasets/voc_dataset.py", line 81, in __getitem__
    image, boxes, labels = self.transform(image, boxes, labels)
  File "/jetson-inference/python/training/detection/ssd/vision/ssd/data_preprocessing.py", line 34, in __call__
    return self.augment(img, boxes, labels)
  File "/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 55, in __call__
    img, boxes, labels = t(img, boxes, labels)
  File "/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 345, in __call__
    boxes[:, :2] += (int(left), int(top))
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

I’m not sure how to check which version I’m on. git branch shows I’m on master, and I did a git pull before posting to make sure I’m up to date.

Hi @user48945, part of the issue is that you are using the labelmap from CVAT, and not a labels.txt you created like this:

dog leash

Your labels.txt should have one class name per line.
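
As a rough sketch of that conversion: the log above shows the CVAT labelmap lines ("# label:color_rgb:parts:actions", "background:0,0,0::", "dog leash:128,0,0::") being read as class names verbatim. The hypothetical helper below keeps only the text before the first colon, skips comment lines, and drops CVAT's own "background" entry, on the assumption that train_ssd.py adds BACKGROUND itself (the log suggests it does).

```python
def labelmap_to_labels(labelmap_text):
    """Extract plain class names from CVAT labelmap text (one name per entry)."""
    labels = []
    for line in labelmap_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        name = line.split(":", 1)[0].strip()
        if name.lower() == "background":
            continue  # assumed to be added by the training script
        labels.append(name)
    return labels


# Labelmap content as it appears in the training log above.
cvat_labelmap = """# label:color_rgb:parts:actions
background:0,0,0::
dog leash:128,0,0::
"""

# Write the result to labels.txt yourself; here it is only printed.
print("\n".join(labelmap_to_labels(cvat_labelmap)))
```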

The way to debug this is to run train_ssd.py with --debug-steps=1 --batch-size=1 --workers=0 and uncomment this line of code:

https://github.com/dusty-nv/pytorch-ssd/blob/8ed842a408f8c4a8812f430cf8063e0b93a56803/vision/datasets/voc_dataset.py#L76

It will then print out the image ID of each image as it’s loaded. The one printed right before the exception occurs is the one that causes the error. You can then inspect this image’s XML, and either fix its annotations or remove it from the ImageSets.
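
To shortlist candidates before stepping through the loader, a sketch like the one below can help. It rests on an assumption that is not confirmed in this thread: that the loader ends up with an empty box list for an image (for example, when every object has a class missing from labels.txt, as in the 'dog leash' warnings above). An empty list converted to a NumPy array is 1-dimensional, so indexing it as boxes[:, :2] would raise exactly this IndexError. The function names and the Annotations directory name are hypothetical, and names are matched exactly after stripping whitespace.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def count_usable_objects(xml_source, class_names):
    """Count <object> entries that have a known class name and a <bndbox>."""
    root = ET.fromstring(xml_source)
    count = 0
    for obj in root.iter("object"):
        name = (obj.findtext("name") or "").strip()
        if name in class_names and obj.find("bndbox") is not None:
            count += 1
    return count


def find_empty_annotations(annotations_dir, labels_file):
    """Return the XML file names that contain no usable object at all."""
    lines = Path(labels_file).read_text().splitlines()
    class_names = {line.strip() for line in lines if line.strip()}
    empty = []
    for xml_path in sorted(Path(annotations_dir).glob("*.xml")):
        if count_usable_objects(xml_path.read_bytes(), class_names) == 0:
            empty.append(xml_path.name)
    return empty
```

For example, find_empty_annotations("data/dog-leash-resize/Annotations", "data/dog-leash-resize/labels.txt") would list the annotation files to inspect first, assuming the standard VOC layout.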

Thank you for your help, I have to fix it.

By the way, I have another question: how can I use the training results to generate a graph like this?

Thank you.

I’m still having an issue. I’ve been running the debug command for a while and several files cause it to crash, but nothing in those files stands out to me. Here is the content of one of them. Does anything look wrong?

<annotation>
  <folder></folder>
  <filename>car0344.jpg</filename>
  <source>
    <database>Unknown</database>
    <annotation>Unknown</annotation>
    <image>Unknown</image>
  </source>
  <size>
    <width>2304</width>
    <height>1296</height>
    <depth></depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>car</name>
    <truncated>0</truncated>
    <occluded>0</occluded>
    <difficult>0</difficult>
    <bndbox>
      <xmin>69.55</xmin>
      <ymin>80.8</ymin>
      <xmax>671.16</xmax>
      <ymax>364.6</ymax>
    </bndbox>
  </object>
  <object>
    <name>mailbox</name>
    <truncated>0</truncated>
    <occluded>0</occluded>
    <difficult>0</difficult>
    <bndbox>
      <xmin>1010.08</xmin>
      <ymin>49.83</ymin>
      <xmax>1055.71</xmax>
      <ymax>130.74</ymax>
    </bndbox>
  </object>
</annotation>

Hmm, on the surface I don’t see anything wrong with this either. A couple of things to check:

  1. Does car0344.jpg exist?
  2. Are car and mailbox in your labels.txt? (with that exact spelling and capitalization)
  3. If you change the bounding box coordinates to be integer-only (no decimal) does that do anything? (floating-point coordinates should be fine, but just wondering)
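
These checks can be roughly automated for a single annotation. The sketch below is not part of train_ssd.py; check_annotation is a hypothetical helper that reports unknown class names, missing or non-numeric boxes, and boxes that are degenerate or fall outside the declared image size. Whether the image file exists is passed in by the caller, since the image directory layout is an assumption.

```python
import xml.etree.ElementTree as ET


def check_annotation(xml_source, class_names, image_exists=True):
    """Return a list of human-readable problems found in one VOC annotation."""
    problems = []
    root = ET.fromstring(xml_source)
    if not image_exists:
        problems.append("image file is missing")
    width = float(root.findtext("size/width") or 0)
    height = float(root.findtext("size/height") or 0)
    objects = root.findall("object")
    if not objects:
        problems.append("no <object> entries")
    for obj in objects:
        name = (obj.findtext("name") or "").strip()
        if name not in class_names:
            problems.append("unknown class %r" % name)
        box = obj.find("bndbox")
        if box is None:
            problems.append("%s: missing <bndbox>" % name)
            continue
        try:
            xmin, ymin, xmax, ymax = (
                float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
            )
        except (TypeError, ValueError):
            problems.append("%s: missing or non-numeric coordinate" % name)
            continue
        if xmin >= xmax or ymin >= ymax:
            problems.append("%s: box has no area" % name)
        if width and height and (xmin < 0 or ymin < 0 or xmax > width or ymax > height):
            problems.append("%s: box extends outside the image" % name)
    return problems


# Shortened version of the annotation posted above.
sample = """<annotation>
  <size><width>2304</width><height>1296</height><depth></depth></size>
  <object><name>car</name><bndbox>
    <xmin>69.55</xmin><ymin>80.8</ymin><xmax>671.16</xmax><ymax>364.6</ymax>
  </bndbox></object>
</annotation>"""

print(check_annotation(sample, {"car", "mailbox"}))
```

An empty list means none of these particular checks failed, which matches the "nothing stands out" impression for this file; it does not rule out other causes.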

Otherwise, I’m not exactly sure what is causing the error. You could just remove those files, or you can send me your dataset so I can look into it.

Yeah, the images exist and so do the labels (they are used in other images with the same case). Other XML files contain decimals, so I know that’s not the issue either. I’ve been experimenting with moving the labels slightly, and right now I’m testing removing one label completely. I might just have to remove the images.

Hi @kceleslie, were you able to find what about those XML files was making the error, or did you get the training to run by just removing them? Thanks.

I was not able to figure out what was wrong with the XML files. I ended up just not using them.

Thanks

OK, thanks for letting us know. Hope you were still able to train your model.
