Train_ssd.py indices error

kceleslie · November 15, 2021, 12:27am

I’m trying to train a model on images I’ve labeled. I had it working before, but after adding a few hundred more images I now get an error when running the command. Here is the command:

python3 train_ssd.py --dataset-type=voc --data=/mnt/cifs/NAS/ImageTraining/TrainingData/ --model-dir=models/Home --batch-size=4 --workers=2 --epochs=30

And the error

Traceback (most recent call last):
  File "train_ssd.py", line 344, in <module>
    device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 113, in train
    for i, data in enumerate(loader):
  File "/home/keith/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/keith/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1179, in _next_data
    return self._process_data(data)
  File "/home/keith/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/home/keith/.local/lib/python3.6/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/keith/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/keith/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/keith/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/keith/.local/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 219, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "/home/keith/jetson-inference/python/training/detection/ssd/vision/datasets/voc_dataset.py", line 81, in __getitem__
    image, boxes, labels = self.transform(image, boxes, labels)
  File "/home/keith/jetson-inference/python/training/detection/ssd/vision/ssd/data_preprocessing.py", line 34, in __call__
    return self.augment(img, boxes, labels)
  File "/home/keith/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 55, in __call__
    img, boxes, labels = t(img, boxes, labels)
  File "/home/keith/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 357, in __call__
    boxes[:, 0::2] = width - boxes[:, 2::-2]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

I’ve seen this error reported in various places, but none of the suggested fixes work for me. I’m not really sure where to start with this. Any suggestions?

I’ve tried narrowing it down to a specific XML file, which I believe I have, but nothing stands out in it. It looks just like the other files.
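
The message says the box array that reaches the transform has one dimension rather than the (N, 4) shape the indexing expects. A minimal NumPy sketch of one way to end up there: an array built from an empty list of boxes. Whether an empty box list is what happens for the failing images in this thread is an assumption, not something the traceback confirms.

```python
import numpy as np

# An image that ends up with no usable boxes gives an empty list.
boxes = np.array([], dtype=np.float32)
print(boxes.shape)  # one dimension, not (N, 4)

try:
    # The transforms index the boxes as a 2-D (N, 4) array.
    boxes[:, 0::2]
except IndexError as err:
    message = str(err)
    print(message)

# Reshaping keeps the array two-dimensional even with zero rows,
# so the same indexing no longer raises.
safe_boxes = boxes.reshape(-1, 4)
print(safe_boxes[:, 0::2].shape)
```

If this is the cause, the images to look for are the ones whose annotations produce no boxes at all, not the ones with unusual coordinates.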

AastaLLL · November 15, 2021, 4:03am

Hi,

Which JetPack version do you use?
If it is not the latest v4.6, please remember to check out the corresponding branch.

For example, on JetPack 4.5:

$ git clone -b L4T-R32.5.0 https://github.com/dusty-nv/jetson-inference.git

Thanks.
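
A rough sketch for finding the release that a branch name would be based on. It assumes the Jetson image has an /etc/nv_tegra_release file whose first line looks like "# R32 (release), REVISION: 5.0, ...", and it builds a branch name by copying the L4T-R32.5.0 pattern from the example above. Both function names are made up for illustration, and not every release is guaranteed to have a matching branch.

```python
import re
from pathlib import Path

# Assumed location of the L4T release file on a Jetson image.
RELEASE_FILE = Path("/etc/nv_tegra_release")


def parse_l4t_release(text):
    """Return (major, revision) from a line like '# R32 (release), REVISION: 5.0, ...'."""
    match = re.search(r"R(\d+) \(release\), REVISION: ([\d.]+)", text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def guess_branch(text):
    """Build a branch name in the L4T-R32.5.0 style (a guess, not a lookup)."""
    parsed = parse_l4t_release(text)
    if parsed is None:
        return None
    major, revision = parsed
    return f"L4T-R{major}.{revision}"


if RELEASE_FILE.is_file():
    print(guess_branch(RELEASE_FILE.read_text()))
```

Check the printed name against the repository's branch list before cloning with it.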

user48945 · November 15, 2021, 11:35am

I have the same problem when training my own Pascal VOC dataset. I don’t know how to fix it because I’m a beginner at coding and machine learning. I’m just following the Hello AI World example to implement object detection on a Jetson Nano. I created my own dataset using CVAT, exported it in Pascal VOC 1.1 format, and added the label.txt in the folder. Could you please tell me how to fix it? Thank you very much.

python3 train_ssd.py --dataset-type=voc --data=data/dog-leash-resize --model-dir=models/dog-leash-resize --batch-size=4 --workers=2 --epochs=10

2021-11-15 11:01:46 - Using CUDA...
2021-11-15 11:01:46 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=4, checkpoint_folder='models/dog-leash-resize', dataset_type='voc', datasets=['data/dog-leash-resize'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=10, num_workers=2, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, weight_decay=0.0005)
2021-11-15 11:01:46 - Prepare training datasets.
2021-11-15 11:01:46 - VOC Labels read from file: ('BACKGROUND', '# label:color_rgb:parts:actions', 'background:0,0,0::', 'dog leash:128,0,0::')
2021-11-15 11:01:46 - Stored labels into file models/dog-leash-resize/labels.txt.
2021-11-15 11:01:46 - Train dataset size: 187
2021-11-15 11:01:46 - Prepare Validation datasets.
2021-11-15 11:01:46 - VOC Labels read from file: ('BACKGROUND', '# label:color_rgb:parts:actions', 'background:0,0,0::', 'dog leash:128,0,0::')
2021-11-15 11:01:46 - Validation dataset size: 187
2021-11-15 11:01:46 - Build network.
2021-11-15 11:01:47 - Init from pretrained ssd models/mobilenet-v1-ssd-mp-0_675.pth
2021-11-15 11:01:47 - Took 0.52 seconds to load the model.
2021-11-15 11:02:01 - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2021-11-15 11:02:01 - Uses CosineAnnealingLR scheduler.
2021-11-15 11:02:01 - Start training from epoch 0.
/usr/local/lib/python3.6/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
warning - image 161 has object with unknown class 'dog leash'
warning - image 60 has object with unknown class 'dog leash'
warning - image 141 has object with unknown class 'dog leash'
warning - image 76 has object with unknown class 'dog leash'
Traceback (most recent call last):
  File "train_ssd.py", line 343, in <module>
    device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 113, in train
    for i, data in enumerate(loader):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 425, in reraise
warning - image 12 has object with unknown class 'dog leash'
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataset.py", line 257, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "/jetson-inference/python/training/detection/ssd/vision/datasets/voc_dataset.py", line 81, in __getitem__
    image, boxes, labels = self.transform(image, boxes, labels)
  File "/jetson-inference/python/training/detection/ssd/vision/ssd/data_preprocessing.py", line 34, in __call__
    return self.augment(img, boxes, labels)
  File "/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 55, in __call__
    img, boxes, labels = t(img, boxes, labels)
  File "/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 345, in __call__
    boxes[:, :2] += (int(left), int(top))
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

kceleslie · November 15, 2021, 12:02pm

I’m not sure how to check which version I’m on. git branch shows I’m on master, and I did a git pull before posting to make sure I’m up to date.

dusty_nv · November 15, 2021, 4:22pm

Hi @user48945, part of the issue is that you are using the labelmap from CVAT, and not a labels.txt you created like this:

dog leash

Your labels.txt should have one class name per line.
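
A small sketch for checking a labels file against the class names actually used in the annotations. It assumes the XML files sit in an Annotations folder inside the dataset directory, as in the standard Pascal VOC layout; the function names are hypothetical and not part of train_ssd.py.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def read_labels(labels_file):
    """Return the non-empty lines of a labels file as a set of class names."""
    lines = Path(labels_file).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def count_unknown_classes(annotations_dir, labels):
    """Count object names in the XML files that are not in the label set."""
    unknown = Counter()
    for xml_path in sorted(Path(annotations_dir).glob("*.xml")):
        for obj in ET.parse(xml_path).findall("object"):
            name = (obj.findtext("name") or "").strip()
            if name not in labels:
                unknown[name] += 1
    return unknown
```

Calling count_unknown_classes on the dataset's Annotations folder with the result of read_labels should return an empty counter when every object name matches a line of the labels file exactly. With the CVAT labelmap, a line such as 'dog leash:128,0,0::' does not equal 'dog leash', so every object would be counted as unknown.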

The way to debug this is to run train_ssd.py with --debug-steps=1 --batch-size=1 --workers=0 and uncomment this line of code:

https://github.com/dusty-nv/pytorch-ssd/blob/8ed842a408f8c4a8812f430cf8063e0b93a56803/vision/datasets/voc_dataset.py#L76

It will then print out the image ID of each image as it’s loaded. The one right before the exception occurs is the one that causes the error. You can then inspect this image’s XML, and either fix its annotations or remove it from the ImageSets.
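
As a complement to stepping through the loader, the image list can be scanned offline for entries that would yield no boxes. This is only a sketch under assumptions: it expects the Pascal VOC layout with ImageSets/Main/trainval.txt and Annotations/<id>.xml, it treats an object as usable when its name is in the label set and it has a bndbox element, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def ids_without_boxes(dataset_root, labels, image_set="trainval.txt"):
    """Return (image_id, reason) pairs for images with no usable object."""
    root = Path(dataset_root)
    listing = (root / "ImageSets" / "Main" / image_set).read_text()
    image_ids = [line.strip() for line in listing.splitlines() if line.strip()]

    flagged = []
    for image_id in image_ids:
        xml_path = root / "Annotations" / f"{image_id}.xml"
        if not xml_path.is_file():
            flagged.append((image_id, "annotation file missing"))
            continue
        # Keep only objects whose class is known and that carry a box.
        usable = [
            obj for obj in ET.parse(xml_path).findall("object")
            if (obj.findtext("name") or "").strip() in labels
            and obj.find("bndbox") is not None
        ]
        if not usable:
            flagged.append((image_id, "no object with a known class"))
    return flagged
```

Any ID it returns is a candidate for fixing or for removal from the ImageSets list. An image that passes this scan can still fail for other reasons.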

user48945 · November 16, 2021, 5:02am

Thank you for your help, I have to fix it.

By the way, I have another question: how can I use the training results to generate a graph like this?

Thank you.

kceleslie · November 16, 2021, 11:29am

I’m still having an issue. I’ve been running the debug command for a while and several files cause it to crash, but nothing in those files stands out to me. Here is the content of one of the files. Does anything look wrong?

<annotation>
  <folder></folder>
  <filename>car0344.jpg</filename>
  <source>
    <database>Unknown</database>
    <annotation>Unknown</annotation>
    <image>Unknown</image>
  </source>
  <size>
    <width>2304</width>
    <height>1296</height>
    <depth></depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>car</name>
    <truncated>0</truncated>
    <occluded>0</occluded>
    <difficult>0</difficult>
    <bndbox>
      <xmin>69.55</xmin>
      <ymin>80.8</ymin>
      <xmax>671.16</xmax>
      <ymax>364.6</ymax>
    </bndbox>
  </object>
  <object>
    <name>mailbox</name>
    <truncated>0</truncated>
    <occluded>0</occluded>
    <difficult>0</difficult>
    <bndbox>
      <xmin>1010.08</xmin>
      <ymin>49.83</ymin>
      <xmax>1055.71</xmax>
      <ymax>130.74</ymax>
    </bndbox>
  </object>
</annotation>

dusty_nv · November 16, 2021, 9:17pm

Hmm on the surface, I don’t see anything wrong with this either. A couple things to check:

Does car0344.jpg exist?
Are car and mailbox in your labels.txt? (with that exact spelling and capitalization)
If you change the bounding box coordinates to be integer-only (no decimal) does that do anything? (floating-point coordinates should be fine, but just wondering)

Otherwise, I’m not exactly sure why the error occurs. You could just remove those files, or you can send me your dataset so I can look into it.
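
The image and coordinate checks from the list above could be scripted roughly as follows. This is a sketch, not part of the training code: it assumes the images live in a single folder passed in as images_dir (JPEGImages in the standard Pascal VOC layout), and the function name and messages are made up for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_annotation(xml_path, images_dir):
    """Return a list of human-readable problems found in one VOC XML file."""
    problems = []
    root = ET.parse(xml_path).getroot()

    # Does the image named in the annotation exist?
    filename = (root.findtext("filename") or "").strip()
    if not filename or not (Path(images_dir) / filename).is_file():
        problems.append(f"image not found: {filename!r}")

    for obj in root.findall("object"):
        name = (obj.findtext("name") or "").strip()
        box = obj.find("bndbox")
        if box is None:
            problems.append(f"{name}: no bndbox element")
            continue
        try:
            xmin, ymin, xmax, ymax = (
                float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
            )
        except (TypeError, ValueError):
            problems.append(f"{name}: missing or non-numeric coordinate")
            continue
        if xmin >= xmax or ymin >= ymax:
            problems.append(f"{name}: empty or inverted box")
        # Reported only as a note; floating-point coordinates should be fine.
        if any(not v.is_integer() for v in (xmin, ymin, xmax, ymax)):
            problems.append(f"{name}: non-integer coordinates")
    return problems
```

Running it over every file in the annotations folder and comparing the output for the crashing files against the working ones may show what sets them apart.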

kceleslie · November 18, 2021, 11:32am

Yeah, the images exist and so do the labels (they are used in other images with the same case). Other XML files contain decimals, so I know that’s not the issue either. I’ve been experimenting with moving the labels slightly, and right now I’m testing removing one label completely. I might just have to remove the images.

dusty_nv · November 22, 2021, 5:18pm

Hi @kceleslie, were you able to find what about those XML files was making the error, or did you get the training to run by just removing them? Thanks.

kceleslie · November 22, 2021, 7:10pm

I was not able to figure out what was wrong with the XML files; I ended up just not using them.

Thanks

dusty_nv · November 22, 2021, 7:19pm

OK, thanks for letting us know. Hope you were still able to train your model.

system · December 15, 2021, 4:49am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
