I’m following the instructions here for training a custom object detection neural network using jetson-inference: https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-collect-detection.md
I created a custom dataset using CVAT and exported it in PASCAL-VOC format, with the images included. I am trying to get the network to identify pieces of wood.
I mounted a swap file and disabled the desktop GUI. Then I navigated to jetson-inference/python/training/detection/ssd and ran the following command:
python3 train_ssd.py --dataset-type=voc --data=data/wood --model-dir=models/wood --batch-size=2 --workers=0 --epochs=1
This results in the following output:
2022-02-08 11:43:48 - Using CUDA...
2022-02-08 11:43:48 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=2, checkpoint_folder='models/wood', dataset_type='voc', datasets=['data/wood'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=1, num_workers=0, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, weight_decay=0.0005)
2022-02-08 11:43:48 - Prepare training datasets.
2022-02-08 11:43:48 - VOC Labels read from file: ('BACKGROUND', 'wood')
2022-02-08 11:43:48 - Stored labels into file models/wood/labels.txt.
2022-02-08 11:43:48 - Train dataset size: 119
2022-02-08 11:43:48 - Prepare Validation datasets.
2022-02-08 11:43:49 - VOC Labels read from file: ('BACKGROUND', 'wood')
2022-02-08 11:43:49 - Validation dataset size: 119
2022-02-08 11:43:49 - Build network.
2022-02-08 11:43:49 - Init from pretrained ssd models/mobilenet-v1-ssd-mp-0_675.pth
2022-02-08 11:43:49 - Took 0.54 seconds to load the model.
2022-02-08 11:44:04 - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2022-02-08 11:44:04 - Uses CosineAnnealingLR scheduler.
2022-02-08 11:44:04 - Start training from epoch 0.
Afterward, nothing happens for several minutes, and I usually get a user warning about lr_scheduler.step() being called before optimizer.step().
Then the program dies with a memory error, a segmentation fault, or no error message at all. Whatever the case, the result is the same: I don’t get a model.
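In case it helps with diagnosing the memory error, here is a minimal sketch of how available RAM and free swap could be checked right before launching training. It assumes a Linux system that exposes /proc/meminfo; parse_meminfo and headroom_mib are hypothetical helper names, not part of jetson-inference.

```python
# Hypothetical helpers, not part of jetson-inference: report RAM and swap
# headroom from /proc/meminfo (Linux only) before starting train_ssd.py.


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value}.

    Values are the leading integer of each field (kB for memory fields).
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def headroom_mib(info):
    """Return (available RAM, free swap) in MiB; missing fields count as 0."""
    return info.get("MemAvailable", 0) // 1024, info.get("SwapFree", 0) // 1024


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        ram, swap = headroom_mib(parse_meminfo(f.read()))
    print(f"Available RAM: {ram} MiB, free swap: {swap} MiB")
```

Running this just before train_ssd.py should show whether the swap file is actually active (free swap greater than 0) and roughly how much memory is left for training.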
I tried downloading my dataset again in case it was corrupted, but that did not fix it either. Any help with this would be greatly appreciated.