Custom data training failed

jaganathan.comm · August 30, 2022, 3:07pm

I am training an object detection model on my custom dataset. Training failed, and I have pasted the error message below.

python3 train_ssd.py --dataset-type=voc --data=data/pedestrian2_30Aug22_pascalvoc/ --model=models/pedestrian2_30Aug22_pascalvoc/ --batch-size=4 --epochs=2
2022-08-30 19:27:12.638284: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
2022-08-30 19:27:22 - Using CUDA…
2022-08-30 19:27:22 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=4, checkpoint_folder='models/pedestrian2_30Aug22_pascalvoc/', dataset_type='voc', datasets=['data/pedestrian2_30Aug22_pascalvoc/'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, log_level='info', lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=2, num_workers=2, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', quick_validation=False, resolution=300, resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, weight_decay=0.0005)
2022-08-30 19:27:43 - model resolution 300x300
2022-08-30 19:27:43 - SSDSpec(feature_map_size=19, shrinkage=16, box_sizes=SSDBoxSizes(min=60, max=105), aspect_ratios=[2, 3])
2022-08-30 19:27:43 - SSDSpec(feature_map_size=10, shrinkage=32, box_sizes=SSDBoxSizes(min=105, max=150), aspect_ratios=[2, 3])
2022-08-30 19:27:43 - SSDSpec(feature_map_size=5, shrinkage=64, box_sizes=SSDBoxSizes(min=150, max=195), aspect_ratios=[2, 3])
2022-08-30 19:27:43 - SSDSpec(feature_map_size=3, shrinkage=100, box_sizes=SSDBoxSizes(min=195, max=240), aspect_ratios=[2, 3])
2022-08-30 19:27:43 - SSDSpec(feature_map_size=2, shrinkage=150, box_sizes=SSDBoxSizes(min=240, max=285), aspect_ratios=[2, 3])
2022-08-30 19:27:43 - SSDSpec(feature_map_size=1, shrinkage=300, box_sizes=SSDBoxSizes(min=285, max=330), aspect_ratios=[2, 3])
2022-08-30 19:27:43 - Prepare training datasets.
warning - image WalkingParisSunset_00001097 has no box/labels annotations, ignoring from dataset
2022-08-30 19:27:44 - VOC Labels read from file: ('BACKGROUND', 'background', 'Pedestrian')
2022-08-30 19:27:44 - Stored labels into file models/pedestrian2_30Aug22_pascalvoc/labels.txt.
2022-08-30 19:27:44 - Train dataset size: 659
2022-08-30 19:27:44 - Prepare Validation datasets.
warning - image WalkingParisSunset_00001097 has no box/labels annotations, ignoring from dataset
2022-08-30 19:27:45 - VOC Labels read from file: ('BACKGROUND', 'background', 'Pedestrian')
2022-08-30 19:27:45 - Validation dataset size: 659
2022-08-30 19:27:45 - Build network.
warning - image WalkingParisSunset_00001097 has no box/labels annotations, ignoring from dataset
2022-08-30 19:27:45 - VOC Labels read from file: ('BACKGROUND', 'background', 'Pedestrian')
2022-08-30 19:27:46 - Init from pretrained ssd models/mobilenet-v1-ssd-mp-0_675.pth
2022-08-30 19:27:46 - Took 0.58 seconds to load the model.
2022-08-30 19:27:46 - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2022-08-30 19:27:46 - Uses CosineAnnealingLR scheduler.
2022-08-30 19:27:46 - Start training from epoch 0.
/home/gbewegung/.local/lib/python3.6/site-packages/torch/nn/_reduction.py:44: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
2022-08-30 19:28:05 - Epoch: 0, Step: 10/165, Avg Loss: 9.3615, Avg Regression Loss 4.4629, Avg Classification Loss: 4.8986
2022-08-30 19:28:10 - Epoch: 0, Step: 20/165, Avg Loss: 6.9237, Avg Regression Loss 3.6438, Avg Classification Loss: 3.2799
2022-08-30 19:28:13 - Epoch: 0, Step: 30/165, Avg Loss: 6.1242, Avg Regression Loss 3.3326, Avg Classification Loss: 2.7916
2022-08-30 19:28:18 - Epoch: 0, Step: 40/165, Avg Loss: 6.8517, Avg Regression Loss 3.7716, Avg Classification Loss: 3.0801
2022-08-30 19:28:21 - Epoch: 0, Step: 50/165, Avg Loss: 5.4286, Avg Regression Loss 2.7843, Avg Classification Loss: 2.6443
2022-08-30 19:28:23 - Epoch: 0, Step: 60/165, Avg Loss: 4.8790, Avg Regression Loss 2.3757, Avg Classification Loss: 2.5033
2022-08-30 19:28:26 - Epoch: 0, Step: 70/165, Avg Loss: 4.8492, Avg Regression Loss 2.3427, Avg Classification Loss: 2.5066
2022-08-30 19:28:28 - Epoch: 0, Step: 80/165, Avg Loss: 4.4427, Avg Regression Loss 2.0119, Avg Classification Loss: 2.4308
2022-08-30 19:28:32 - Epoch: 0, Step: 90/165, Avg Loss: 5.0296, Avg Regression Loss 2.4892, Avg Classification Loss: 2.5403
2022-08-30 19:28:34 - Epoch: 0, Step: 100/165, Avg Loss: 4.5285, Avg Regression Loss 2.0972, Avg Classification Loss: 2.4313
2022-08-30 19:28:36 - Epoch: 0, Step: 110/165, Avg Loss: 3.9718, Avg Regression Loss 1.7432, Avg Classification Loss: 2.2286
2022-08-30 19:28:39 - Epoch: 0, Step: 120/165, Avg Loss: 4.0890, Avg Regression Loss 1.7168, Avg Classification Loss: 2.3722
2022-08-30 19:28:41 - Epoch: 0, Step: 130/165, Avg Loss: 4.3012, Avg Regression Loss 1.8998, Avg Classification Loss: 2.4014
2022-08-30 19:28:44 - Epoch: 0, Step: 140/165, Avg Loss: 4.2805, Avg Regression Loss 1.9802, Avg Classification Loss: 2.3003
2022-08-30 19:28:46 - Epoch: 0, Step: 150/165, Avg Loss: 5.0491, Avg Regression Loss 2.5790, Avg Classification Loss: 2.4701
2022-08-30 19:28:49 - Epoch: 0, Step: 160/165, Avg Loss: 4.3675, Avg Regression Loss 2.1717, Avg Classification Loss: 2.1958
2022-08-30 19:28:51 - Epoch: 0, Training Loss: 5.2401, Training Regression Loss 2.5659, Training Classification Loss: 2.6743
2022-08-30 19:29:02 - Epoch: 0, Validation Loss: 4.0435, Validation Regression Loss 1.7113, Validation Classification Loss: 2.3322
Traceback (most recent call last):
  File "train_ssd.py", line 414, in <module>
    mean_ap, class_ap = eval.compute()
  File "/home/gbewegung/jetson-inference/python/training/detection/ssd/eval_ssd.py", line 88, in compute
    self.true_case_stat[class_index],
KeyError: 1

dusty_nv · August 30, 2022, 3:50pm

Hi @jaganathan.comm, can you try running train_ssd.py with the --quick-validation flag? Thanks.

jaganathan.comm · August 30, 2022, 4:28pm

Hi Dustin,
It is working now, thanks.
May I know why it fails when --quick-validation is not set, which is the default?

dusty_nv · August 30, 2022, 4:32pm

It appears that your validation dataset didn’t have any instances of at least one of your classes, and the evaluation code didn’t handle that case. That code was added recently, and I will have to take another look at it or disable it by default.
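
To illustrate the failure mode described here, the following is a minimal sketch and not the actual eval_ssd.py code. It assumes ground-truth counts are kept in a plain dict keyed by class index, so a class with no annotated boxes has no key at all and a direct lookup raises KeyError. The helper names (`count_ground_truth`, `classes_to_evaluate`) and the simplified annotation format are hypothetical. The label tuple is the one from the log above, where index 1 is 'background'.

```python
def count_ground_truth(annotations):
    """Count ground-truth boxes per class index.

    `annotations` is a list of (image_id, class_index) pairs, a
    hypothetical simplified stand-in for real VOC annotations.
    """
    counts = {}
    for _image_id, class_index in annotations:
        counts[class_index] = counts.get(class_index, 0) + 1
    return counts


def classes_to_evaluate(class_names, gt_counts):
    """Split classes into those with ground truth and those without."""
    evaluated, skipped = [], []
    for class_index, name in enumerate(class_names):
        if class_index == 0:
            continue  # index 0 is the implicit background class
        if gt_counts.get(class_index, 0) == 0:
            skipped.append(name)  # no instances: AP is undefined, so skip
        else:
            evaluated.append(name)
    return evaluated, skipped


class_names = ("BACKGROUND", "background", "Pedestrian")
# Every annotated box belongs to class index 2 ("Pedestrian").
annotations = [("img_a", 2), ("img_a", 2), ("img_b", 2)]

gt_counts = count_ground_truth(annotations)
# A direct lookup such as gt_counts[1] would raise KeyError: 1 here.
evaluated, skipped = classes_to_evaluate(class_names, gt_counts)
print("evaluated:", evaluated)
print("skipped:", skipped)
```

Using `dict.get` with a default of 0 lets the loop skip empty classes instead of crashing. Under this assumption, a label file that lists a class with no annotated boxes would trigger the same error.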

dusty_nv · August 30, 2022, 4:51pm

BTW I just disabled the mAP calculation code by default in train_ssd.py until it gets more ironed out - the --quick-validation flag to disable it has been replaced by the --validation-mean-ap flag to enable it.
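
As a rough illustration of the flag change described here, this sketch shows how an opt-in boolean flag behaves with argparse. The flag name comes from the post above; the `store_true` declaration and the help text are assumptions, not a copy of train_ssd.py.

```python
import argparse

parser = argparse.ArgumentParser(description="Opt-in mAP flag sketch")
# Assumed declaration: the flag is off unless passed on the command line.
parser.add_argument("--validation-mean-ap", action="store_true",
                    help="compute mAP during validation (off by default)")

default_args = parser.parse_args([])
enabled_args = parser.parse_args(["--validation-mean-ap"])

print(default_args.validation_mean_ap)
print(enabled_args.validation_mean_ap)
```

With this polarity, a plain training run skips the mAP step, and the evaluation only runs when it is requested explicitly.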

system · September 21, 2022, 4:47am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.