Hey everyone, I am trying to train RetinaNet on my own dataset. I built on the example in the TLT container, but when I start training I get the following output:
Using TensorFlow backend.
2020-07-28 19:52:23,547 [INFO] iva.retinanet.scripts.train: Loading experiment spec at /experiments/retinanet/jobs/resnet50_2020-07-28_15-30-36/specs/retinanet_train.txt.
2020-07-28 19:52:23,547 [INFO] /usr/local/lib/python2.7/dist-packages/iva/retinanet/utils/spec_loader.pyc: Merging specification from /experiments/retinanet/jobs/resnet50_2020-07-28_15-30-36/specs/retinanet_train.txt
2020-07-28 19:52:23,550 [INFO] iva.retinanet.scripts.train: Building model from spec file...
Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 40, in main
  File "./retinanet/scripts/train.py", line 247, in main
  File "./retinanet/scripts/train.py", line 109, in run_experiment
  File "./retinanet/builders/model_builder.py", line 65, in build
  File "./retinanet/architecture/retinanet.py", line 241, in retinanet
  File "./retinanet/models/fpn.py", line 88, in generate
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/base_layer.py", line 431, in __call__
    self.build(unpack_singleton(input_shapes))
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/merge.py", line 91, in build
    shape)
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/merge.py", line 61, in _compute_elemwise_op_output_shape
    str(shape1) + ' ' + str(shape2))
ValueError: Operands could not be broadcast together with shapes (256, 46, 80) (256, 45, 80)
Could anyone explain why I am getting this error? I have been able to run YOLO and DetectNet models without problems on the same dataset.
Below is my spec file for training:
retinanet_config {
  aspect_ratios_global: "[1.0, 2.0, 0.5]"
  scales: "[0.05, 0.1, 0.25, 0.4, 0.55, 0.7]"
  two_boxes_for_ar1: false
  clip_boxes: false
  loss_loc_weight: 0.8
  focal_loss_alpha: 0.25
  focal_loss_gamma: 2.0
  variances: "[0.1, 0.1, 0.2, 0.2]"
  arch: "resnet"
  nlayers: 50
  n_kernels: 1
  feature_size: 256
  freeze_bn: false
}
training_config {
  batch_size_per_gpu: 24
  num_epochs: 100
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 4e-5
      max_learning_rate: 1.5e-2
      soft_start: 0.15
      annealing: 0.5
    }
  }
  regularizer {
    type: L1
    weight: 2e-6
  }
}
eval_config {
  validation_period_during_training: 10
  average_precision_mode: SAMPLE
  batch_size: 32
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.01
  clustering_iou_threshold: 0.6
  top_k: 200
}
augmentation_config {
  preprocessing {
    output_image_width: 1280
    output_image_height: 720
    output_image_channel: 3
    crop_right: 1280
    crop_bottom: 720
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 0.7
    zoom_max: 1.8
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}
dataset_config {
  data_sources: {
    tfrecords_path: "/experiments/retinanet/jobs/resnet50_2020-07-28_15-30-36/tfrecords/train/*"
    image_directory_path: "/datasets/digital-twins/variance_experiment/set_02/5000_random_samples/"
  }
  image_extension: "png"
  target_class_mapping {
    key: "m1"
    value: "m1"
  }
  target_class_mapping {
    key: "m2"
    value: "m2"
  }
  target_class_mapping {
    key: "leopard"
    value: "leopard"
  }
  validation_fold: 0
}
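One observation on the shapes in the error, offered as a guess rather than a confirmed diagnosis: the mismatch is in the height axis (46 vs 45), and 45 is exactly 720 / 16. The small Python sketch below reproduces that arithmetic under two assumptions that I have not verified against the TLT source: that each backbone stage halves the feature map with ceiling rounding, and that the FPN upsamples the coarser level by 2x before the element-wise add. The function names `feature_sizes` and `merge_mismatches` are made up for illustration and are not part of TLT.

```python
def feature_sizes(input_size, strides=(8, 16, 32)):
    """Feature-map size at each stride.

    Assumption: every stride-2 stage rounds up, i.e. size -> ceil(size / 2).
    """
    sizes = {}
    size, stride = input_size, 1
    while stride < max(strides):
        size = (size + 1) // 2  # ceiling division by 2
        stride *= 2
        if stride in strides:
            sizes[stride] = size
    return sizes


def merge_mismatches(input_size, strides=(8, 16, 32)):
    """List (finer_stride, upsampled_size, lateral_size) where a 2x upsample
    of the coarser level does not match the finer level it is added to."""
    sizes = feature_sizes(input_size, strides)
    mismatches = []
    for fine, coarse in zip(strides, strides[1:]):
        upsampled = sizes[coarse] * 2  # assumed 2x upsampling in the FPN
        if upsampled != sizes[fine]:
            mismatches.append((fine, upsampled, sizes[fine]))
    return mismatches


print(merge_mismatches(720))   # height from output_image_height
print(merge_mismatches(1280))  # width from output_image_width
```

Under these assumptions the height of 720 gives 45 at stride 16 but 23 at stride 32, which upsamples to 46, matching the (256, 46, 80) vs (256, 45, 80) shapes in the error, while the width of 1280 divides cleanly. If that reading is right, a height that is a multiple of 32 (for example 736) would avoid the mismatch in this sketch; whether TLT RetinaNet actually imposes that requirement is something I would want confirmed.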