I am trying to retrain the tlt_pretrained_object_detection:mobilenet_v1 model with my own KITTI-formatted dataset (one class, “person”), following the instructions in the Getting Started Guide and this blog post. I am using the tlt-streamanalytics:v2.0_dp_py2 Docker image for this.
First, I convert the KITTI dataset into TFRecords with this command:

```
tlt-dataset-convert -d convert.spec -o ./tfrecords/converted.tfrecord
```

and this convert.spec file:
```
kitti_config {
  root_directory_path: "[REPLACE_WITH_DATASET_DIR]"
  image_dir_name: "images"
  label_dir_name: "labels"
  image_extension: ".png"
  partition_mode: "random"
  num_partitions: 2
  val_split: 20
  num_shards: 10
}
image_directory_path: "[REPLACE_WITH_DATASET_DIR]"
```
In my actual spec file, [REPLACE_WITH_DATASET_DIR] is replaced with my real dataset directory.
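Before running the conversion, I sanity-check the dataset layout with a small script of my own (not part of TLT; the directory names match my spec above): every .png in images/ should have a matching KITTI label file in labels/, and the labels should only contain the class "person".

```python
# My own pre-conversion sanity check (not a TLT tool).
# Verifies image/label pairing and collects the class names seen in labels.
import os

def check_kitti_layout(dataset_dir):
    image_dir = os.path.join(dataset_dir, "images")
    label_dir = os.path.join(dataset_dir, "labels")
    classes = set()
    missing = []
    for name in os.listdir(image_dir):
        stem, ext = os.path.splitext(name)
        if ext != ".png":
            continue
        label_path = os.path.join(label_dir, stem + ".txt")
        if not os.path.isfile(label_path):
            missing.append(name)
            continue
        with open(label_path) as f:
            for line in f:
                fields = line.split()
                if fields:
                    # In the KITTI label format, the class name is the first field.
                    classes.add(fields[0])
    return classes, missing
```

For my dataset this kind of check is how I confirmed there is exactly one class before converting.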
The output of that command is:
Using TensorFlow backend.
2020-08-05 20:19:08,221 - iva.detectnet_v2.dataio.build_converter - INFO - Instantiating a kitti converter
2020-08-05 20:19:08,243 - iva.detectnet_v2.dataio.kitti_converter_lib - INFO - Num images in
Train: 7955 Val: 1988
2020-08-05 20:19:08,243 - iva.detectnet_v2.dataio.kitti_converter_lib - INFO - Validation data in partition 0. Hence, while choosing the validationset during training choose validation_fold 0.
2020-08-05 20:19:08,245 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 0
/usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:266: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
2020-08-05 20:19:08,402 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 1
2020-08-05 20:19:08,552 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 2
2020-08-05 20:19:08,702 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 3
2020-08-05 20:19:08,853 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 4
2020-08-05 20:19:09,003 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 5
2020-08-05 20:19:09,153 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 6
2020-08-05 20:19:09,303 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 7
2020-08-05 20:19:09,454 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 8
2020-08-05 20:19:09,604 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 9
2020-08-05 20:19:09,760 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO -
Wrote the following numbers of objects:
person: 1988
2020-08-05 20:19:09,760 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 0
2020-08-05 20:19:10,363 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 1
2020-08-05 20:19:10,965 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 2
2020-08-05 20:19:11,567 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 3
2020-08-05 20:19:12,170 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 4
2020-08-05 20:19:12,772 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 5
2020-08-05 20:19:13,375 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 6
2020-08-05 20:19:13,978 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 7
2020-08-05 20:19:14,581 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 8
2020-08-05 20:19:15,184 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 9
2020-08-05 20:19:15,791 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO -
Wrote the following numbers of objects:
person: 7955
2020-08-05 20:19:15,792 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Cumulative object statistics
2020-08-05 20:19:15,792 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO -
Wrote the following numbers of objects:
person: 9943
2020-08-05 20:19:15,792 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Class map.
Label in GT: Label in tfrecords file
person: person
For the dataset_config in the experiment_spec, please use labels in the tfrecords file, while writing the classmap.
2020-08-05 20:19:15,792 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Tfrecords generation complete.
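After conversion I also check that the wildcard I later use in train.spec actually matches the shard files that were written, since (as far as I can tell) an empty match would give the training dataloader zero records. This is my own helper, not a TLT tool:

```python
# My own helper: list the shard files a tfrecords_path wildcard would match.
import glob

def matched_tfrecords(pattern):
    # The training dataloader is given this same glob pattern,
    # so an empty result here means it would see no data.
    return sorted(glob.glob(pattern))
```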
After that, I download the pre-trained model with this command:

```
ngc registry model download-version nvidia/tlt_pretrained_object_detection:mobilenet_v1 -d ./pretrained_model
```
Then I try to train with this command:

```
tlt-train ssd -e train.spec -r ./pretrained_model --gpus 1 -k $NGC_API_KEY
```

and this train.spec file:
```
training_config {
  batch_size_per_gpu: 32
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-6
      max_learning_rate: 5e-4
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-9
  }
}
eval_config {
  validation_period_during_training: 10
  average_precision_mode: SAMPLE
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.01
  top_k: 200
}
augmentation_config {
  preprocessing {
    output_image_width: 224
    output_image_height: 224
    output_image_channel: 3
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    color_shift_stddev: 0.0
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}
dataset_config {
  data_sources: {
    tfrecords_path: "[REPLACE_WITH_DATASET_DIR]/tfrecords/*"
    image_directory_path: "[REPLACE_WITH_DATASET_DIR]/images"
  }
  image_extension: "png"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  validation_fold: 0
}
ssd_config {
  aspect_ratios_global: "[1.0, 2.0, 0.5, 3.0, 0.33]"
  aspect_ratios: "[[1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0, 2.0, 0.5, 3.0, 0.33]]"
  two_boxes_for_ar1: true
  clip_boxes: false
  scales: "[0.05, 0.1, 0.25, 0.4, 0.55, 0.7, 0.85]"
  loss_loc_weight: 1.0
  focal_loss_alpha: 0.25
  focal_loss_gamma: 2.0
  variances: "[0.1, 0.1, 0.2, 0.2]"
  arch: "mobilenet_v1"
  freeze_bn: false
}
```
As before, [REPLACE_WITH_DATASET_DIR] is replaced with my actual dataset directory in my spec file.
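Before launching training I also double-check the substituted spec with another small helper of my own (again, not part of TLT): it confirms that no placeholder is left in the file and that every image_directory_path it references actually exists.

```python
# My own pre-flight check on the spec file (not a TLT tool).
import os
import re

def check_spec(spec_path):
    with open(spec_path) as f:
        text = f.read()
    problems = []
    if "[REPLACE_WITH_DATASET_DIR]" in text:
        problems.append("placeholder not substituted")
    # Verify every image_directory_path in the spec points at a real directory.
    for m in re.finditer(r'image_directory_path:\s*"([^"]+)"', text):
        if not os.path.isdir(m.group(1)):
            problems.append("missing directory: " + m.group(1))
    return problems
```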
The output of the training command is:
Using TensorFlow backend.
--------------------------------------------------------------------------
[[14866,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: 2b629d1e986b
Another transport will be used instead, although this may result in
lower performance.
NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
2020-08-05 20:27:49,098 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from train.spec
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
(the warning above is repeated 32 times)
Traceback (most recent call last):
File "/usr/local/bin/tlt-train-g1", line 8, in <module>
sys.exit(main())
File "./common/magnet_train.py", line 37, in main
File "./ssd/scripts/train.py", line 245, in main
File "./ssd/scripts/train.py", line 96, in run_experiment
File "./ssd/builders/inputs_builder.py", line 51, in __init__
File "./detectnet_v2/dataloader/default_dataloader.py", line 206, in get_dataset_tensors
File "./detectnet_v2/dataloader/default_dataloader.py", line 232, in _generate_images_and_ground_truth_labels
File "./modulus/processors/processors.py", line 227, in __call__
File "./detectnet_v2/dataloader/utilities.py", line 60, in call
File "./modulus/processors/tfrecords_iterator.py", line 143, in process_records
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1508, in split
axis=axis, num_split=num_or_size_splits, value=value, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 8883, in split
"Split", split_dim=axis, value=value, num_split=num_split, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 709, in _apply_op_helper
(key, op_type_name, attr_value.i, attr_def.minimum))
ValueError: Attr 'num_split' of 'Split' Op passed 0 less than minimum 1.
I can’t tell what is going on behind the scenes in the TLT code, so I have no idea what this error means or how to debug it. How can I fix it so that I can retrain this model on my own dataset?
Thank you in advance for your help.