tlt-train shuffle error

Hi,

I am trying to train an SSD ResNet-18 model with a custom dataset generated using “tlt-dataset-convert” (the TFRecords were generated successfully). I see the following error:

File "./modulus/processors/tfrecords_iterator.py", line 80, in __init__
ValueError: 'shuffle' is True while 'shuffle_buffer_size' is 0.

From the docs, I don’t see an option in any config file to set shuffle/shuffle_buffer_size.

Here is my train spec file:

ssd_config {
  aspect_ratios_global: "[1.0, 2.0, 0.5, 3.0, 1.0/3.0]"
  scales: "[0.05, 0.1, 0.25, 0.4, 0.55, 0.7, 0.85]"
  two_boxes_for_ar1: true
  clip_boxes: false
  loss_loc_weight: 0.8
  focal_loss_alpha: 0.25
  focal_loss_gamma: 2.0
  variances: "[0.1, 0.1, 0.2, 0.2]"
  arch: "resnet18"
  freeze_bn: false
  freeze_blocks: 0
}
training_config {
  batch_size_per_gpu: 16
  num_epochs: 180
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-5
      max_learning_rate: 2e-2
      soft_start: 0.1
      annealing: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
}
eval_config {
  validation_period_during_training: 10
  average_precision_mode: SAMPLE
  batch_size: 32
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.01
  clustering_iou_threshold: 0.6
  top_k: 200
}
augmentation_config {
  preprocessing {
    output_image_width: 1248
    output_image_height: 384
    output_image_channel: 3
    crop_right: 1248
    crop_bottom: 384
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 0.7
    zoom_max: 1.8
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tlt/dataset/pkg-watch/tfrecords/pkg_watch_trainval_new*"
    image_directory_path: "/workspace/tlt/dataset/pkg-watch/images"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  target_class_mapping {
    key: "package"
    value: "package"
  }
  validation_fold: 20
}

I used the spec file below to create the TFRecords:
kitti_config {
  root_directory_path: "/workspace/tlt/dataset/pkg-watch"
  image_dir_name: "images"
  label_dir_name: "labels"
  image_extension: ".jpg"
  partition_mode: "random"
  num_partitions: 2
  val_split: 14
  num_shards: 10
}
image_directory_path: "/workspace/tlt/dataset/pkg-watch"

Is there anything I can change in these specs to get rid of the error?

Thanks,
Anusha

Hi 010akv,
In your training spec, why did you set the following?

validation_fold: 20

See Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation
For randomly split TFRecords, force the validation fold index to 0, because the TFRecords have only 2 folds.
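
As a rough sketch of that change, assuming everything else in the dataset_config posted above stays the same, only the validation_fold line would differ (the comment is just for illustration):

```
dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tlt/dataset/pkg-watch/tfrecords/pkg_watch_trainval_new*"
    image_directory_path: "/workspace/tlt/dataset/pkg-watch/images"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  target_class_mapping {
    key: "package"
    value: "package"
  }
  # Was 20. The TFRecords were created with partition_mode "random"
  # and num_partitions 2, so the validation fold index is forced to 0.
  validation_fold: 0
}
```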

That did the trick, thanks!!