Transfer learning on FasterRCNN using ResNet10

Hi,
I want to run FasterRCNN with transfer learning from a pretrained checkpoint. Do I need to convert my images to tfrecords at the same size that was used to train the ResNet10 model? Also, I only have one class. Do I have to map that class in the spec?

Also, how do I verify that the tfrecords are being converted correctly? My images are 70GB, but the tfrecords I created for one shard are only 40MB. Is the code for the Transfer Learning Toolkit tfrecord generation available?

TLT provides the tool tlt-dataset-convert to generate tfrecords. During the generation you can see how many images fall into each class, and you will see a completion message at the end. The code of tlt-dataset-convert is not available. Also note that the tfrecords store labels plus references to the images (see image_directory_path in the spec) rather than the image pixels themselves, which is why they are much smaller than your 70GB of images.
For one class, you still need to set the class mapping. It is easy to set, as in the example below.

target_class_mapping {
  key: 'car'
  value: 'car'
}

For more details, please see the TLT user guide:
https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#conv_tfrecords_topic
https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#spec_file_fasterrcnn

I got the following output. Does this mean the tfrecords were created properly?
Using TensorFlow backend.
2020-06-02 22:20:36,093 - iva.detectnet_v2.dataio.build_converter - INFO - Instantiating a kitti converter
2020-06-02 22:20:40,587 - iva.detectnet_v2.dataio.kitti_converter_lib - INFO - Num images in
Train: 41136 Val: 10283
2020-06-02 22:20:40,587 - iva.detectnet_v2.dataio.kitti_converter_lib - INFO - Validation data in partition 0. Hence, while choosing the validationset during training choose validation_fold 0.
2020-06-02 22:20:40,601 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 0
/usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:266: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
2020-06-02 22:20:57,134 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 1
2020-06-02 22:21:13,590 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 2
2020-06-02 22:21:29,351 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 3
2020-06-02 22:21:45,347 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 4
2020-06-02 22:22:01,535 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 5
2020-06-02 22:22:17,704 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 6
2020-06-02 22:22:33,721 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 7
2020-06-02 22:22:50,450 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 8
2020-06-02 22:23:06,339 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 9
2020-06-02 22:23:22,598 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO -
Wrote the following numbers of objects:
trafficlight: 31074

2020-06-02 22:23:22,598 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 0
2020-06-02 22:24:25,242 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 1
2020-06-02 22:25:29,449 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 2
2020-06-02 22:26:33,777 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 3
2020-06-02 22:27:38,128 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 4
2020-06-02 22:28:43,032 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 5
2020-06-02 22:29:46,436 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 6
2020-06-02 22:30:50,371 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 7
2020-06-02 22:31:54,278 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 8
2020-06-02 22:32:58,015 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 9
2020-06-02 22:34:01,894 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO -
Wrote the following numbers of objects:
trafficlight: 124903

2020-06-02 22:34:01,895 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Cumulative object statistics
2020-06-02 22:34:01,895 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO -
Wrote the following numbers of objects:
trafficlight: 155977

2020-06-02 22:34:01,895 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Class map.
Label in GT: Label in tfrecords file
TrafficLight: trafficlight
For the dataset_config in the experiment_spec, please use labels in the tfrecords file, while writing the classmap.

2020-06-02 22:34:01,895 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Tfrecords generation complete.

Also, these are the names of the files that I got:
-fold-000-of-002-shard-00000-of-00010 -fold-000-of-002-shard-00004-of-00010 -fold-000-of-002-shard-00008-of-00010 -fold-001-of-002-shard-00002-of-00010 -fold-001-of-002-shard-00006-of-00010
-fold-000-of-002-shard-00001-of-00010 -fold-000-of-002-shard-00005-of-00010 -fold-000-of-002-shard-00009-of-00010 -fold-001-of-002-shard-00003-of-00010 -fold-001-of-002-shard-00007-of-00010
-fold-000-of-002-shard-00002-of-00010 -fold-000-of-002-shard-00006-of-00010 -fold-001-of-002-shard-00000-of-00010 -fold-001-of-002-shard-00004-of-00010 -fold-001-of-002-shard-00008-of-00010
-fold-000-of-002-shard-00003-of-00010 -fold-000-of-002-shard-00007-of-00010 -fold-001-of-002-shard-00001-of-00010 -fold-001-of-002-shard-00005-of-00010 -fold-001-of-002-shard-00009-of-00010

After running the training pipeline I am getting the following error:
google.protobuf.text_format.ParseError: 61:1 : Message type "DatasetConfig" has no field named "data_augmentation".

Here is the spec file I used for the training experiment:
random_seed: 42
enc_key: 'i put my own key here'
verbose: True
network_config {
  input_image_config {
    image_type: RGB
    image_channel_order: 'bgr'
    size_height_width {
      height: 720
      width: 1280
    }
    image_channel_mean {
      key: 'b'
      value: 103.939
    }
    image_channel_mean {
      key: 'g'
      value: 116.779
    }
    image_channel_mean {
      key: 'r'
      value: 123.68
    }
    image_scaling_factor: 1.0
    max_objects_num_per_image: 10
  }
  feature_extractor: "resnet:10"
  anchor_box_config {
    scale: 64.0
    scale: 128.0
    scale: 256.0
    ratio: 1.0
    ratio: 0.5
    ratio: 2.0
  }
  freeze_bn: True
  freeze_blocks: 0
  freeze_blocks: 1
  roi_mini_batch: 256
  rpn_stride: 16
  conv_bn_share_bias: True
  roi_pooling_config {
    pool_size: 7
    pool_size_2x: False
  }
  all_projections: True
  use_pooling: False
}
training_config {
  kitti_data_config {
    data_sources: {
      tfrecords_path: "/mnt/nfs/nvidia_tlt/tlt-experiments/examples/faster_rcnn/tfrecords/-fold-*"
      image_directory_path: "/mnt/nfs/mldata/tlt_data/images/"
    }
    image_extension: 'png'
    target_class_mapping {
      key: 'trafficlight'
      value: '1'
    }
    data_augmentation {
      preprocessing {
        output_image_width: 1280
        output_image_height: 720
        output_image_channel: 3
        min_bbox_width: 1.0
        min_bbox_height: 1.0
      }
      spatial_augmentation {
        hflip_probability: 0.5
        vflip_probability: 0.0
        zoom_min: 1.0
        zoom_max: 1.0
        translate_max_x: 0
        translate_max_y: 0
      }
      color_augmentation {
        hue_rotation_max: 0.0
        saturation_shift_max: 0.0
        contrast_scale_max: 0.0
        contrast_center: 0.5
      }
    }
    enable_augmentation: True
    batch_size_per_gpu: 1
    num_epochs: 12
    pretrained_weights: "/mnt/nfs/nvidia_tlt/tlt-experiments/examples/faster_rcnn/resnet10.hdf5"
    output_model: "/mnt/nfs/nvidia_tlt/tlt-experiments/examples/faster_rcnn/frcnn_trafficlight_resnet10.tlt"
    rpn_min_overlap: 0.3
    rpn_max_overlap: 0.7
    classifier_min_overlap: 0.0
    classifier_max_overlap: 0.5
    gt_as_roi: False
    std_scaling: 1.0
    classifier_regr_std {
      key: 'x'
      value: 10.0
    }
    classifier_regr_std {
      key: 'y'
      value: 10.0
    }
    classifier_regr_std {
      key: 'w'
      value: 5.0
    }
    classifier_regr_std {
      key: 'h'
      value: 5.0
    }

    rpn_mini_batch: 256
    rpn_pre_nms_top_N: 12000
    rpn_nms_max_boxes: 2000
    rpn_nms_overlap_threshold: 0.7

    reg_config {
      reg_type: 'L2'
      weight_decay: 1e-4
    }

    optimizer {
      adam {
        lr: 0.00001
        beta_1: 0.9
        beta_2: 0.999
        decay: 0.0
      }
    }

    lr_scheduler {
      step {
        base_lr: 0.00001
        gamma: 1.0
        step_size: 30
      }
    }

    lambda_rpn_regr: 1.0
    lambda_rpn_class: 1.0
    lambda_cls_regr: 1.0
    lambda_cls_class: 1.0

    inference_config {
      images_dir: '/workspace/tlt-experiments/data/testing/image_2'
      model: '/workspace/tlt-experiments/data/faster_rcnn/frcnn_kitti_resnet10.epoch12.tlt'
      detection_image_output_dir: '/workspace/tlt-experiments/data/faster_rcnn/inference_results_imgs'
      labels_dump_dir: '/workspace/tlt-experiments/data/faster_rcnn/inference_dump_labels'
      rpn_pre_nms_top_N: 6000
      rpn_nms_max_boxes: 300
      rpn_nms_overlap_threshold: 0.7
      bbox_visualize_threshold: 0.6
      classifier_nms_max_boxes: 300
      classifier_nms_overlap_threshold: 0.3
    }

    evaluation_config {
      model: '/workspace/tlt-experiments/data/faster_rcnn/frcnn_kitti_resnet10.epoch12.tlt'
      labels_dump_dir: '/workspace/tlt-experiments/data/faster_rcnn/test_dump_labels'
      rpn_pre_nms_top_N: 6000
      rpn_nms_max_boxes: 300
      rpn_nms_overlap_threshold: 0.7
      classifier_nms_max_boxes: 300
      classifier_nms_overlap_threshold: 0.3
      object_confidence_thres: 0.0001
      use_voc07_11point_metric: False
    }

  }

Yes, the generation is successful.

For your training error, please modify the mapping below and retry:

target_class_mapping {
  key: 'trafficlight'
  value: '1'
}

to

target_class_mapping {
  key: 'trafficlight'
  value: 'trafficlight'
}

Hi, thank you for your reply. I also wanted to ask how to go about using the pretrained ResNet10 weights. If I want to use pretrained weights, does my tfrecord generation need to match the same input size that was used to train the pretrained weights?

No need. But please make sure all your images have the same resolution, and that it matches the width/height in your training spec.

Ok, thank you for your reply. I tried changing the value to trafficlight, but I am still getting the same error:
File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 942, in _MergeField
  merger(tokenizer, message, field)
File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 1016, in _MergeMessageField
  self._MergeField(tokenizer, sub_message)
File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 942, in _MergeField
  merger(tokenizer, message, field)
File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 1016, in _MergeMessageField
  self._MergeField(tokenizer, sub_message)
File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 909, in _MergeField
  (message_descriptor.full_name, name))
google.protobuf.text_format.ParseError: 61:1 : Message type "DatasetConfig" has no field named "data_augmentation".
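A parse error of the form "Message type X has no field named Y" often means an earlier block was never closed, so Y is being parsed inside the wrong message. A quick brace-balance pass over the spec text can localize that kind of misalignment. This is a rough sketch, not a real text-format parser: it ignores braces inside quoted strings.

```python
def check_braces(spec_text):
    """Track '{'/'}' nesting per line of a protobuf text-format spec.
    Returns (line_of_first_underflow_or_None, final_depth); a final
    depth above zero means that many closing braces are missing."""
    depth = 0
    for lineno, line in enumerate(spec_text.splitlines(), 1):
        line = line.split("#", 1)[0]  # drop text-format comments
        depth += line.count("{") - line.count("}")
        if depth < 0:
            return lineno, depth  # a '}' with no matching '{'
    return None, depth
```

Running it over the spec posted above would report a final depth of 1, i.e. one missing closing brace.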

I found the issue: my spec was misaligned. There should be another bracket here, to close the dataset config before data_augmentation starts:

target_class_mapping {
  key: 'trafficlight'
  value: '1'
}
}  # this closing bracket was missing
data_augmentation {
  preprocessing {
    output_image_width: 1280