Just to make sure we are on the same page: the key goal of TLT is to train and fine-tune a model on the user's own dataset. At the moment I am not using my own dataset; I am trying to reproduce the results published in the article. I have rerun the training to make sure I am running the exact setup, but unfortunately I have only achieved 66% of the performance published in the blog (https://developer.nvidia.com/blog/training-instance-segmentation-models-using-maskrcnn-on-the-transfer-learning-toolkit/).
I could try other learning rates, but we need to be sure that this is the best thing to do next. For now it makes sense to follow the author's instructions, so I followed the exact steps. The 0.02 learning rate is for 8 GPUs (NVIDIA's setup); since my setup has 2 GPUs, the learning rate should be 0.005 [i.e. (0.02/8)*2, following the linear scaling rule; see the sketch below].
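To make that arithmetic explicit, here is a minimal Python sketch of the linear scaling rule as I am applying it (the constants are simply the numbers from this thread, not anything read from TLT):

# Linear scaling rule: scale the reference learning rate by the ratio of
# GPUs (equivalently, of the global batch size) used in the run.
REFERENCE_LR = 0.02    # learning rate tuned for the blog's 8-GPU setup
REFERENCE_GPUS = 8     # NVIDIA's setup
MY_GPUS = 2            # my setup

scaled_lr = REFERENCE_LR / REFERENCE_GPUS * MY_GPUS
print(f"init_learning_rate for {MY_GPUS} GPUs: {scaled_lr}")  # 0.005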
The final metrics I have after 100K iterations are:
[MaskRCNN] INFO : AP: 0.211550236
[MaskRCNN] INFO : AP50: 0.383451581
[MaskRCNN] INFO : AP75: 0.209822893
[MaskRCNN] INFO : APl: 0.291358769
[MaskRCNN] INFO : APm: 0.219301566
[MaskRCNN] INFO : APs: 0.096314505
[MaskRCNN] INFO : ARl: 0.535470128
[MaskRCNN] INFO : ARm: 0.410021156
[MaskRCNN] INFO : ARmax1: 0.225859717
[MaskRCNN] INFO : ARmax10: 0.370030344
[MaskRCNN] INFO : ARmax100: 0.388801515
[MaskRCNN] INFO : ARs: 0.209091455
[MaskRCNN] INFO : mask_AP: 0.202502176
[MaskRCNN] INFO : mask_AP50: 0.355926216
[MaskRCNN] INFO : mask_AP75: 0.206854612
[MaskRCNN] INFO : mask_APl: 0.293817312
[MaskRCNN] INFO : mask_APm: 0.210412666
[MaskRCNN] INFO : mask_APs: 0.082449332
[MaskRCNN] INFO : mask_ARl: 0.514913499
[MaskRCNN] INFO : mask_ARm: 0.385058582
[MaskRCNN] INFO : mask_ARmax1: 0.220792443
[MaskRCNN] INFO : mask_ARmax10: 0.347017348
[MaskRCNN] INFO : mask_ARmax100: 0.362540752
[MaskRCNN] INFO : mask_ARs: 0.185705140
I have also noticed that the results did not oscillate and appear to have converged asymptotically; e.g. below are the AP values after every 10,000 iterations:
[MaskRCNN] INFO : AP: 0.057683785 (10K iterations)
[MaskRCNN] INFO : AP: 0.098032616 (20K iterations)
[MaskRCNN] INFO : AP: 0.125270441 (30K iterations)
[MaskRCNN] INFO : AP: 0.148156375 (40K iterations)
[MaskRCNN] INFO : AP: 0.160243452 (50K iterations)
[MaskRCNN] INFO : AP: 0.169225559 (60K iterations)
[MaskRCNN] INFO : AP: 0.203981400 (70K iterations)
[MaskRCNN] INFO : AP: 0.207072541 (80K iterations)
[MaskRCNN] INFO : AP: 0.211082965 (90K iterations)
[MaskRCNN] INFO : AP: 0.211550236 (100K iterations)
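For reference, a quick matplotlib sketch (values copied from the log above) that makes the asymptotic trend easy to see:

import matplotlib.pyplot as plt

# bbox AP after every 10K iterations, taken from the evaluation log above
iterations = [10_000 * i for i in range(1, 11)]
ap = [0.0577, 0.0980, 0.1253, 0.1482, 0.1602,
      0.1692, 0.2040, 0.2071, 0.2111, 0.2116]

plt.plot(iterations, ap, marker="o")
plt.xlabel("iteration")
plt.ylabel("bbox AP")
plt.title("AP vs. training iteration (2-GPU run)")
plt.show()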
The metric published in the blog:
AP: 0.334154785
The contents of the spec file (maskrcnn_train_resnet50.txt) are below; a sketch of the learning-rate schedule they imply follows the spec:
seed: 123
use_amp: False
warmup_steps: 0
checkpoint: "/workspace/tlt-experiments/maskrcnn/model/pretrained_resnet50/tlt_instance_segmentation_vresnet50/resnet50.hdf5"
learning_rate_steps: "[60000, 80000, 90000]"
learning_rate_decay_levels: "[0.1, 0.01, 0.001]"
total_steps: 100000
train_batch_size: 2
eval_batch_size: 8
num_steps_per_eval: 10000
momentum: 0.9
l2_weight_decay: 0.00002
warmup_learning_rate: 0.0001
init_learning_rate: 0.005
data_config{
image_size: "(832, 1344)"
augment_input_data: True
eval_samples: 5000
training_file_pattern: "/workspace/tlt-experiments/maskrcnn/data/train*.tfrecord"
validation_file_pattern: "/workspace/tlt-experiments/maskrcnn/data/val*.tfrecord"
val_json_file: "/workspace/tlt-experiments/maskrcnn/data/annotations/instances_val2017.json"
# dataset specific parameters
num_classes: 91
skip_crowd_during_training: True
}
maskrcnn_config {
nlayers: 50
arch: "resnet"
freeze_bn: True
freeze_blocks: "[0,1]"
gt_mask_size: 112
# Region Proposal Network
rpn_positive_overlap: 0.7
rpn_negative_overlap: 0.3
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_min_size: 0.
# Proposal layer.
batch_size_per_im: 512
fg_fraction: 0.25
fg_thresh: 0.5
bg_thresh_hi: 0.5
bg_thresh_lo: 0.
# Faster-RCNN heads.
fast_rcnn_mlp_head_dim: 1024
bbox_reg_weights: "(10., 10., 5., 5.)"
# Mask-RCNN heads.
include_mask: True
mrcnn_resolution: 28
# training
train_rpn_pre_nms_topn: 2000
train_rpn_post_nms_topn: 1000
train_rpn_nms_threshold: 0.7
# evaluation
test_detections_per_image: 100
test_nms: 0.5
test_rpn_pre_nms_topn: 1000
test_rpn_post_nms_topn: 1000
test_rpn_nms_thresh: 0.7
# model architecture
min_level: 2
max_level: 6
num_scales: 1
aspect_ratios: "[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]"
anchor_scale: 8
# localization loss
rpn_box_loss_weight: 1.0
fast_rcnn_box_loss_weight: 1.0
mrcnn_weight_loss_mask: 1.0
}
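For completeness, this is my reading of the learning-rate schedule those spec values imply (piecewise-constant decay with optional linear warmup). It is only a sketch of my understanding of how the fields combine, not TLT's actual implementation:

# Assumed schedule implied by the spec above; warmup_steps is 0 here,
# so the warmup branch is effectively disabled in this run.
INIT_LR = 0.005
BOUNDARIES = [60000, 80000, 90000]
DECAY_LEVELS = [0.1, 0.01, 0.001]
WARMUP_STEPS = 0
WARMUP_LR = 0.0001

def learning_rate(step: int) -> float:
    # Linear warmup from warmup_learning_rate up to init_learning_rate.
    if step < WARMUP_STEPS:
        return WARMUP_LR + (INIT_LR - WARMUP_LR) * step / WARMUP_STEPS
    lr = INIT_LR
    # Apply the decay level of the last boundary already passed.
    for boundary, level in zip(BOUNDARIES, DECAY_LEVELS):
        if step >= boundary:
            lr = INIT_LR * level
    return lr

print(learning_rate(50000))  # 0.005
print(learning_rate(65000))  # 0.0005
print(learning_rate(95000))  # 5e-06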