Can you fine-tune some hyper-parameters?
Your spec is below. total_steps is 20000, but learning_rate_steps is "[20000, 35000, 45000]", so the first decay boundary is never reached and the learning rate stays constant for the rest of training after warmup. Please train for more steps so that the decay boundaries are actually reached, and also try tuning the other hyper-parameters. Reference: Poor metric results after retraining maskrcnn using TLT notebook - #17 by Morganh
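To make the schedule problem concrete, here is a minimal Python sketch. It is not TLT code: the helper names `lr_at` and `distinct_rates` are hypothetical, and it assumes a linear warmup from warmup_learning_rate to init_learning_rate followed by a piecewise-constant schedule in which each decay level multiplies init_learning_rate once its boundary step is reached. The actual TLT implementation may differ in detail.

```python
from bisect import bisect_right


def lr_at(step, *, init_lr, warmup_lr, warmup_steps, boundaries, decay_levels):
    """Learning rate at `step` under the assumed schedule (not the TLT source)."""
    if step < warmup_steps:
        # Linear ramp from warmup_lr up to init_lr.
        frac = step / warmup_steps
        return warmup_lr + frac * (init_lr - warmup_lr)
    # Count how many decay boundaries this step has already reached.
    reached = bisect_right(boundaries, step)
    return init_lr if reached == 0 else init_lr * decay_levels[reached - 1]


def distinct_rates(total_steps, **schedule):
    """Distinct post-warmup learning rates seen in steps [warmup_steps, total_steps)."""
    start = schedule["warmup_steps"]
    return sorted({lr_at(s, **schedule) for s in range(start, total_steps)})


# Values copied from the spec in this post.
spec = dict(
    init_lr=0.005,
    warmup_lr=0.0005,
    warmup_steps=1000,
    boundaries=[20000, 35000, 45000],
    decay_levels=[0.01, 0.02, 0.01],
)
total_steps = 20000

# Training runs steps 0..19999, so the boundary at 20000 is never reached.
print(distinct_rates(total_steps, **spec))  # → [0.005]
```

Calling `distinct_rates` with boundaries that fall inside the training range (for example a hypothetical `[10000, 15000, 18000]`) returns more than one value, which is what you want to see before launching a long run.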
One more question: why did you train a 576x832 model? I believe your images are mostly 545x800, right? If so, would it be better to train a 544x800 model?
seed: 1234
use_amp: False
warmup_steps: 1000
checkpoint: "/workspace/server/tlt-experiments/maskrcnn/pretrained_resnet50/tlt_instance_segmentation_vresnet50/resnet50.hdf5"
learning_rate_steps: "[20000, 35000, 45000]"
learning_rate_decay_levels: "[0.01, 0.02, 0.01]"
total_steps: 20000
train_batch_size: 4
eval_batch_size: 4
num_steps_per_eval: 1000
momentum: 0.3
l2_weight_decay: 0.0001
warmup_learning_rate: 0.0005
init_learning_rate: 0.005
data_config {
    image_size: "(832, 576)"
    augment_input_data: True
    eval_samples: 50
    training_file_pattern: "/workspace/server/tlt-experiments/IRUV/results/train*.tfrecord*"
    validation_file_pattern: "/workspace/server/tlt-experiments/IRUV/results/val*.tfrecord*"
    val_json_file: "/workspace/server/tlt-experiments/IRUV/raw-data/val/IRUV_val_v1.json"
    # dataset specific parameters
    num_classes: 1
    skip_crowd_during_training: True
}
maskrcnn_config {
    nlayers: 50
    arch: "resnet"
    freeze_bn: True
    freeze_blocks: "[0,1]"
    gt_mask_size: 112
    # Region Proposal Network
    rpn_positive_overlap: 0.7
    rpn_negative_overlap: 0.3
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_min_size: 0.
    # Proposal layer.
    batch_size_per_im: 512
    fg_fraction: 0.25
    fg_thresh: 0.5
    bg_thresh_hi: 0.5
    bg_thresh_lo: 0.
    # Faster-RCNN heads.
    fast_rcnn_mlp_head_dim: 1024
    bbox_reg_weights: "(10., 10., 5., 5.)"
    # Mask-RCNN heads.
    include_mask: True
    mrcnn_resolution: 28
    # training
    train_rpn_pre_nms_topn: 2000
    train_rpn_post_nms_topn: 1000
    train_rpn_nms_threshold: 0.7
    # evaluation
    test_detections_per_image: 100
    test_nms: 0.5
    test_rpn_pre_nms_topn: 1000
    test_rpn_post_nms_topn: 1000
    test_rpn_nms_thresh: 0.7
    # model architecture
    min_level: 2
    max_level: 6
    num_scales: 1
    aspect_ratios: "[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]"
    anchor_scale: 8
    # localization loss
    rpn_box_loss_weight: 1.0
    fast_rcnn_box_loss_weight: 1.0
    mrcnn_weight_loss_mask: 1.0
}