Problem training Mask R-CNN

p.vahidinia · August 1, 2021, 6:48am

Hi.
I tried to train Mask R-CNN on my custom dataset. I set total_steps to 36000, but the training process ends successfully after 1525 iterations. Does Mask R-CNN use early stopping?
One more question: I repeated the training several times, but the mAP is always around 0.05. I am using 1 GPU, and this is my spec file:

seed: 123
use_amp: False
warmup_steps: 25000
checkpoint: "/workspace/tlt/tlt-experiments/all_segmentation_approaches/pretrained_weights/resnet50.hdf5"
learning_rate_steps: "[18000, 27000]"
learning_rate_decay_levels: "[0.1, 0.01]"
total_steps: 36000
train_batch_size: 2
eval_batch_size: 2
num_steps_per_eval: 5
momentum: 0.9
l2_weight_decay: 0.00001
warmup_learning_rate: 0.00001
init_learning_rate: 0.0025

data_config{
    image_size: "(448, 448)"
    augment_input_data: True
    eval_samples: 48
    training_file_pattern: "/workspace/tlt/tlt-experiments/all_segmentation_approaches/weld_dataset_temp/tfrecords/train/*.tfrecord"
    validation_file_pattern: "/workspace/tlt/tlt-experiments/all_segmentation_approaches/weld_dataset_temp/tfrecords/val/*.tfrecord"
    val_json_file: "/workspace/tlt/tlt-experiments/all_segmentation_approaches/weld_dataset_temp/val/val_coco.json"

    # dataset specific parameters
    num_classes: 4
    skip_crowd_during_training: True
}

maskrcnn_config {
    nlayers: 50
    arch: "resnet"
    freeze_bn: False
    #freeze_blocks: "[0,1]"
    gt_mask_size: 112
        
    # Region Proposal Network
    rpn_positive_overlap: 0.7
    rpn_negative_overlap: 0.3
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_min_size: 0.

    # Proposal layer.
    batch_size_per_im: 512
    fg_fraction: 0.25
    fg_thresh: 0.5
    bg_thresh_hi: 0.5
    bg_thresh_lo: 0.

    # Faster-RCNN heads.
    fast_rcnn_mlp_head_dim: 1024
    bbox_reg_weights: "(10., 10., 5., 5.)"

    # Mask-RCNN heads.
    include_mask: True
    mrcnn_resolution: 28

    # training
    train_rpn_pre_nms_topn: 2000
    train_rpn_post_nms_topn: 1000
    train_rpn_nms_threshold: 0.7

    # evaluation
    test_detections_per_image: 100
    test_nms: 0.5
    test_rpn_pre_nms_topn: 1000
    test_rpn_post_nms_topn: 1000
    test_rpn_nms_thresh: 0.7

    # model architecture
    min_level: 2
    max_level: 6
    num_scales: 1
    aspect_ratios: "[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]"
    anchor_scale: 8

    # localization loss
    rpn_box_loss_weight: 1.0
    fast_rcnn_box_loss_weight: 1.0
    mrcnn_weight_loss_mask: 1.0
}
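The anchor settings in this spec (min_level, max_level, num_scales, aspect_ratios, anchor_scale) determine the anchor box sizes the RPN starts from. The sketch below enumerates them, assuming the convention used in the public TensorFlow Mask R-CNN reference implementation: stride = 2**level, base size = anchor_scale × stride, and each aspect pair scaling width and height separately. TLT's internals may differ, and the helper name `anchor_sizes` is hypothetical.

```python
# Sketch: list the anchor (width, height) sizes implied by the spec's anchor settings.
# ASSUMPTION: the anchor convention of the public TensorFlow Mask R-CNN reference
# implementation; this is not TLT source code.

IMAGE_SIZE = (448, 448)                                 # data_config.image_size
MIN_LEVEL, MAX_LEVEL = 2, 6                             # min_level, max_level
NUM_SCALES = 1                                          # num_scales
ASPECT_RATIOS = [(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]   # aspect_ratios
ANCHOR_SCALE = 8                                        # anchor_scale


def anchor_sizes(image_size, min_level, max_level, num_scales, aspect_ratios, anchor_scale):
    """Return {level: [(width, height), ...]} for every feature-pyramid level."""
    sizes = {}
    for level in range(min_level, max_level + 1):
        stride = 2 ** level
        # The reference implementation requires the image to divide evenly by each stride.
        if image_size[0] % stride or image_size[1] % stride:
            raise ValueError(f"image size {image_size} is not divisible by stride {stride}")
        level_sizes = []
        for octave in range(num_scales):
            base = anchor_scale * stride * 2 ** (octave / num_scales)
            for ax, ay in aspect_ratios:
                # Each aspect pair scales width and height independently.
                level_sizes.append((base * ax, base * ay))
        sizes[level] = level_sizes
    return sizes


if __name__ == "__main__":
    all_sizes = anchor_sizes(IMAGE_SIZE, MIN_LEVEL, MAX_LEVEL, NUM_SCALES, ASPECT_RATIOS, ANCHOR_SCALE)
    for level, boxes in all_sizes.items():
        print(level, [(round(w, 1), round(h, 1)) for w, h in boxes])
```

Under these assumptions the spec yields square anchors from 32 px (level 2) up to 512 px (level 6), and 448 divides evenly by the largest stride, 64. Comparing these sizes with the object sizes in the training images is one possible sanity check.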

And this is my training log:
mask_rcnn_log.docx (515.8 KB)

Morganh · August 1, 2021, 3:49pm

Could you try a smaller warmup_steps? For example:
warmup_steps: 1000
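To illustrate why warmup_steps matters with this spec, the sketch below models a linear warmup followed by step decay using the spec's values. It is loosely modeled on the common Mask R-CNN recipe and assumes that the decay levels multiply init_learning_rate and that a reached decay boundary overrides the warmup value; it is not TLT's source code, and the function name `learning_rate` is hypothetical.

```python
# Sketch: linear warmup followed by step decay, using the values from the spec.
# ASSUMPTIONS (not TLT source code): decay levels multiply init_learning_rate,
# and once a decay boundary is reached it overrides the warmup value.


def learning_rate(step, init_lr, warmup_lr, warmup_steps, decay_steps, decay_levels):
    """Return the learning rate at a given global step under the assumed schedule."""
    if step < warmup_steps:
        # Linear ramp from warmup_lr (step 0) toward init_lr (step == warmup_steps).
        lr = warmup_lr + (init_lr - warmup_lr) * step / warmup_steps
    else:
        lr = init_lr
    # Each decay boundary that has been reached replaces the current value.
    for boundary, level in zip(decay_steps, decay_levels):
        if step >= boundary:
            lr = init_lr * level
    return lr


SPEC = dict(
    init_lr=0.0025,              # init_learning_rate
    warmup_lr=0.00001,           # warmup_learning_rate
    decay_steps=(18000, 27000),  # learning_rate_steps
    decay_levels=(0.1, 0.01),    # learning_rate_decay_levels
)

if __name__ == "__main__":
    for warmup in (25000, 1000):
        for step in (1525, 17999, 18000):
            lr = learning_rate(step, warmup_steps=warmup, **SPEC)
            print(f"warmup_steps={warmup} step={step} lr={lr:.6f}")
```

Under these assumptions, with warmup_steps: 25000 the learning rate at step 1525 is still only about 0.00016, far below init_learning_rate, and the first decay at step 18000 arrives before warmup has finished. With warmup_steps: 1000 the full 0.0025 is reached at step 1000.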

p.vahidinia · August 2, 2021, 4:29am

Hi Morganh,
Thanks for the help.
I tried it, but the result is the same.

Morganh · August 2, 2021, 7:12am

To narrow down the issue, can you run the faster_rcnn network on your custom dataset to check whether it gets the expected mAP?

p.vahidinia · August 4, 2021, 4:23am

Hi.
I trained Faster R-CNN, and the mAP was 0.21.

Morganh · August 4, 2021, 10:42am

The mAP seems a bit low. You can run inference to double-check.
How many images did you train on?

For Mask R-CNN, I suggest running with more total_steps.

p.vahidinia · August 7, 2021, 6:14am

Hi
1- The inference results are not good; for most images, no object is detected.
Also, I noticed that Faster R-CNN does not draw the label name and confidence score on the inference images, but other models do.
2- I trained on 194 images.
3- I set total_steps to 36000, but the training process ends successfully after 1525 iterations.
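For scale, it can help to convert iterations into passes over the 194-image training set. The conversion below assumes that one iteration processes one batch of train_batch_size images per GPU; the helper names are hypothetical.

```python
import math

# ASSUMPTION: one iteration = one batch of `batch_size` images on each GPU.


def steps_per_epoch(num_images, batch_size, num_gpus=1):
    """Iterations needed to see every training image once (last batch may be partial)."""
    return math.ceil(num_images / (batch_size * num_gpus))


def epochs_for_steps(steps, num_images, batch_size, num_gpus=1):
    """Approximate number of passes over the dataset covered by `steps` iterations."""
    return steps * batch_size * num_gpus / num_images


if __name__ == "__main__":
    print(steps_per_epoch(194, 2))                    # train_batch_size: 2, 1 GPU
    print(round(epochs_for_steps(1525, 194, 2), 2))   # where training stopped
    print(round(epochs_for_steps(36000, 194, 2), 2))  # total_steps: 36000
```

Under that assumption, one epoch is 97 iterations, so 1525 iterations is about 15.7 epochs, while 36000 iterations would be about 371 epochs.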

Morganh · August 7, 2021, 2:12pm

They are available in the labels_dump_dir. Each file contains the bboxes, confidence scores, and class names.
I suggest adding more images for training faster_rcnn or Mask R-CNN.
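If the files in labels_dump_dir use the KITTI label layout (an assumption: class name in the first column, box corners x1 y1 x2 y2 in columns 5 to 8, and the confidence score in the last of 16 columns), they can be read back and filtered by confidence with a short script like the one below. The helper names and the `*.txt` file pattern are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Tuple


@dataclass
class Detection:
    class_name: str
    bbox: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    confidence: float


def parse_label_line(line: str) -> Detection:
    """Parse one line, ASSUMING a 16-column KITTI-style layout with the score last."""
    fields = line.split()
    if len(fields) != 16:
        raise ValueError(f"expected 16 columns, got {len(fields)}: {line!r}")
    x1, y1, x2, y2 = (float(v) for v in fields[4:8])
    return Detection(fields[0], (x1, y1, x2, y2), float(fields[15]))


def load_detections(labels_dump_dir: str, min_confidence: float = 0.0) -> Dict[str, List[Detection]]:
    """Map each label file's stem (the image name) to its detections above min_confidence."""
    results: Dict[str, List[Detection]] = {}
    for path in sorted(Path(labels_dump_dir).glob("*.txt")):
        detections = [parse_label_line(line) for line in path.read_text().splitlines() if line.strip()]
        results[path.stem] = [d for d in detections if d.confidence >= min_confidence]
    return results
```

For example, `load_detections("/path/to/labels_dump_dir", min_confidence=0.5)` (a placeholder path) shows how many images have at least one detection above 0.5, which gives a rough count to compare with the "no object is detected" observation.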

p.vahidinia · August 8, 2021, 12:15pm

Thank you

system · October 7, 2021, 12:16pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.