Mask rcnn poor results

p.vahidinia · September 20, 2021, 9:48am

I trained Mask R-CNN on 300 images, but the results are very poor. With models other than TAO, the mAP reaches up to 40%. These are my results after 140 iterations:

DLL 2021-09-19 16:26:47.189391 - Iteration: 140 Validation Iteration: 140  AP : 0.017194924876093864
DLL 2021-09-19 16:26:47.189477 - Iteration: 140 Validation Iteration: 140  AP50 : 0.0821227878332138
DLL 2021-09-19 16:26:47.189502 - Iteration: 140 Validation Iteration: 140  AP75 : 0.0016183353727683425
DLL 2021-09-19 16:26:47.189525 - Iteration: 140 Validation Iteration: 140  APs : 0.0
DLL 2021-09-19 16:26:47.189548 - Iteration: 140 Validation Iteration: 140  APm : 1.05388689917163e-05
DLL 2021-09-19 16:26:47.189570 - Iteration: 140 Validation Iteration: 140  APl : 0.029721815139055252
DLL 2021-09-19 16:26:47.189590 - Iteration: 140 Validation Iteration: 140  ARmax1 : 0.03628117963671684
DLL 2021-09-19 16:26:47.189609 - Iteration: 140 Validation Iteration: 140  ARmax10 : 0.07324262708425522
DLL 2021-09-19 16:26:47.189628 - Iteration: 140 Validation Iteration: 140  ARmax100 : 0.11995464563369751
DLL 2021-09-19 16:26:47.189648 - Iteration: 140 Validation Iteration: 140  ARs : 0.0
DLL 2021-09-19 16:26:47.189666 - Iteration: 140 Validation Iteration: 140  ARm : 0.006603773683309555
DLL 2021-09-19 16:26:47.189685 - Iteration: 140 Validation Iteration: 140  ARl : 0.2165975123643875
DLL 2021-09-19 16:26:47.189706 - Iteration: 140 Validation Iteration: 140  mask_AP : 0.0018066676566377282
DLL 2021-09-19 16:26:47.189725 - Iteration: 140 Validation Iteration: 140  mask_AP50 : 0.009883023798465729
DLL 2021-09-19 16:26:47.189745 - Iteration: 140 Validation Iteration: 140  mask_AP75 : 0.00011187559721292928
DLL 2021-09-19 16:26:47.189764 - Iteration: 140 Validation Iteration: 140  mask_APs : 4.041220563522074e-06
DLL 2021-09-19 16:26:47.189784 - Iteration: 140 Validation Iteration: 140  mask_APm : 0.0
DLL 2021-09-19 16:26:47.189803 - Iteration: 140 Validation Iteration: 140  mask_APl : 0.003247086890041828
DLL 2021-09-19 16:26:47.189822 - Iteration: 140 Validation Iteration: 140  mask_ARmax1 : 0.009977323934435844
DLL 2021-09-19 16:26:47.189842 - Iteration: 140 Validation Iteration: 140  mask_ARmax10 : 0.021995464339852333
DLL 2021-09-19 16:26:47.189861 - Iteration: 140 Validation Iteration: 140  mask_ARmax100 : 0.02448979578912258
DLL 2021-09-19 16:26:47.189880 - Iteration: 140 Validation Iteration: 140  mask_ARs : 0.0010638297535479069
DLL 2021-09-19 16:26:47.189899 - Iteration: 140 Validation Iteration: 140  mask_ARm : 0.0
DLL 2021-09-19 16:26:47.189918 - Iteration: 140 Validation Iteration: 140  mask_ARl : 0.044398341327905655

And this is my spec file:

seed: 123
use_amp: False
warmup_steps: 1000
checkpoint: "/workspace/tlt/tlt-experiments/all_segmentation_approaches/pretrained_weights/resnet_50.hdf5"
learning_rate_steps: "[15000, 25000]"
learning_rate_decay_levels: "[0.1, 0.01]"
total_steps: 120000
train_batch_size: 1
eval_batch_size: 1
num_steps_per_eval: 5
momentum: 0.9
l2_weight_decay: 0.0001
warmup_learning_rate: 0.001
init_learning_rate: 0.0025

data_config{
    image_size: "(448, 448)"
    augment_input_data: False
    eval_samples: 135
    training_file_pattern: "/workspace/tlt/tlt-experiments/all_segmentation_approaches/corrosion_v2_temp_mask/tfrecords/train/*.tfrecord"
    validation_file_pattern: "/workspace/tlt/tlt-experiments/all_segmentation_approaches/corrosion_v2_temp_mask/tfrecords/val/*.tfrecord"
    val_json_file: "/workspace/tlt/tlt-experiments/all_segmentation_approaches/corrosion_v2_temp_mask/val/val_coco.json"

    # dataset specific parameters
    num_classes: 2
    skip_crowd_during_training: True
}

maskrcnn_config {
    nlayers: 50
    arch: "resnet"
    freeze_bn: False
    #freeze_blocks: "[0,1]"
    gt_mask_size: 112
        
    # Region Proposal Network
    rpn_positive_overlap: 0.7
    rpn_negative_overlap: 0.3
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_min_size: 0.

    # Proposal layer.
    batch_size_per_im: 512
    fg_fraction: 0.25
    fg_thresh: 0.5
    bg_thresh_hi: 0.5
    bg_thresh_lo: 0.

    # Faster-RCNN heads.
    fast_rcnn_mlp_head_dim: 1024
    bbox_reg_weights: "(10., 10., 5., 5.)"

    # Mask-RCNN heads.
    include_mask: True
    mrcnn_resolution: 28

    # training
    train_rpn_pre_nms_topn: 2000
    train_rpn_post_nms_topn: 1000
    train_rpn_nms_threshold: 0.7

    # evaluation
    test_detections_per_image: 100
    test_nms: 0.5
    test_rpn_pre_nms_topn: 1000
    test_rpn_post_nms_topn: 1000
    test_rpn_nms_thresh: 0.7

    # model architecture
    min_level: 2
    max_level: 6
    num_scales: 1
    aspect_ratios: "[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]"
    anchor_scale: 8

    # localization loss
    rpn_box_loss_weight: 1.0
    fast_rcnn_box_loss_weight: 1.0
    mrcnn_weight_loss_mask: 1.0
}

Morganh · September 20, 2021, 1:51pm

In your case, iteration 140 is still the very beginning of training. Please let training continue and check the later results.
For further reference, see: Poor metric results after retraining maskrcnn using TLT notebook - #16 by ghazni
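
To put iteration 140 in context, here is a rough Python sketch using the values from the spec file above. It assumes the 300 images mentioned in the first post are all used for training, and it assumes a simple linear warmup from warmup_learning_rate to init_learning_rate; the schedule TAO actually applies may differ. The helper names are hypothetical.

```python
# Values copied from the spec file and the first post.
NUM_TRAIN_IMAGES = 300   # assumption: all 300 images are in the training set
TRAIN_BATCH_SIZE = 1
WARMUP_STEPS = 1000
TOTAL_STEPS = 120000
WARMUP_LR = 0.001
INIT_LR = 0.0025


def epochs_seen(step):
    """Number of passes over the training set completed after `step` iterations."""
    return step * TRAIN_BATCH_SIZE / NUM_TRAIN_IMAGES


def warmup_lr(step):
    """Learning rate under an ASSUMED linear warmup (illustration only)."""
    if step >= WARMUP_STEPS:
        return INIT_LR
    return WARMUP_LR + (INIT_LR - WARMUP_LR) * step / WARMUP_STEPS


step = 140
print(f"epochs seen:          {epochs_seen(step):.2f}")
print(f"fraction of training: {step / TOTAL_STEPS:.4%}")
print(f"assumed learning rate: {warmup_lr(step):.5f}")
```

Under these assumptions, the model has seen less than half of the training set once and is still inside the warmup phase, so near-zero AP at this point is not yet informative.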

p.vahidinia · September 20, 2021, 7:56pm

Thank you.
So are all the hyperparameters correct for a single GPU?

Morganh · September 21, 2021, 3:24am

That reference discusses training on the COCO dataset, so your case is different: you are training on your own data, and with only 300 images. Although that is a small dataset, you can still train with that spec and monitor the loss and AP.
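
One way to monitor AP over time is to collect the validation lines from the training log. The following is a minimal Python sketch that assumes the log lines keep the exact format pasted in the first post; the function name `parse_metrics` and the regular expression are illustrative, not part of TAO.

```python
import re
from collections import defaultdict

# Matches lines such as:
# "DLL <date> <time> - Iteration: 140 Validation Iteration: 140  AP : 0.0171..."
LINE_RE = re.compile(
    r"Iteration:\s*(\d+)\s+Validation Iteration:\s*\d+\s+(\w+)\s*:\s*(\S+)"
)


def parse_metrics(lines):
    """Return {iteration: {metric_name: value}} for every matching log line."""
    metrics = defaultdict(dict)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            iteration, name, value = match.groups()
            metrics[int(iteration)][name] = float(value)
    return dict(metrics)


# Three lines copied from the log in the first post.
sample = [
    "DLL 2021-09-19 16:26:47.189391 - Iteration: 140 Validation Iteration: 140  AP : 0.017194924876093864",
    "DLL 2021-09-19 16:26:47.189548 - Iteration: 140 Validation Iteration: 140  APm : 1.05388689917163e-05",
    "DLL 2021-09-19 16:26:47.189706 - Iteration: 140 Validation Iteration: 140  mask_AP : 0.0018066676566377282",
]

history = parse_metrics(sample)
for iteration in sorted(history):
    row = history[iteration]
    print(iteration, row.get("AP"), row.get("mask_AP"))
```

Feeding the full log file to `parse_metrics` (for example, via `open(path)`) would give one row per evaluation, which makes it easier to see whether AP and mask_AP are actually rising as training proceeds.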
