[Mask RCNN] How to change the resolution of the mask?

We are using TAO’s MaskRCNN for live image segmentation. For our model, we want to increase the default mrcnn_resolution (currently set to [28, 28]) by at least a factor of four, since our postprocessing depends on high-resolution maps.

I was able to increase the mask size to [56, 56] but not any higher. Everything above 56 gives me a shape mismatch error. For example, mrcnn_resolution: 112 throws the following error:

 ValueError: Dimension size must be evenly divisible by 12845056 but is 3211264 for 'mask_postprocess/Reshape' (op: 'Reshape') with input shapes: [1024,4,28,28], [5] and with input tensors computed as partial shapes: input[1] = [?,256,4,112,112].

It seems that the first of those two tensors has the dimensions [batch_size_per_im, batch_size, 28, 28]. The second tensor has the dimensions [n, batch_size, mrcnn_resolution, mrcnn_resolution], where n must be an integer. Is this correct? If so, which other parameters do I need to change to achieve a higher mask resolution? Just increasing batch_size_per_im is not an option, since it is also limited (for me at a value of 1100).
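The numbers in the error message can be checked by hand. The Python sketch below only reproduces the arithmetic of the failing reshape; it does not use TAO or TensorFlow. Reading 256 as batch_size_per_im × fg_fraction (1024 × 0.25) and the leading 1024 as train_batch_size × 256 is an assumption based on the spec values, not something the documentation confirms. The helper name `inferred_leading_dim` is made up for illustration.

```python
from math import prod

# Shape of the tensor fed into 'mask_postprocess/Reshape', copied from the error message.
input_shape = (1024, 4, 28, 28)

def inferred_leading_dim(input_shape, target_tail):
    """Return the value the unknown '?' dimension would take, or None if the reshape cannot work."""
    total, tail = prod(input_shape), prod(target_tail)
    return total // tail if total % tail == 0 else None

for resolution in (28, 56, 112):
    # Known part of the target shape [?, 256, 4, resolution, resolution].
    target_tail = (256, 4, resolution, resolution)
    print(resolution, prod(target_tail), inferred_leading_dim(input_shape, target_tail))
# 28 802816 4
# 56 3211264 1
# 112 12845056 None
```

Under these assumptions, 56 passes the shape check only because 56²/28² happens to equal the batch size of 4, so the unknown dimension collapses to 1 instead of 4. The input tensor stays at 28 × 28 in every case, which suggests the limit is not controlled by the other spec parameters.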

Is there any in-depth documentation or an example of how to change the mask resolution? In the official documentation, no dependencies of mrcnn_resolution are listed (see Creating an Experiment Spec File — Transfer Learning Toolkit 2.0 documentation).

Additional Info:

• Hardware: Jetson Xavier
• Network Type: Mask_rcnn
• TLT Version: We are training the model using the singularity container.
• Training spec file:

seed: 123 # The random seed for the experiment
use_amp: True # Specifies whether to use Automatic Mixed Precision training

warmup_steps: 100 # The steps taken for learning rate to ramp up to the init_learning_rate
warmup_learning_rate: 0.0001 # The initial learning rate during the warmup phase

# Linear Scaling Rule: When the minibatch size is multiplied by k, multiply the learning
# rate by k. All other hyper-parameters (weight decay, etc.) are kept unchanged.
total_steps: 10000 # The total number of training iterations
init_learning_rate: 0.01
learning_rate_steps: "[1000, 5000, 8000]" # A list of steps at which the learning rate decays by the factor specified in learning_rate_decay_levels
learning_rate_decay_levels: "[0.1, 0.01, 0.001]" # A list of decay factors, one for each step in learning_rate_steps

checkpoint: "./resnet50.hdf5" # The path to a pretrained model

train_batch_size: 4 # The batch size during training
eval_batch_size: 4 # The batch size during validation or evaluation

num_steps_per_eval: 1000 # Save a checkpoint and run evaluation every N steps.

momentum: 0.9 # Momentum of the SGD optimizer
l2_weight_decay: 0.0002 # L2 weight decay
l1_weight_decay: 0.0002 # L1 weight decay


# The input data configuration
data_config {
        image_size: "(768, 768)" # The image dimension as a tuple within quote marks. “(height, width)” indicates the dimension of the resized and padded input.
        augment_input_data: True #Specifies whether to augment the data

        eval_samples: 500 # The number of samples for evaluation

        training_file_pattern: "./data_augmented/train*.tfrecord" # The TFRecord path for training
        validation_file_pattern: "./data_augmented/val*.tfrecord" # The TFRecord path for validation
        val_json_file: "./data_augmented/mixed_test.json" # The annotation file path for validation

        # dataset specific parameters
        num_classes: 4 # The number of classes. If there are N categories in the annotation, num_classes should be N+1 (background class)
        n_workers: 12 # The number of workers to parse and preprocess data (default: 16)
        skip_crowd_during_training: True # Specifies whether to skip crowd during training
        max_num_instances: 100 # The maximum number of object instances to parse (default: 200)

        # prefetch_buffer_size: 4096 # The prefetch buffer size used by tf.data.Dataset (default: AUTOTUNE)
        shuffle_buffer_size: 4096 # The shuffle buffer size used by tf.data.Dataset (default: 4096)
}

# The architecture of the model
maskrcnn_config {

        nlayers: 50 # The number of layers in ResNet arch
        arch: "resnet" # The backbone feature extractor name
        freeze_bn: True # Whether to freeze some BatchNorm layers in the backbone (defined with freeze_blocks)
        freeze_blocks: "[0,1]" # A list of conv blocks in the backbone to freeze, e.g. the first two blocks
        gt_mask_size: 224 # The groundtruth mask size

        # Region Proposal Network
        rpn_positive_overlap: 0.6 # The lower-bound threshold to assign positive labels for anchors
        rpn_negative_overlap: 0.3 # The upper-bound threshold to assign negative labels for anchors
        rpn_batch_size_per_im: 512 # The number of sampled anchors per image in RPN
        rpn_fg_fraction: 0.5 # The desired fraction of positive anchors in a batch
        rpn_min_size: 0.5 # The minimum proposal height and width

        # Proposal layer.
        batch_size_per_im: 1024 # The RoI minibatch size per image
        fg_fraction: 0.25 # The target fraction of RoI minibatch that is labeled as foreground
        fg_thresh: 0.5
        bg_thresh_hi: 0.5
        bg_thresh_lo: 0.5

        # Faster-RCNN heads.
        fast_rcnn_mlp_head_dim: 1024 # The Fast-RCNN classification head dimension
        bbox_reg_weights: "(10., 10., 5., 5.)" # The bounding-box regularization weights

        # Mask-RCNN heads.
        include_mask: True # Specifies whether to include a mask head
        mrcnn_resolution: 112 # The mask-head resolution

        # training
        train_rpn_pre_nms_topn: 2000 # The number of top-scoring RPN proposals to keep before applying NMS (per FPN level) during training
        train_rpn_post_nms_topn: 1000 # The number of top-scoring RPN proposals to keep after applying NMS (total number produced) during training
        train_rpn_nms_threshold: 0.7 # The NMS IOU threshold in RPN during training

        # evaluation
        test_detections_per_image: 100 # The number of bounding box candidates after NMS
        test_nms: 0.5 # The NMS IOU threshold during test
        test_rpn_pre_nms_topn: 1000 # The number of top-scoring RPN proposals to keep before applying NMS (per FPN level) during test
        test_rpn_post_nms_topn: 1000 # The number of top scoring RPN proposals to keep after applying NMS (total number produced) during test
        test_rpn_nms_thresh: 0.7 # The NMS IOU threshold in RPN during test

        # model architecture
        min_level: 2 # The minimum level of the output feature pyramid
        max_level: 6 # The maximum level of the output feature pyramid
        num_scales: 8 # The number of anchor octave scales on each pyramid level (e.g. if set to 3, the anchor scales are [2^0, 2^(1/3), 2^(2/3)])
        aspect_ratios: "[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]" # A list of tuples representing the aspect ratios of anchors on each pyramid level: aspect ratios of 1:1, 1:2 and 2:1
        anchor_scale: 8 # Scale of the base-anchor size to the feature-pyramid stride

        # localization loss
        rpn_box_loss_weight: 1.0 # The weight for adjusting RPN box loss in the total loss
        fast_rcnn_box_loss_weight: 1.0 # The weight for adjusting FastRCNN box regression loss in the total loss
        mrcnn_weight_loss_mask: 1.0 # The weight for adjusting mask loss in the total loss
}

The mask_rcnn model has the following two outputs:

  • generate_detections : A [batchSize, keepTopK, C*6] tensor containing the bounding box, class id, score
  • mask_head/mask_fcn_logits/BiasAdd : A [batchSize, keepTopK, C+1, 28*28] tensor containing the masks
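As a rough illustration of the mask output layout, the last axis holds 28*28 = 784 values per class channel. The sketch below splits one such flat vector back into a 28 × 28 grid using plain Python. Row-major ordering is an assumption (the thread does not state it), and `unflatten_mask` is a hypothetical helper, not part of TAO or TensorRT.

```python
MASK_SIZE = 28  # side length taken from the output description (28*28)

def unflatten_mask(flat_mask, size=MASK_SIZE):
    """Split a flat sequence of size*size values into `size` rows (row-major order assumed)."""
    if len(flat_mask) != size * size:
        raise ValueError(f"expected {size * size} values, got {len(flat_mask)}")
    return [flat_mask[r * size:(r + 1) * size] for r in range(size)]

# Dummy data standing in for one class channel of one detection.
flat = list(range(MASK_SIZE * MASK_SIZE))
mask = unflatten_mask(flat)
print(len(mask), len(mask[0]))  # 28 28
print(mask[1][0])               # 28
```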

You can also run polygraphy to inspect the mask_rcnn TensorRT engine to confirm.

$ python -m pip install colored
$ python -m pip install polygraphy --index-url https://pypi.ngc.nvidia.com
$ polygraphy inspect model maskrcnn.engine

Thanks for your clarification. But for me, it’s still unclear how to change the mask resolution using the parameters as defined in the training spec.txt. Is this even possible or is [28, 28] a fixed value?

It should be configurable. Did you ever try the default jupyter notebook?

Yes, I did. It works with a value of 28 or 56 but not with any higher value. Everything above that gives me the following error, e.g. when setting mrcnn_resolution to 112:

 ValueError: Dimension size must be evenly divisible by 6422528 but is 1605632 for 'mask_postprocess/Reshape' (op: 'Reshape') with input shapes: [512,4,28,28], [5] and with input tensors computed as partial shapes: input[1] = [?,128,4,112,112].

After further checking, we unfortunately found that only an mrcnn_resolution of 28 is supported.
We will change that in the next release. Thanks for catching this issue.

Does this mean you are planning to support other resolutions as well? Or will you just update the documentation?

Yes, we plan to support other mrcnn_resolution values in the next release.

BTW, please note that the input size can be configured. See the requirement in MaskRCNN — TAO Toolkit 3.21.11 documentation
