We are using TAO’s MaskRCNN for live image segmentation. For our model, we want to increase the default mrcnn_resolution (currently set to [28, 28]) by at least a factor of four, since our postprocessing depends on high-resolution maps.
I was able to increase the mask size to [56, 56] but not any higher. Everything above 56 gives me a geometry mismatch error. For example, mrcnn_resolution: 112 throws the following error:
ValueError: Dimension size must be evenly divisible by 12845056 but is 3211264 for 'mask_postprocess/Reshape' (op: 'Reshape') with input shapes: [1024,4,28,28], [5] and with input tensors computed as partial shapes: input[1] = [?,256,4,112,112].
It seems that the first of those two tensors has the dimensions [batch_size_per_im, batch_size, 28, 28]. The second tensor has the dimensions [n, batch_size, mrcnn_resolution, mrcnn_resolution], where n must be an integer. Is this correct? If so, which other parameters do I need to change to achieve a higher mask resolution? Just increasing batch_size_per_im is not an option, since it is also limited (for me, to a value of 1100).
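To sanity-check the numbers in the error message, here is a small sketch that only does the arithmetic. It assumes that the tensor fed into mask_postprocess/Reshape keeps the shape [1024, 4, 28, 28] regardless of mrcnn_resolution, which I have not verified in the TAO source. The helper leading_dim is a hypothetical name used for illustration only.

```python
from math import prod

# Shapes copied from the error message above.
input_shape = (1024, 4, 28, 28)      # tensor fed into mask_postprocess/Reshape
target_tail = (256, 4, 112, 112)     # known part of the requested shape [?, 256, 4, 112, 112]

input_elems = prod(input_shape)      # number of elements available
tail_elems = prod(target_tail)       # number of elements needed per unit of the unknown dimension
print(input_elems, tail_elems)       # 3211264 12845056

def leading_dim(resolution):
    """Return the unknown leading dimension n for a mask resolution,
    or None if the reshape cannot work.

    Assumption: the input tensor stays at [1024, 4, 28, 28].
    """
    tail = 256 * 4 * resolution * resolution
    n, remainder = divmod(input_elems, tail)
    return n if remainder == 0 and n > 0 else None

for res in (28, 56, 112):
    print(res, leading_dim(res))
```

Under that assumption, a resolution of 28 gives n = 4, a resolution of 56 gives n = 1, and 112 has no valid n. This would mean the reshape at 56 only succeeds because 56 * 56 happens to equal 4 * 28 * 28, not because the mask head actually produces 56 x 56 masks.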
Is there any in-depth documentation or an example of how to change the mask resolution? The official documentation lists no dependencies of mrcnn_resolution (see "Creating an Experiment Spec File" in the Transfer Learning Toolkit 2.0 documentation).
Additional Info:
• Hardware: Jetson Xavier
• Network Type: Mask_rcnn
• TLT Version: We are training the model using the singularity container.
• Training spec file:
seed: 123 # The random seed for the experiment
use_amp: True # Specifies whether to use Automatic Mixed Precision training
warmup_steps: 100 # The steps taken for learning rate to ramp up to the init_learning_rate
warmup_learning_rate: 0.0001 # The initial learning rate during the warmup phase
# Linear Scaling Rule: When the minibatch size is multiplied by k, multiply the learning
# rate by k. All other hyper-parameters (weight decay, etc.) are kept unchanged.
total_steps: 10000 # The total number of training iterations
init_learning_rate: 0.01
learning_rate_steps: "[1000, 5000, 8000]" # A list of steps at which the learning rate decays by the factor specified in learning_rate_decay_levels
learning_rate_decay_levels: "[0.1, 0.01, 0.001]" # A list of decay factors, one for each step in learning_rate_steps
checkpoint: "./resnet50.hdf5" # The path to a pretrained model
train_batch_size: 4 # The batch size during training
eval_batch_size: 4 # The batch size during validation or evaluation
num_steps_per_eval: 1000 # Save a checkpoint and run evaluation every N steps.
momentum: 0.9 # Momentum of the SGD optimizer
l2_weight_decay: 0.0002 # L2 weight decay
l1_weight_decay: 0.0002 # L1 weight decay
# The input data configuration
data_config {
image_size: "(768, 768)" # The image dimension as a tuple within quote marks. “(height, width)” indicates the dimension of the resized and padded input.
augment_input_data: True #Specifies whether to augment the data
eval_samples: 500 # The number of samples for evaluation
training_file_pattern: "./data_augmented/train*.tfrecord" # The TFRecord path for training
validation_file_pattern: "./data_augmented/val*.tfrecord" # The TFRecord path for validation
val_json_file: "./data_augmented/mixed_test.json" # The annotation file path for validation
# dataset specific parameters
num_classes: 4 # The number of classes. If there are N categories in the annotation, num_classes should be N+1 (background class)
n_workers: 12 # The number of workers to parse and preprocess data (default: 16)
skip_crowd_during_training: True # Specifies whether to skip crowd during training
max_num_instances: 100 # The maximum number of object instances to parse (default: 200)
# prefetch_buffer_size: 4096 # The prefetch buffer size used by tf.data.Dataset (default: AUTOTUNE)
shuffle_buffer_size: 4096 # The shuffle buffer size used by tf.data.Dataset (default: 4096)
}
# The architecture of the model
maskrcnn_config {
nlayers: 50 # The number of layers in ResNet arch
arch: "resnet" # The backbone feature extractor name
freeze_bn: True # Whether to freeze some BatchNorm layers in the backbone (defined with freeze_blocks)
freeze_blocks: "[0,1]" # A list of conv blocks in the backbone to freeze, e.g. the first two blocks
gt_mask_size: 224 # The groundtruth mask size
# Region Proposal Network
rpn_positive_overlap: 0.6 # The lower-bound threshold to assign positive labels for anchors
rpn_negative_overlap: 0.3 # The upper-bound threshold to assign negative labels for anchors
rpn_batch_size_per_im: 512 # The number of sampled anchors per image in RPN
rpn_fg_fraction: 0.5 # The desired fraction of positive anchors in a batch
rpn_min_size: 0.5 # The minimum proposal height and width
# Proposal layer.
batch_size_per_im: 1024 # The RoI minibatch size per image
fg_fraction: 0.25 # The target fraction of RoI minibatch that is labeled as foreground
fg_thresh: 0.5
bg_thresh_hi: 0.5
bg_thresh_lo: 0.5
# Faster-RCNN heads.
fast_rcnn_mlp_head_dim: 1024 # The Fast-RCNN classification head dimension
bbox_reg_weights: "(10., 10., 5., 5.)" # The bounding-box regularization weights
# Mask-RCNN heads.
include_mask: True # Specifies whether to include a mask head
mrcnn_resolution: 112 # The mask-head resolution
# training
train_rpn_pre_nms_topn: 2000 # The number of top-scoring RPN proposals to keep before applying NMS (per FPN level) during training
train_rpn_post_nms_topn: 1000 # The number of top-scoring RPN proposals to keep after applying NMS (total number produced) during training
train_rpn_nms_threshold: 0.7 # The NMS IOU threshold in RPN during training
# evaluation
test_detections_per_image: 100 # The number of bounding box candidates after NMS
test_nms: 0.5 # The NMS IOU threshold during test
test_rpn_pre_nms_topn: 1000 # The number of top-scoring RPN proposals to keep before applying NMS (per FPN level) during test
test_rpn_post_nms_topn: 1000 # The number of top scoring RPN proposals to keep after applying NMS (total number produced) during test
test_rpn_nms_thresh: 0.7 # The NMS IOU threshold in RPN during test
# model architecture
min_level: 2 # The minimum level of the output feature pyramid
max_level: 6 # The maximum level of the output feature pyramid
num_scales: 8 # The number of anchor octave scales on each pyramid level (e.g. if set to 3, the anchor scales are [2^0, 2^(1/3), 2^(2/3)])
aspect_ratios: "[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]" # A list of tuples representing the aspect ratios of anchors on each pyramid level: aspect ratios of 1:1, 1:2 and 2:1
anchor_scale: 8 # Scale of the base-anchor size to the feature-pyramid stride
# localization loss
rpn_box_loss_weight: 1.0 # The weight for adjusting RPN box loss in the total loss
fast_rcnn_box_loss_weight: 1.0 # The weight for adjusting FastRCNN box regression loss in the total loss
mrcnn_weight_loss_mask: 1.0 # The weight for adjusting mask loss in the total loss
}
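For reference, the dimensions in the error message can also be reproduced from the spec values above. This is only one possible reading and not something the documentation confirms: it assumes that the mask head sees only the foreground share of the sampled RoIs and that the per-image RoIs are flattened across the training batch.

```python
# Values copied from the spec file above.
train_batch_size = 4
num_classes = 4
batch_size_per_im = 1024
fg_fraction = 0.25

# Assumption (not confirmed by the documentation): the mask head only
# processes the foreground fraction of the RoI minibatch.
mask_rois_per_image = int(batch_size_per_im * fg_fraction)

# Assumption: the RoIs of all images in the batch are flattened into one axis.
flattened_rois = train_batch_size * mask_rois_per_image

print(mask_rois_per_image)  # 256
print(flattened_rois)       # 1024
```

In this reading, the 256 in [?, 256, 4, 112, 112] would be the number of mask RoIs per image and the 4 would be num_classes. The 1024 in [1024, 4, 28, 28] is ambiguous, because with these settings both batch_size_per_im and train_batch_size * 256 equal 1024.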