Clarification needed for MaskRCNN Config file (image_size, eval_samples, gt_mask_size)


I have questions regarding some config fields for MaskRCNN.

In data_config, it’s not clear to me whether I need to resize the input images to match image_size myself or whether TLT does the resizing. The blog post Training Instance Segmentation Models Using Mask R-CNN on the NVIDIA Transfer Learning Toolkit says:

Input images are resized and padded to image_size while keeping the aspect ratio.

To me, this indicates that TLT will resize the input images to match image_size. I also checked, and no resizing takes place before the tf_record conversion, so the input images from COCO are not resized in advance.

I just want to double check that TLT will do the resizing (for image, bbox & mask annotation) as part of the training pipeline.

eval_samples = number of samples for evaluation -> is this the number of images from the training set to use for evaluation, or the size of the validation set? As an aside, are the losses printed during training computed from the training set or the validation set?
gt_mask_size = ground truth mask size. Could you please explain how I should set this value?

  1. TLT will do the resizing.
  2. The images are from the validation dataset. The loss is computed for the validation dataset.
  3. I suggest keeping this value unchanged. The ground-truth masks are cropped by the bounding box and resized to a fixed size determined by this parameter, ‘gt_mask_size’.
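
To illustrate point 3, here is a minimal sketch of cropping a mask by its bounding box and resizing the crop to a fixed size. It is not TLT's actual code: the function name `crop_and_resize_mask`, the `(x0, y0, x1, y1)` box layout, and the nearest-neighbor sampling are all assumptions made for illustration, and the real pipeline may interpolate differently.

```python
def crop_and_resize_mask(mask, box, gt_mask_size):
    """Crop a binary mask to a box and resize the crop to a fixed size.

    mask: 2D list of 0/1 values covering the full image.
    box: (x0, y0, x1, y1) in integer pixels, end-exclusive (assumed layout).
    gt_mask_size: (out_height, out_width) of the fixed-size result.
    Uses nearest-neighbor sampling for simplicity (an assumption).
    """
    x0, y0, x1, y1 = box
    out_h, out_w = gt_mask_size
    box_h, box_w = y1 - y0, x1 - x0
    resized = []
    for i in range(out_h):
        # Map each output row back to a source row inside the box.
        src_y = y0 + min(box_h - 1, i * box_h // out_h)
        row = []
        for j in range(out_w):
            # Map each output column back to a source column inside the box.
            src_x = x0 + min(box_w - 1, j * box_w // out_w)
            row.append(mask[src_y][src_x])
        resized.append(row)
    return resized


# Hypothetical 4x4 mask with a single foreground pixel at row 1, column 1.
example_mask = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# The 2x2 box around that pixel is stretched to a 4x4 fixed-size mask.
example_result = crop_and_resize_mask(example_mask, (1, 1, 3, 3), (4, 4))
```

Whatever the object's size in the image, the result always has the same shape, which is why one fixed value can be used for every instance.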

Hi Morganh,

Thank you for the reply! Just to double check:

When TLT resizes the input image, it also resizes/adjusts the annotations (bounding boxes & polygon masks) accordingly, right?

Yes, they are included in preprocessing.
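
For anyone who wants to see the arithmetic, here is a rough sketch of an aspect-preserving resize with padding, and of how the same scale factor carries over to the annotations. This is not TLT's implementation: the helper names, the `[x, y, w, h]` box layout, and the assumption that padding goes on the bottom and right (so coordinates only need scaling, not shifting) are illustrative only.

```python
def resize_and_pad_params(height, width, target_height, target_width):
    """Return (scale, new_height, new_width, pad_bottom, pad_right).

    The image is scaled by one factor so that it fits inside the target
    size, then padded up to exactly the target size.
    """
    scale = min(target_height / height, target_width / width)
    new_height = int(round(height * scale))
    new_width = int(round(width * scale))
    pad_bottom = target_height - new_height
    pad_right = target_width - new_width
    return scale, new_height, new_width, pad_bottom, pad_right


def scale_box(box, scale):
    """Scale an [x, y, w, h] box by the image's scale factor."""
    return [value * scale for value in box]


def scale_polygon(polygon, scale):
    """Scale a flat [x1, y1, x2, y2, ...] polygon by the same factor."""
    return [value * scale for value in polygon]


# Hypothetical 100x200 (height x width) image resized into a 400x400 target.
scale, new_h, new_w, pad_bottom, pad_right = resize_and_pad_params(100, 200, 400, 400)
scaled_box = scale_box([10, 20, 30, 40], scale)
scaled_polygon = scale_polygon([10, 20, 40, 20, 40, 60], scale)
```

Because the width is the limiting side in this example, the image is scaled by 2 to 200x400 and the remaining 200 rows are padding; the box and polygon coordinates are multiplied by the same factor.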


Thank you for the clarification!