Basic questions about transfer learning with TAO Toolkit

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) GeForce RTX3050
• Network Type Mask_rcnn
• TLT Version

dockers: 
        nvidia/tao/tao-toolkit: 
                4.0.0-tf2.9.1: 
                        docker_registry: nvcr.io
                        tasks: 
                                1. classification_tf2
                                2. efficientdet_tf2
                4.0.0-tf1.15.5: 
                        docker_registry: nvcr.io
                        tasks: 
                                1. augment
                                2. bpnet
                                3. classification_tf1
                                4. detectnet_v2
                                5. dssd
                                6. emotionnet
                                7. efficientdet_tf1
                                8. faster_rcnn
                                9. fpenet
                                10. gazenet
                                11. gesturenet
                                12. heartratenet
                                13. lprnet
                                14. mask_rcnn
                                15. multitask_classification
                                16. retinanet
                                17. ssd
                                18. unet
                                19. yolo_v3
                                20. yolo_v4
                                21. yolo_v4_tiny
                                22. converter
                4.0.0-pyt: 
                        docker_registry: nvcr.io
                        tasks: 
                                1. action_recognition
                                2. deformable_detr
                                3. segformer
                                4. re_identification
                                5. pointpillars
                                6. pose_classification
                                7. n_gram
                                8. speech_to_text
                                9. speech_to_text_citrinet
                                10. speech_to_text_conformer
                                11. spectro_gen
                                12. vocoder
                                13. text_classification
                                14. question_answering
                                15. token_classification
                                16. intent_slot_classification
                                17. punctuation_and_capitalization
format_version: 2.0
toolkit_version: 4.0.0
published_date: 12/08/2022

• Training spec file

# https://docs.nvidia.com/tao/tao-toolkit/text/instance_segmentation/mask_rcnn.html#id6
seed: 123
use_amp: False
warmup_steps: 0
checkpoint: "/workspace/models/peoplesegnet_vtrainable_v2.1/peoplesegnet_resnet50.step-20000.tlt"
# https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/pretrained_instance_segmentation
learning_rate_steps: "[60000, 80000, 100000]"
learning_rate_decay_levels: "[0.1, 0.02, 0.002]"
total_steps: 5000
train_batch_size: 2
eval_batch_size: 4
num_steps_per_eval: 1000
momentum: 0.9
l2_weight_decay: 0.0001
l1_weight_decay: 0.0
warmup_learning_rate: 0.0001
init_learning_rate: 0.02
num_examples_per_epoch: 10

data_config {
        image_size: "(576, 960)"
        augment_input_data: True
        eval_samples: 100
        training_file_pattern: "/workspace/data/tfrecords/annotations*.tfrecord"
        validation_file_pattern: "/workspace/data/tfrecords/annotations*.tfrecord"
        val_json_file: "/workspace/data/annotations.json"

        # dataset specific parameters
        num_classes: 2
        skip_crowd_during_training: True
        max_num_instances: 200
}
# Copied from PeopleSegNet Catalog
# https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/peoplesegnet
maskrcnn_config {
  nlayers: 50
  arch: "resnet"
  gt_mask_size: 112
  freeze_blocks: "[0]"
  freeze_bn: True
  # Region Proposal Network
  rpn_positive_overlap: 0.7
  rpn_negative_overlap: 0.3
  rpn_batch_size_per_im: 256
  rpn_fg_fraction: 0.5
  rpn_min_size: 0.

  # Proposal layer.
  batch_size_per_im: 512
  fg_fraction: 0.25
  fg_thresh: 0.5
  bg_thresh_hi: 0.5
  bg_thresh_lo: 0.

  # Faster-RCNN heads.
  fast_rcnn_mlp_head_dim: 1024
  bbox_reg_weights: "(10., 10., 5., 5.)"

  # Mask-RCNN heads.
  include_mask: True
  mrcnn_resolution: 28

  # training
  train_rpn_pre_nms_topn: 2000
  train_rpn_post_nms_topn: 1000
  train_rpn_nms_threshold: 0.7

  # evaluation
  test_detections_per_image: 100
  test_nms: 0.5
  test_rpn_pre_nms_topn: 1000
  test_rpn_post_nms_topn: 1000
  test_rpn_nms_thresh: 0.7

  # model architecture
  min_level: 2
  max_level: 6
  num_scales: 1
  aspect_ratios: "[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]"
  anchor_scale: 8

  # localization loss
  rpn_box_loss_weight: 1.0
  fast_rcnn_box_loss_weight: 1.0
  mrcnn_weight_loss_mask: 1.0
}

Hello.

I’m building my own custom model with the TAO Toolkit.
I’d like to create my model based on PeopleSegNet, which is a Mask R-CNN model.

Although I was able to run my first training, I have some questions about the TAO Toolkit.

Here’s what I’ve done so far.

  1. Masked my dataset images with labelme. Currently I only have 10 images.
  2. Converted the annotation files to TFRecords.
  3. Created the model spec file.
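As an aside on step 2, a common practice is to split the image list into training and validation subsets before generating the TFRecords, which also bears on question 1 below. A minimal sketch (the `dataset` directory name and the 80/20 ratio are assumptions for illustration, not from this post):

```python
import random
from pathlib import Path

# Hypothetical layout: one labelme JSON annotation per image in ./dataset
annotations = sorted(Path("dataset").glob("*.json"))

random.seed(123)  # fixed seed (matching the spec's seed) for a reproducible split
random.shuffle(annotations)

# 80/20 train/validation split (the ratio is an assumption)
split = int(len(annotations) * 0.8)
train_set, val_set = annotations[:split], annotations[split:]

print(f"train: {len(train_set)} files, val: {len(val_set)} files")
```

Each subset would then be converted to its own TFRecords and referenced separately via `training_file_pattern` and `validation_file_pattern`.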

And my questions are:

  1. Do I need to prepare different images for training and validation? I’m using the same TFRecords for both training and validation right now.
  2. How should I decide total_steps, train_batch_size, and eval_batch_size in the model spec file? I read the docs but didn’t understand.
  3. Basically, how many times do I need to train the model? (After my first training, the model was able to mask the dataset images correctly. I haven’t tested the model with other images yet.)

I was previously working on Jetson devices, so I have some basic knowledge of model deployment. However, I’m new to model training.

Best regards

  1. It is OK to use the same TFRecords for training and validation.
  2. Refer to the MaskRCNN — TAO Toolkit 4.0 documentation:

The num_examples_per_epoch parameter must be specified, and either num_epochs or total_steps should be specified. If both are set, total_steps will be overwritten by num_epochs * num_examples_per_epoch / train_batch_size.

eval_batch_size is the batch size used during validation or evaluation.
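Plugging the values from the posted spec file into that formula shows how the three parameters relate (this is just arithmetic on the posted numbers, not additional TAO behavior):

```python
# Values taken from the posted spec file
num_examples_per_epoch = 10
train_batch_size = 2
total_steps = 5000

# Steps needed to see one full epoch of data
steps_per_epoch = num_examples_per_epoch / train_batch_size  # 10 / 2 = 5.0

# Rearranging total_steps = num_epochs * num_examples_per_epoch / train_batch_size
# gives the number of epochs the posted total_steps corresponds to:
implied_epochs = total_steps * train_batch_size / num_examples_per_epoch

print(f"steps per epoch: {steps_per_epoch}, implied epochs: {implied_epochs}")
```

So with only 10 training images, total_steps of 5000 means the dataset is revisited roughly 1000 times, which is worth keeping in mind when judging overfitting.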

  3. If you mean how long the training needs to run, it is hard to say, because it depends on the training images, the training classes, the pretrained model, etc.