Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc) GeForce RTX3050
• Network Type Mask_rcnn
• TLT Version
dockers:
nvidia/tao/tao-toolkit:
4.0.0-tf2.9.1:
docker_registry: nvcr.io
tasks:
1. classification_tf2
2. efficientdet_tf2
4.0.0-tf1.15.5:
docker_registry: nvcr.io
tasks:
1. augment
2. bpnet
3. classification_tf1
4. detectnet_v2
5. dssd
6. emotionnet
7. efficientdet_tf1
8. faster_rcnn
9. fpenet
10. gazenet
11. gesturenet
12. heartratenet
13. lprnet
14. mask_rcnn
15. multitask_classification
16. retinanet
17. ssd
18. unet
19. yolo_v3
20. yolo_v4
21. yolo_v4_tiny
22. converter
4.0.0-pyt:
docker_registry: nvcr.io
tasks:
1. action_recognition
2. deformable_detr
3. segformer
4. re_identification
5. pointpillars
6. pose_classification
7. n_gram
8. speech_to_text
9. speech_to_text_citrinet
10. speech_to_text_conformer
11. spectro_gen
12. vocoder
13. text_classification
14. question_answering
15. token_classification
16. intent_slot_classification
17. punctuation_and_capitalization
format_version: 2.0
toolkit_version: 4.0.0
published_date: 12/08/2022
• Training spec file
# https://docs.nvidia.com/tao/tao-toolkit/text/instance_segmentation/mask_rcnn.html#id6
seed: 123
use_amp: False
warmup_steps: 0
checkpoint: "/workspace/models/peoplesegnet_vtrainable_v2.1/peoplesegnet_resnet50.step-20000.tlt"
# https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/pretrained_instance_segmentation
learning_rate_steps: "[60000, 80000, 100000]"
learning_rate_decay_levels: "[0.1, 0.02, 0.002]"
total_steps: 5000
train_batch_size: 2
eval_batch_size: 4
num_steps_per_eval: 1000
momentum: 0.9
l2_weight_decay: 0.0001
l1_weight_decay: 0.0
warmup_learning_rate: 0.0001
init_learning_rate: 0.02
num_examples_per_epoch: 10
data_config{
image_size: "(576, 960)"
augment_input_data: True
eval_samples: 100
training_file_pattern: "/workspace/data/tfrecords/annotations*.tfrecord"
validation_file_pattern: "/workspace/data/tfrecords/annotations*.tfrecord"
val_json_file: "/workspace/data/annotations.json"
# dataset specific parameters
num_classes: 2
skip_crowd_during_training: True
max_num_instances: 200
}
# Copied from PeopleSegNet Catalog
# https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/peoplesegnet
maskrcnn_config {
nlayers: 50
arch: "resnet"
gt_mask_size: 112
freeze_blocks: "[0]"
freeze_bn: True
# Region Proposal Network
rpn_positive_overlap: 0.7
rpn_negative_overlap: 0.3
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_min_size: 0.
# Proposal layer.
batch_size_per_im: 512
fg_fraction: 0.25
fg_thresh: 0.5
bg_thresh_hi: 0.5
bg_thresh_lo: 0.
# Faster-RCNN heads.
fast_rcnn_mlp_head_dim: 1024
bbox_reg_weights: "(10., 10., 5., 5.)"
# Mask-RCNN heads.
include_mask: True
mrcnn_resolution: 28
# training
train_rpn_pre_nms_topn: 2000
train_rpn_post_nms_topn: 1000
train_rpn_nms_threshold: 0.7
# evaluation
test_detections_per_image: 100
test_nms: 0.5
test_rpn_pre_nms_topn: 1000
test_rpn_post_nms_topn: 1000
test_rpn_nms_thresh: 0.7
# model architecture
min_level: 2
max_level: 6
num_scales: 1
aspect_ratios: "[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]"
anchor_scale: 8
# localization loss
rpn_box_loss_weight: 1.0
fast_rcnn_box_loss_weight: 1.0
mrcnn_weight_loss_mask: 1.0
}
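To show how I currently read the step-related values in this spec, here is a rough Python sketch of the arithmetic. It assumes a single GPU and that one training step consumes `train_batch_size` images; `num_gpus`, `images_seen`, `epochs`, `num_evals` and `decay_steps_reached` are my own names for illustration, not spec fields.

```python
# Values copied from the spec above.
total_steps = 5000
train_batch_size = 2
num_steps_per_eval = 1000
num_examples_per_epoch = 10
learning_rate_steps = [60000, 80000, 100000]

# Assumption: training runs on one GPU, and one step processes one batch.
num_gpus = 1

# Total number of images processed over the whole run.
images_seen = total_steps * train_batch_size * num_gpus

# Number of passes over the dataset.
epochs = images_seen / num_examples_per_epoch

# How many evaluations run during training.
num_evals = total_steps // num_steps_per_eval

# Learning-rate decay boundaries that fall inside the run.
decay_steps_reached = [s for s in learning_rate_steps if s <= total_steps]

print("images seen:", images_seen)
print("epochs:", epochs)
print("evaluations:", num_evals)
print("decay steps reached:", decay_steps_reached)
```

If this reading is right, the run makes about 1000 passes over the 10 images, and none of the `learning_rate_steps` boundaries are reached because they all lie beyond `total_steps`.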
Hello.
I’m building my own custom model with TAO Toolkit.
I’d like to base it on PeopleSegNet, which is a mask_rcnn model.
Although I was able to run my first training, I have some questions about TAO Toolkit.
Here’s what I’ve done so far.
- Annotated masks on my dataset images with labelme. I currently have only 10 images.
- Converted the annotation files to TFRecords.
- Created the model spec file.
My questions are:
- Do I need to prepare different images for training and validation? I’m using the same TFRecords for both training and validation right now.
- How should I choose total_steps, train_batch_size and eval_batch_size in the model spec file? I read the docs but didn’t understand them.
- In general, how long should I train the model? (After my first training run, the model was able to mask the dataset images correctly. I haven’t tested it on other images yet.)
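To make the first question concrete, this is the kind of held-out split I have in mind, as a minimal Python sketch. It assumes my annotations are in COCO-style JSON (with `images`, `annotations` and `categories` keys); `split_coco`, `val_fraction` and the synthetic data are hypothetical and only for illustration, not part of TAO Toolkit.

```python
import random


def split_coco(coco, val_fraction=0.2, seed=123):
    """Split a COCO-style dict into disjoint train/val dicts by image id."""
    image_ids = sorted(img["id"] for img in coco["images"])
    random.Random(seed).shuffle(image_ids)

    # Keep at least one image for validation.
    n_val = max(1, round(len(image_ids) * val_fraction))
    val_ids = set(image_ids[:n_val])

    def subset(keep_val):
        # Select images and their annotations for one side of the split.
        return {
            "images": [im for im in coco["images"]
                       if (im["id"] in val_ids) == keep_val],
            "annotations": [a for a in coco["annotations"]
                            if (a["image_id"] in val_ids) == keep_val],
            "categories": coco["categories"],
        }

    return subset(False), subset(True)


# Tiny synthetic stand-in for annotations.json (10 images, one mask each).
coco = {
    "images": [{"id": i, "file_name": f"img_{i}.jpg"} for i in range(10)],
    "annotations": [{"id": i, "image_id": i, "category_id": 1}
                    for i in range(10)],
    "categories": [{"id": 1, "name": "person"}],
}

train, val = split_coco(coco)
print(len(train["images"]), "train images,", len(val["images"]), "val images")
```

My assumption is that each half would then be converted to its own TFRecords, with validation_file_pattern and val_json_file pointing at the validation half only.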
I previously worked on Jetson devices, so I have some basic knowledge of model deployment. However, I’m new to model training.
Best regards