Please provide the following information when requesting support.
• Hardware: L40S
• Network Type: Grounding DINO
• TLT Version: TAO 5.5
• Training spec file
```yaml
train:
  num_gpus: 8
  num_nodes: 1
  validation_interval: 1
  optim:
    lr_backbone: 2e-05
    lr: 0.0002
    lr_steps: [30, 80]
    momentum: 0.9
    lr_linear_proj_mult: 0.1
  num_epochs: 100
  freeze: ["backbone.0", "bert"]  # if only fine-tuning
  precision: bf16
  pretrained_model_path: /code/TAO/grounding_dino_vgrounding_dino_swin_tiny_commercial_trainable_v1.0/grounding_dino_swin_tiny_commercial_trainable.pth
dataset:
  train_data_sources:
    - image_dir: //grounding_object_dataset/data/pinyin/train
      json_file: //grounding_object_dataset/tao_format/pinyin/train_odvg.jsonl
      label_map: //grounding_object_dataset/tao_format/pinyin/train_odvg_labelmap.json
  val_data_sources:
    image_dir: //grounding_object_dataset/data/pinyin/val
    json_file: //grounding_object_dataset/tao_format/pinyin/val_remapped.json
  max_labels: 120
  batch_size: 8
  workers: 8
  dataset_type: serialized  # to reduce system memory usage
  augmentation:
    scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
    input_mean: [0.485, 0.456, 0.406]
    input_std: [0.229, 0.224, 0.225]
    train_random_resize: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800, 1024]
    horizontal_flip_prob: 0.0
    train_random_crop_min: 384
    train_random_crop_max: 600
    random_resize_max_size: 1024
    test_random_resize: 1024
    fixed_padding: true
    fixed_random_crop: null
model:
  backbone: swin_tiny_224_1k
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 1500
  dropout_ratio: 0.0
  dim_feedforward: 2048
  log_scale: auto
  class_embed_bias: True
  num_select: 1500
  dn_number: 0
```
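For context, here is a minimal sketch of the learning-rate schedule I understand the optim block above to imply, assuming TAO applies a multi-step decay at the epochs listed in lr_steps; the 0.1 decay factor is an assumption, since it is not set explicitly in my spec:

```python
# Minimal sketch of the assumed multi-step schedule.
# lr, lr_backbone, and lr_steps are copied from the optim block above;
# the per-step decay factor of 0.1 is an assumption.
lr = 0.0002
lr_backbone = 2e-05
lr_steps = [30, 80]
decay = 0.1  # assumed

def lr_at_epoch(base_lr: float, epoch: int) -> float:
    """Learning rate in effect at `epoch` if it drops by `decay` at each step in lr_steps."""
    drops = sum(1 for step in lr_steps if epoch >= step)
    return base_lr * decay ** drops

for epoch in (0, 29, 30, 70, 80, 99):
    print(f"epoch {epoch:3d}: lr={lr_at_epoch(lr, epoch):.1e}  "
          f"lr_backbone={lr_at_epoch(lr_backbone, epoch):.1e}")
```

If this reading is right, then with freeze: ["backbone.0", "bert"] the backbone and text encoder are frozen (so lr_backbone has little effect in this run), and at epoch 70 the schedule has passed only the first drop at epoch 30.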
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
While training on my own dataset, the accuracy does not reach the level I expect: with other Grounding DINO training frameworks I can reach an mAP50 of 96, but with TAO the mAP50 only reaches 87 after 70 epochs. My dataset has three categories and roughly 20,000 samples.
Is this performance normal, or could some parameters be set incorrectly?
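In case it helps narrow this down, below is a minimal sketch of the sanity check I would run to confirm the label map and the training ODVG jsonl agree. The field names (detection, instances, label, category) and the label-map layout are assumptions based on the standard ODVG detection format, not something I have confirmed TAO requires in exactly this shape:

```python
import json

# Minimal sketch, assuming the standard ODVG detection format:
# each jsonl line looks like {"filename": ..., "detection": {"instances":
# [{"bbox": [...], "label": <int>, "category": <str>}, ...]}}
# and the label map looks like {"<int>": "<class name>", ...}.
# Paths are copied from the spec above; adjust to however the dataset is mounted.
label_map_path = "//grounding_object_dataset/tao_format/pinyin/train_odvg_labelmap.json"
odvg_path = "//grounding_object_dataset/tao_format/pinyin/train_odvg.jsonl"

with open(label_map_path) as f:
    label_map = json.load(f)

seen = {}
with open(odvg_path) as f:
    for line in f:
        rec = json.loads(line)
        for inst in rec.get("detection", {}).get("instances", []):
            seen.setdefault(inst["label"], set()).add(inst["category"])

print("label map:", label_map)
for label, names in sorted(seen.items()):
    mapped = label_map.get(str(label))
    status = "OK" if mapped in names and len(names) == 1 else "MISMATCH"
    print(f"label {label}: categories in jsonl={sorted(names)}, label_map={mapped} -> {status}")
```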
Thanks