Very low evaluation results for the DINO model from dino.ipynb in tao-getting-started_v5.3

I ran dino.ipynb from tao-getting-started_v5.3, and after 12 epochs of training the evaluation results are extremely low (mAP is close to zero). What could be the reason?

Evaluation results:
Evaluation metrics generated.
Testing DataLoader 0: 100%|███████████████████| 625/625 [12:26<00:00, 1.19s/it]
┃ Test metric      ┃ DataLoader 0          ┃
│ test_class_error │ 69.84188842773438     │
│ test_loss        │ 25.781869888305664    │
│ test_loss_bbox   │ 0.16190668940544128   │
│ test_loss_ce     │ 0.7826069593429565    │
│ test_loss_giou   │ 0.6508875489234924    │
│ test_mAP         │ 0.0003703692060464326 │
│ test_mAP50       │ 0.0007101921440162097 │
Evaluation finished successfully

Do you mean you ran the default notebook with the default dataset mentioned in the notebook? Did you change anything on your side? Also, did you use the pretrained model?

Yes, I ran the default notebook with the default dataset. I changed nothing except setting “LOCAL_PROJECT_DIR”.

And I used the pretrained model: fan_small_hybrid_nvimagenet.pth

How about the training log? Could you upload it using the upload button?

Here is a brief log. Is it enough for analysis?
status.json.txt (8.9 KB)

I see in the log that the first run hit an out-of-memory (OOM) error. What did you change to make the second run work?

Also, according to your log, the loss is decreasing, but very slowly.
How many GPUs did you use, and which GPU model?

I decreased the batch size from 4 to 2. I have only one RTX 3090 GPU with 24 GB of memory.

How fast does the loss typically decrease?

For a standard dataset like COCO, please try to follow the guide mentioned in DINO - NVIDIA Docs.

Optimize GPU Memory
There are various ways to optimize GPU memory usage. One obvious trick is to reduce dataset.batch_size. However, this can cause your training to take longer than usual. Hence, we recommend setting the configurations below in order to optimize GPU consumption.

Set train.precision to fp16 to enable automatic mixed precision training. This can reduce your GPU memory usage by 50%.

Set train.activation_checkpoint to True to enable activation checkpointing. By recomputing the activations instead of caching them in memory, memory usage can be reduced.

Set train.distributed_strategy to ddp_sharded to enable Sharded DDP training. This will share gradient calculation across different processes to help reduce GPU memory.

Try using more lightweight backbones like fan_tiny or freeze the backbone through setting model.train_backbone to False.

Try changing the augmentation resolution in dataset.augmentation depending on your dataset.
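
As a rough illustration of the settings listed above, here is a minimal Python sketch that applies them as dotted-key overrides to a spec held as a nested dictionary. The helper apply_overrides and the contents of base_spec are hypothetical and only stand in for the real train spec file; the keys and values themselves are the ones quoted from the guide.

```python
from copy import deepcopy


def apply_overrides(spec: dict, overrides: dict) -> dict:
    """Return a copy of `spec` with dotted-key overrides applied.

    Hypothetical helper for illustration; it is not part of TAO.
    """
    result = deepcopy(spec)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Create intermediate sections that do not exist yet.
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


# Placeholder spec: a stand-in for the real train spec, not its actual content.
base_spec = {
    "dataset": {"batch_size": 2},
    "train": {},
    "model": {"train_backbone": True},
}

# Memory-related settings quoted from the guide above.
memory_overrides = {
    "train.precision": "fp16",
    "train.activation_checkpoint": True,
    "train.distributed_strategy": "ddp_sharded",
    "model.train_backbone": False,
}

tuned_spec = apply_overrides(base_spec, memory_overrides)
print(tuned_spec)
```

Freezing the backbone changes what the model learns, so it is probably best treated as a fallback if the other settings do not free enough memory.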

Thank you for your information, I will try it.

Do I need to increase num_epochs to get better training results?

Please apply the items mentioned above first.
After that, you can run more experiments, such as changing num_feature_levels as mentioned in the notebook, increasing the number of training epochs, etc.

I have one more question about the DINO model.

In “getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/notebooks/tao_launcher_starter_kit/dino”,
classmap.txt lists 80 classes, but the spec file defines num_classes as 91.

Is it a problem? And is it a possible reason for the low performance?

Please refer to DINO - NVIDIA Docs.
The category_id from your COCO JSON file should start from 1 because 0 is set as a background class. In addition, dataset.num_classes should be set to max class_id + 1. For instance, even though there are only 80 classes used in COCO, the largest class_id is 90, so dataset.num_classes should be set to 91.
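
To make the rule concrete, here is a small Python sketch that derives the value for dataset.num_classes from a COCO-style annotation dictionary. The function name required_num_classes and the toy categories list are made up for illustration; the sketch assumes the standard COCO layout with a top-level "categories" list whose entries carry an integer "id".

```python
def required_num_classes(coco: dict) -> int:
    """Return max category id + 1, as the rule quoted above describes.

    Hypothetical helper for illustration; it is not part of TAO.
    """
    ids = [category["id"] for category in coco["categories"]]
    if not ids:
        raise ValueError("the annotation file defines no categories")
    if min(ids) < 1:
        # Id 0 is reserved for the background class.
        raise ValueError("category ids must start from 1")
    return max(ids) + 1


# Toy annotation with sparse ids, mimicking COCO where ids run up to 90.
toy_annotations = {
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 3, "name": "car"},
        {"id": 90, "name": "toothbrush"},
    ]
}

num_classes = required_num_classes(toy_annotations)
print(num_classes)  # 91, although only 3 categories are listed
```

For a real dataset, the dictionary would come from loading the annotation file with json.load. The number of entries in the class map (80 for COCO) does not matter here, only the largest id does.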

Also, the notebook (tao_tutorials/notebooks/tao_launcher_starter_kit/dino/dino.ipynb at main · NVIDIA/tao_tutorials · GitHub) states: “For this demonstration, we changed the architectures from the original implementation so that the training can be completed faster”. So please set num_queries back to 900.
Please share the training log once you have it.