I’m hoping to build a highly performant object detection training pipeline using TAO, so I selected DINO as my model of choice. However, after reading the research papers and experimenting with the model itself, the training process seems quite unstable despite tuning the hyperparameters.
Has this been the general experience of the TAO experts? If so, what is the next-best model on the platform that offers similar performance with a more stable training process?
The training process is also quite slow (fan_small_hybrid_nvimagenet + DINO), so a recommendation for a lighter backbone+model combination with proven comparable performance would be appreciated as well.
I’d suggest RT-DETR. If your priority is similar DETR-family accuracy with a more stable and faster training experience on TAO, RT-DETR is the best next option to try.
DINO is a strong but relatively heavy Transformer-based detector, so training can be slow, and it can be sensitive to hyperparameters, batch size, and data/annotation quality (especially when fine-tuning with limited compute).
Both RT-DETR and Deformable DETR are available in TAO. RT-DETR is the better choice when you want faster training and inference. Deformable DETR is usually a good option when you can afford more compute and want strong accuracy and robustness, especially on more challenging datasets, but it can be heavier and slower depending on the configuration.
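If you do try RT-DETR, training is launched like the other TAO detectors via an experiment spec. Below is a rough sketch of what such a spec might look like; the field names and paths here are illustrative assumptions on my part, not the authoritative schema, so please check the RT-DETR section of the TAO docs for the exact keys:

```yaml
# Illustrative sketch only -- verify every key against the TAO RT-DETR spec docs.
model:
  backbone: resnet_50              # assumed: a lighter CNN backbone than the FAN hybrid
train:
  num_epochs: 72
  optim:
    lr: 1.0e-4                     # DETR-family models are lr-sensitive; start conservative
dataset:
  train_data_sources:
    - image_dir: /data/train/images          # hypothetical paths
      json_file: /data/train/annotations.json
```

The general idea is that a CNN backbone such as ResNet-50 plus RT-DETR's efficient hybrid encoder tends to train faster and more predictably than a full DINO setup, while staying in the same DETR family.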