Thanks for providing the RT-DETR implementation in TAO.
I’ve been experimenting with RT-DETR on our internal people dataset to compare its performance with YOLOv4 (also trained using TAO v5). After training, YOLOv4 reaches an AP50 of 87.5%, while RT-DETR only achieves an AP50 of 76%, which was surprising. That’s a gap of more than 11 points, which is quite significant considering RT-DETR is expected to outperform even the most recent YOLO models according to the original paper.
For reference, I’m using the ResNet-50 backbone from NGC (ImageNet pretrained), with the default TAO RT-DETR configuration.
To better understand whether the issue lies with our dataset or something else, I attempted to reproduce the authors’ results on COCO 2017. I carefully followed the official configuration rtdetr_r50vd.yml from the TAO GitHub repo, using 6 decoder layers as described. The original paper reports AP = 52.2% and AP50 = 70.6%, but after training for 72 epochs in TAO, I’m only seeing AP = 32.4% and AP50 = 47.3%.
This large discrepancy (~20 points lower) on COCO suggests the issue may not be dataset-specific. Could there be differences between the TAO implementation and the original paper?
Has the TAO team been able to reproduce the paper’s results internally before releasing this model? Any guidance or suggestions would be greatly appreciated.
This is not a GitHub repo from TAO. May I know which YAML file you are using to train TAO RT-DETR?
The pretrained model needs to change. With ResNet50_vd (ImageNet-22K distilled) + RT-DETR, the COCO val mAP is about 53.1% and the COCO val mAP50 is about 71.3%.
Yes, it is the official RT-DETR repo. I only shared the link to their config to show you how I validated the parameters in my TAO config.
The pretrained model needs to change. Use ResNet50_vd (ImageNet-22K distilled)
Can you share the link to that model, please? I carefully looked through NGC (logged in) everywhere ImageNet or ResNet was mentioned and could not find that model.
I’ve done some training with the ConvNeXtV2 backbone, and it does outperform ResNet50 on my dataset.
I would still like to try ResNet50-vd to compare performance, as this will be deployed on an edge device.
Let me know when you have an update regarding that matter.
For the moment, can you also try training the model with the official Torchvision RN50 weights? Compared to RN50vd it should be about 3% lower, but not 20% or so. I am not sure whether the model you are using follows the same naming convention as the ones defined in TAO, so the state dict may not be mapped correctly when loading the weights.
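One way to sanity-check the naming-convention concern above is to compare the checkpoint's state-dict keys against the keys the model expects, and remap prefixes where they differ. The sketch below is purely illustrative: the key names and the prefix map are hypothetical, not the actual TAO or Torchvision naming schemes.

```python
def remap_state_dict(state_dict, prefix_map):
    """Rename checkpoint keys so they match the model's naming convention.

    prefix_map maps old key prefixes to new ones, e.g.
    {"backbone.": "model.backbone."}. Keys that match no prefix are kept as-is.
    """
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in prefix_map.items():
            if key.startswith(old):
                new_key = new + key[len(old):]
                break
        remapped[new_key] = value
    return remapped


# Hypothetical checkpoint using a different prefix convention than the model
checkpoint = {
    "backbone.conv1.weight": "w0",
    "backbone.layer1.0.conv1.weight": "w1",
}
expected_keys = {
    "model.backbone.conv1.weight",
    "model.backbone.layer1.0.conv1.weight",
}

fixed = remap_state_dict(checkpoint, {"backbone.": "model.backbone."})

# Any leftover entries here indicate weights that silently failed to load
missing = expected_keys - set(fixed)
unexpected = set(fixed) - expected_keys
print(sorted(missing), sorted(unexpected))  # → [] []
```

If `missing` or `unexpected` is non-empty, the backbone would be (partially) randomly initialized, which could explain a large accuracy drop like the one reported here.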
I’ve tested the ResNet backbone from PyTorch, and I am indeed getting very similar accuracy to ConvNeXtV2.
I haven’t retrained on COCO because it takes too much time, but on my personal dataset, RT-DETR with the convnext2-nano and resnet50_pytorch backbones reaches a similar mAP@50 of 88%, while resnet50_nvidia is only at 75%.
Does resnet50_nvidia mean the ResNet-50 backbone from NGC (ImageNet pretrained)? It is not an official release from TAO because it was not trained by the TAO team, so it is not suggested for use in TAO training.
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.