I’ve tried training RT-DETR using EfficientViT_B1 as the backbone, but I haven’t been able to succeed. I attempted to use the files from this link:
Unfortunately, I was unable to get it working. I received an error message indicating that the mapping is incorrect. I’m unsure whether these models are compatible with the .pt format, even after converting to .pth. It’s also possible that there’s a configuration issue in the training file.
I was able to train RT-DETR with the following backbones: ResNet50, ResNet18, and convnextv2_large.
Can you train efficientvit_b1 successfully without any pretrained_backbone_path?
The link you shared is not an official model from TAO, so it may fail to load for training in TAO because its weight mapping does not match.
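One way to narrow down a mapping error like this is to compare the parameter names stored in the checkpoint against the names the backbone expects. The sketch below is a minimal illustration, not TAO's actual loading logic: the helper names (`unwrap_state_dict`, `compare_keys`), the wrapper keys (`state_dict`, `model`), the stripped prefixes, and the toy parameter names are all assumptions. With a real checkpoint, the keys would typically come from `torch.load(path, map_location="cpu")` and from the model's `state_dict()`.

```python
from collections.abc import Mapping


def unwrap_state_dict(ckpt, wrapper_keys=("state_dict", "model")):
    """Return the nested weight mapping if the checkpoint wraps it.

    Assumption: some checkpoints store weights under a wrapper key such as
    "state_dict" or "model"; others are already a flat name -> tensor mapping.
    """
    for key in wrapper_keys:
        if isinstance(ckpt, Mapping) and isinstance(ckpt.get(key), Mapping):
            return ckpt[key]
    return ckpt


def compare_keys(ckpt_keys, model_keys, strip_prefixes=("module.", "backbone.")):
    """Report which parameter names match after stripping common prefixes."""

    def normalize(name):
        # Remove each assumed prefix if it appears at the start of the name.
        for prefix in strip_prefixes:
            if name.startswith(prefix):
                name = name[len(prefix):]
        return name

    ckpt = {normalize(k) for k in ckpt_keys}
    model = {normalize(k) for k in model_keys}
    return {
        "matched": sorted(ckpt & model),
        "missing_in_checkpoint": sorted(model - ckpt),
        "unexpected_in_checkpoint": sorted(ckpt - model),
    }


# Toy data with made-up parameter names (hypothetical, for illustration only).
ckpt = {"state_dict": {"module.input_stem.conv.weight": 0, "head.fc.weight": 0}}
model_keys = ["backbone.input_stem.conv.weight", "backbone.stages.0.conv.weight"]

report = compare_keys(unwrap_state_dict(ckpt).keys(), model_keys)
for section, names in report.items():
    print(section, names)
```

If most names land in `missing_in_checkpoint` or `unexpected_in_checkpoint`, the checkpoint was probably saved from a different model definition or with a different naming scheme, which would be consistent with a mapping error at load time.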
I trained the model without pretrained_backbone_path, and it works. However, the accuracy is lower than with ResNet18 (trained with a pretrained_backbone_path). Do you have a link to the official EfficientViT_bx models from TAO?
Hi,
EfficientViT-b(x) series would be a strong model candidate to consider for RT-DETR (in terms of accuracy and latency).
TAO doesn’t have pretrained commercial weights for EfficientViT released yet.
Several model candidates were released as part of the EfficientViT release, and you can pick one from CVHub520/efficientvit. We are compatible with the model defined there and hence with the weights as well. You may try EfficientViT-L2 and double-check. Thanks.