TLDR: Should I train a YOLOv4, RT-DETR, or DINO model for object detection, given that I don’t have a powerful GPU?
I need to make a new model, since the old Caffe model we ran on the original Jetson Nano does not work on my Jetson Xavier, which uses a newer version of DeepStream.
I have about 5000 training images in KITTI format, and only one class of object to detect.
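As far as I understand, the DINO and RT-DETR pipelines in TAO expect COCO-style JSON annotations rather than KITTI text files, so my labels would probably need converting first. Here is a minimal sketch of that conversion in Python. It only uses the class name and the four bounding-box fields of each KITTI line. The function names, the class name `widget`, and the choice to start category ids at 1 are my own placeholders, not anything taken from TAO.

```python
import json


def kitti_line_to_coco_bbox(line):
    """Parse one KITTI label line into (class_name, [x, y, width, height])."""
    fields = line.split()
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 KITTI fields, got {len(fields)}")
    # KITTI stores the box as left, top, right, bottom in fields 4..7.
    left, top, right, bottom = (float(v) for v in fields[4:8])
    # COCO stores the box as top-left corner plus width and height.
    return fields[0], [left, top, right - left, bottom - top]


def kitti_to_coco(labels, image_sizes, class_names):
    """Build a COCO-style dict.

    labels:      {image file name: list of KITTI label lines}
    image_sizes: {image file name: (width, height)}
    class_names: classes to keep; anything else (e.g. DontCare) is skipped
    """
    # Assumption: category ids start at 1.
    cat_ids = {name: i + 1 for i, name in enumerate(class_names)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": n} for n, i in cat_ids.items()],
    }
    ann_id = 1
    for image_id, file_name in enumerate(sorted(labels), start=1):
        width, height = image_sizes[file_name]
        coco["images"].append(
            {"id": image_id, "file_name": file_name, "width": width, "height": height}
        )
        for line in labels[file_name]:
            if not line.strip():
                continue
            name, bbox = kitti_line_to_coco_bbox(line)
            if name not in cat_ids:
                continue
            coco["annotations"].append(
                {
                    "id": ann_id,
                    "image_id": image_id,
                    "category_id": cat_ids[name],
                    "bbox": bbox,
                    "area": bbox[2] * bbox[3],
                    "iscrowd": 0,
                }
            )
            ann_id += 1
    return coco


if __name__ == "__main__":
    # Hypothetical single-image example with one labelled object.
    example = {"000001.png": ["widget 0.00 0 0.00 10.0 20.0 110.0 70.0 0 0 0 0 0 0 0"]}
    sizes = {"000001.png": (1280, 720)}
    print(json.dumps(kitti_to_coco(example, sizes, ["widget"]), indent=2))
```

In practice the label lines would be read from the KITTI label directory and the image sizes from the images themselves. I have left that out to keep the sketch free of extra dependencies.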
I was able to install the TAO CLI and the TAO tutorials. I first started setting up training for a Yolo4 model, but I didn’t notice it was deprecated until I couldn’t find the backbone to download from NGC. (I now know that the backbones are here.)
From what I understand, based on the migration documentation for TAO 5.5 to 6.0, YOLO and a whole host of other models have been deprecated in favor of RT-DETR.
Now, I couldn’t find any tutorial for RT-DETR in the TAO tutorials repo, but I imagine I should be able to get it done once I figure out where the pretrained_model / backbones are. I didn’t see anything that looked like them when running ngc registry model list. There is, however, documentation for training RT-DETR here.
There are tutorials for DINO in the repo, so I assume it would not be a problem to get it to work. My only concern is that I don’t have a powerful enough GPU available, so I would be sitting around for a couple of weeks while it trains.
So, should I ignore that Yolo is deprecated and see if I can get it to work by downloading the backbone manually without NGC CLI? Should I try using RT-DETR, and if so, does anyone know where to download the backbone / pretrained model? Is using DINO feasible, or should I not try, given the hardware I have available?
From what I understand, all the models should be able to run on DeepStream. Though, do correct me if I am wrong.
• Dev: NVIDIA RTX 2080 Super / TAO 6.25
• Deploy: Jetson Xavier / DeepStream 6.3
• Network Type: Yolo_v4 / RT-DETR / DINO