This project aims to achieve real-time, high-precision object detection on edge GPUs such as the Jetson Nano. By leveraging the power of edge GPUs, YOLO-ReT provides accurate object detection in real time, making it suitable for a variety of applications, such as surveillance, autonomous driving, and robotics. In this project, we challenge several notions of model design and evaluation found in the literature, and in doing so we hope to push the boundaries of what is currently possible for object detection on edge GPUs.
Our central contribution is the introduction of two new architectural changes, Backbone Truncation and Multi-Scale Feature Interaction, which together improve the accuracy and reduce the latency of existing detection models.
Backbone Truncation: In the literature, the transfer learning backbone has often been adopted as a whole, without considering that not every layer contributes equally to transfer learning. We show that the last layers of a classification backbone are not only extremely bulky, but also task-specific, and do not provide useful transfer learning information for object detection. We therefore propose to truncate the classification backbone and use only the remaining layers as a feature extractor. In this way, we can significantly reduce the size of the model without sacrificing accuracy.
Figure: Transfer learning curves of various backbones. Detection accuracy is best when some of the final layers are not initialized with transfer learning weights!
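The idea can be sketched in a few lines. This is an illustrative toy example, not the actual YOLO-ReT configuration: the backbone is modeled as an ordered list of hypothetical (name, parameter count) blocks, and truncation simply keeps the early blocks as the detection feature extractor.

```python
# Hypothetical MobileNet-style stage sizes (NOT the real backbone);
# the late stages are deliberately the bulkiest, as in classification nets.
BACKBONE = [
    ("stem",   1_000),
    ("stage1", 10_000),
    ("stage2", 50_000),
    ("stage3", 200_000),
    ("stage4", 1_500_000),  # late, bulky, task-specific
    ("stage5", 2_500_000),  # late, bulky, task-specific
]

def truncate_backbone(blocks, keep):
    """Keep only the first `keep` blocks as the feature extractor."""
    return blocks[:keep]

def param_count(blocks):
    return sum(p for _, p in blocks)

full = param_count(BACKBONE)
kept = param_count(truncate_backbone(BACKBONE, keep=4))
print(f"kept {kept:,}/{full:,} params ({100 * kept / full:.1f}%)")
```

Because the dropped stages hold most of the parameters, even a small truncation yields a large model-size reduction.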
Multi-Scale Feature Interaction: Existing methods for multi-scale feature interaction can be divided into combinations of top-down and bottom-up approaches, each of which connects only two adjacent feature scales. This neglects a large number of possible pairings and makes the propagation of information between distant feature scales inefficient. Inspired by the linking of non-adjacent feature scales in NAS-FPN, we propose a lightweight raw feature collection and redistribution (RFCR) module that combines raw multi-scale features from the backbone and then redistributes them back to each feature scale. Each scale's feature maps thus contain direct links to all other scales.
Figure: RFCR module. A simple design with minimal network fragmentation keeps the latency overhead small while providing direct connections between non-adjacent scales.
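A minimal numpy sketch of the collect-then-redistribute idea: raw backbone features from every scale are brought to one common resolution and fused, then the fused map is resized back to each scale and merged in, giving every scale a direct link to all the others. The nearest-neighbor resizing, additive fusion, and the `fuse_size` parameter are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def resize_nn(x, size):
    """Nearest-neighbor resize of an (H, W, C) feature map to (size, size, C)."""
    h, w, _ = x.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return x[rows][:, cols]

def rfcr(features, fuse_size=16):
    # Collect: bring every scale to a common resolution and fuse (sum).
    fused = sum(resize_nn(f, fuse_size) for f in features)
    # Redistribute: resize the fused map back to each scale and merge it in.
    return [f + resize_nn(fused, f.shape[0]) for f in features]

# Three toy feature scales, as produced by a backbone at strides 8/16/32.
scales = [np.ones((s, s, 8), dtype=np.float32) for s in (32, 16, 8)]
out = rfcr(scales)
print([o.shape for o in out])  # each output keeps its original resolution
```

Note the single fuse-and-broadcast step: unlike a chain of adjacent top-down/bottom-up merges, information from the coarsest scale reaches the finest one in a single hop.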
Moreover, in our evaluation we compare the latency of our proposed model on actual devices, rather than relying only on model size or FLOPs as a proxy for performance. This allows us to better understand the real-world applicability of our model and to make more informed decisions about its use. We compare the runtime FPS of various models on the Jetson Nano, Jetson Xavier NX, and Jetson AGX Xavier.
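The on-device measurement follows the usual pattern: warm up the device, then time repeated forward passes and report throughput. A minimal sketch, where `run_inference` is a hypothetical stand-in for the actual model call (which would run on the Jetson with the deployed model):

```python
import time

def run_inference():
    # Placeholder workload; on a real device this would be model(input).
    sum(i * i for i in range(10_000))

def measure_fps(fn, warmup=10, iters=100):
    """Time `iters` calls of `fn` after `warmup` untimed calls; return FPS."""
    for _ in range(warmup):      # warm-up: caches, lazy init, clock ramp-up
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return iters / elapsed

print(f"{measure_fps(run_inference):.1f} FPS")
```

The warm-up iterations matter on edge GPUs in particular, since clock frequencies and memory allocators take several runs to settle.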
Please find all the details of our method and the results in our paper (published at WACV 2022): WACV 2022 Open Access Repository
Please find the code to train and test YOLO-ReT models in our repo: GitHub - prakharg24/yoloret: Implementation for the paper 'YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs'
Feel free to send us an email or raise an issue in the repo if you have any questions about our work.