Hi,
we are currently trying to reproduce, with the TAO Toolkit, a YOLOv4 training that we originally ran with the darknet repository (AlexeyAB/darknet on GitHub). We already did some hyperparameter tuning last week, but the training always topped out at around 83% mAP. With the darknet repository, we were able to achieve 92%.
In my opinion, the gap is too large to be explained by hyperparameters alone, and I am not sure where it comes from. So far we have tried the following:
- adapt the anchors to our custom dataset
- change optimizer (SGD, Adam)
- change learning rate
- change batch size
- change training epochs
- change regularizer weighting
- activate random input sizing (the mAP curve fluctuated strongly, but also peaked at 83%)
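For reference, the anchor adaptation from the first bullet can be sketched roughly as below. This is only a minimal illustration of IoU-based k-means on box widths and heights, not the code from darknet or TAO; the function names (`iou_wh`, `kmeans_anchors`, `mean_best_iou`) are made up for this post, and it assumes the boxes have already been scaled to the network input resolution.

```python
import random


def iou_wh(box, anchor):
    """IoU of two (width, height) boxes that share the same top-left corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs, assigning each box to the anchor with the highest IoU."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # initial anchors are k randomly chosen boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, anchors[i]))
            clusters[best].append(box)
        new_anchors = []
        for old, members in zip(anchors, clusters):
            if members:
                w = sum(b[0] for b in members) / len(members)
                h = sum(b[1] for b in members) / len(members)
                new_anchors.append((w, h))
            else:
                new_anchors.append(old)  # keep the anchor of an empty cluster unchanged
        if new_anchors == anchors:  # assignments no longer change the anchors
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])  # smallest area first


def mean_best_iou(boxes, anchors):
    """Average IoU between each box and its best-matching anchor."""
    return sum(max(iou_wh(b, a) for a in anchors) for b in boxes) / len(boxes)


if __name__ == "__main__":
    # Toy (width, height) values in pixels, purely illustrative.
    toy_boxes = [(10, 10), (12, 12), (100, 100), (110, 90)]
    toy_anchors = kmeans_anchors(toy_boxes, k=2)
    print(toy_anchors, mean_best_iou(toy_boxes, toy_anchors))
```

Comparing `mean_best_iou` for the default anchors and for the recomputed ones is a quick sanity check that the new anchors actually fit the dataset better.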
We took the spec file from the CV samples 1.4.1 as a base for our training.
yolo_v4_train_cspdarknet53 - Copy.txt (2.3 KB)
I am aware that we are using a backbone pre-trained on the OpenImages dataset, and that a backbone trained on ImageNet would be better, but I am not sure whether this can explain a gap of roughly 9 percentage points. Maybe you have some insights on this, or some hints about what else we can try.
Also: Is there a roadmap or a timeline for TAO?
We are very much looking forward to the BYOM feature being extended to object detection.
Thanks in advance.
• Hardware: V100
• Network Type: Yolo_v4 with cspdarknet_53
• TLT Version: tao-toolkit-tf:v3.22.05-tf1.15.5-py3