My training dataset contains around 5,200 images.
The training is still in progress at the time of asking. The training loss after the 1st epoch is 1286807.1; at the 2nd epoch it goes down to 693339.94, and at the 60th epoch the training loss is 633260.7 with a mAP of 0.25.
My question is: why is the loss value so high, and why does it stay high? I am using the Open Images pretrained weights file from NGC, which should enable transfer learning, so I expected the mAP to rise and the loss to drop within the initial epochs. Why is this not the case? For reference, I had previously trained a TAO YOLOv3 model for 80 epochs using the same dataset and the same pretrained weights: the training loss was under 20.0 after the first epoch, the mAP was above 0.25 after the 10th epoch, and the final training loss was under 1.0 with a mAP of ~0.6.
Is this due to the architecture difference between the two models, and does YOLOv4 require a larger batch size and a larger dataset in order to achieve comparable results?
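For context on where the loss scale and batch size come from, both are set in the training spec file. The fragment below is a sketch of the relevant sections; the field names are as I recall them from the stock yolo_v4_train_resnet18_kitti.txt spec, and the values shown are placeholders rather than recommendations — please verify everything against your own spec file:

```
# Sketch of the relevant TAO YOLOv4 spec sections (protobuf text format).
# Field names from memory of the stock spec -- verify before using.
yolov4_config {
  loss_loc_weight: 1.0        # weight on the localization loss term
  loss_neg_obj_weights: 1.0   # weight on the negative-objectness loss term
  loss_class_weights: 1.0     # weight on the classification loss term
}
training_config {
  batch_size_per_gpu: 8       # placeholder; raise if GPU memory allows
  num_epochs: 80
}
```

Since the YOLOv4 loss is a weighted sum of these terms, a raw loss value is only comparable between runs (or between YOLOv3 and YOLOv4) if these weights and the batch size match.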
I can see that TAO Toolkit 5.0 has been released; do you recommend this version? Also, the latest release in the version 4 line is 4.0.2. If I have to use version 4, does using 4.0.2 make sense, or should I go with 4.0.1?
Yes, you can refer to the specs in TAO Toolkit Getting Started | NVIDIA NGC:

wget --content-disposition 'https://api.ngc.nvidia.com/v2/resources/nvidia/tao/tao-getting-started/versions/4.0.2/files/notebooks/tao_launcher_starter_kit/yolo_v4/specs/yolo_v4_train_resnet18_kitti.txt'
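For completeness, a typical flow after downloading that spec would be roughly the following command sketch; the results path, the $KEY variable, and the GPU count are placeholders of mine, not from the answer above:

```shell
# Download the stock YOLOv4 training spec (command from the answer above)
wget --content-disposition 'https://api.ngc.nvidia.com/v2/resources/nvidia/tao/tao-getting-started/versions/4.0.2/files/notebooks/tao_launcher_starter_kit/yolo_v4/specs/yolo_v4_train_resnet18_kitti.txt'

# Launch training via the TAO launcher; edit the spec first so its dataset
# and pretrained-weights paths match your setup ($KEY and the paths below
# are placeholders)
tao yolo_v4 train \
  -e yolo_v4_train_resnet18_kitti.txt \
  -r /workspace/results \
  -k $KEY \
  --gpus 1
```

The spec file is the place to compare your configuration against the defaults (anchor shapes, loss weights, batch size) when debugging the high-loss behavior described above.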