Hi all,
I’m training a YOLOv4 detection model with a ResNet-50 backbone using the TAO Toolkit on the KITTI dataset. I’m using the KITTI sequence format, and 000000.txt (87 Bytes) is one of the label files. The spec file I’ve used is this
yolov4_resnet.txt (2.5 KB)
one. The problem is that the training loss was very large at the start of the experiment (~10000), although it has been decreasing with every epoch.
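For anyone unfamiliar with the label files mentioned above: each line of a KITTI label file holds 15 space-separated fields (class name followed by 14 floats, per the KITTI devkit field order). A minimal parser sketch is below; the sample line is illustrative and is not the actual contents of 000000.txt.

```python
# Field names in KITTI devkit order; the first field is the class name,
# the rest are floats (truncation, occlusion, alpha, 2D bbox, 3D size,
# 3D location, and yaw).
KITTI_FIELDS = [
    "type", "truncated", "occluded", "alpha",
    "bbox_left", "bbox_top", "bbox_right", "bbox_bottom",
    "height", "width", "length", "x", "y", "z", "rotation_y",
]

def parse_kitti_line(line: str) -> dict:
    """Split one KITTI label line into named fields."""
    parts = line.split()
    record = {"type": parts[0]}
    for name, value in zip(KITTI_FIELDS[1:], parts[1:]):
        record[name] = float(value)
    return record

# Illustrative sample line, not taken from the attached label file.
sample = "Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59"
box = parse_kitti_line(sample)
print(box["type"], box["bbox_left"], box["bbox_bottom"])
```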
For example, this was my loss at the 9th and 10th epochs:
Epoch 9/5000
842/842 [==============================] - 412s 489ms/step - loss: 3907.2267
Epoch 10/5000
842/842 [==============================] - 368s 437ms/step - loss: 2914.5450
Now, at the 147th epoch, the loss has stopped decreasing and is at:
Epoch 147/5000
842/842 [==============================] - 326s 387ms/step - loss: 51.7914
Epoch 148/5000
842/842 [==============================] - 347s 412ms/step - loss: 51.5345
The AP for each class is also good:
Start to calculate AP for each class
*******************************
car AP 0.90711
cyclist AP 0.88406
pedestrian AP 0.80276
mAP 0.86464
*******************************
My concern is: why is the loss still this high even though the AP for each class is coming out very good? Is a high training loss normal for the KITTI dataset, or is there some issue with my training setup?
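To illustrate why the question matters (this is a toy sketch with made-up grid sizes and error values, not the TAO YOLOv4 loss): a detection loss that is summed over anchors rather than averaged can look large even when the per-anchor error is tiny, so the absolute number is hard to interpret without knowing how the toolkit normalizes it.

```python
# Toy illustration: a summed loss over many anchors vs. its per-anchor mean.
# Grid sizes and per-anchor error below are hypothetical, chosen only to
# show the scale effect.
num_anchors = 3 * (80 * 45 + 40 * 23 + 20 * 12)  # 3 anchors per cell, 3 scales
per_anchor_error = 0.004                          # small average error per anchor

summed_loss = num_anchors * per_anchor_error      # what a summed loss would report
mean_loss = summed_loss / num_anchors             # same model, averaged instead

print(f"anchors={num_anchors} summed={summed_loss:.2f} mean={mean_loss:.4f}")
```

The same model state produces a loss in the tens when summed but well under 1 when averaged, so a loss of ~51 alongside a strong mAP is not necessarily contradictory.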
• Hardware (T4/V100/Xavier/Nano/etc) RTX 3090
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) Yolov4 + Resnet 50
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) v3.21.11-tf1.15.5-py3
• Training spec file(If have, please share here) - Shared above