High train and val loss using TAO toolkit on KITTI dataset

Hi all,

I’m training a YOLOv4 detection model with a ResNet50 backbone using the TAO Toolkit on the KITTI dataset. I’m using the KITTI sequence format; 000000.txt (87 Bytes) is one of the label files. The spec file I used is yolov4_resnet.txt (2.5 KB).

The problem is that the training loss was very large at the start of the experiment (~10000), although it has been decreasing with every epoch.
For example, this was the loss around the 9th epoch:

Epoch 9/5000
842/842 [==============================] - 412s 489ms/step - loss: 3907.2267
Epoch 10/5000
842/842 [==============================] - 368s 437ms/step - loss: 2914.5450

Now, at the 147th epoch, the loss has almost stopped decreasing and sits at:

Epoch 147/5000
842/842 [==============================] - 326s 387ms/step - loss: 51.7914
Epoch 148/5000
842/842 [==============================] - 347s 412ms/step - loss: 51.5345

The AP for each class is also good:

Start to calculate AP for each class
*******************************
car           AP    0.90711
cyclist       AP    0.88406
pedestrian    AP    0.80276
              mAP   0.86464
*******************************

My concern is why the loss is still this high even though the AP for each class is very good. Is a high training loss normal for the KITTI dataset, or is there an issue with my training setup?

• Hardware (T4/V100/Xavier/Nano/etc) RTX 3090
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) Yolov4 + Resnet 50
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) v3.21.11-tf1.15.5-py3
• Training spec file(If have, please share here) - Shared above

It is normal to see a high loss at the beginning.

I’ve been training the model for over 200 epochs and the loss is still at 44~50, which is higher than I’ve ever seen for any other model. Is this normal for the KITTI dataset?
This is the current result:

Epoch 233/5000
842/842 [==============================] - 315s 374ms/step - loss: 45.3001
Epoch 234/5000
842/842 [==============================] - 365s 434ms/step - loss: 44.3534
Producing predictions: 100% 47/47 [00:24<00:00,  1.95it/s]
Start to calculate AP for each class
*******************************
car           AP    0.90797
cyclist       AP    0.90088
pedestrian    AP    0.83554
              mAP   0.88146
*******************************
Validation loss: 23.49301586049126
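
For anyone who wants to check whether the loss has really plateaued rather than eyeballing the console, here is a minimal sketch that parses Keras-style progress lines like the ones above and reports the relative change between consecutive epochs. The function names and the regular expressions are my own assumptions for illustration; they are not part of the TAO Toolkit, and the log format may differ between versions.

```python
import re

# Assumed log format (taken from the console output in this thread):
#   Epoch 147/5000
#   842/842 [=====] - 326s 387ms/step - loss: 51.7914
EPOCH_RE = re.compile(r"^Epoch\s+(\d+)/\d+")
# The leading "- " keeps this from matching "Validation loss: ..." lines.
LOSS_RE = re.compile(r"- loss:\s*([0-9]+(?:\.[0-9]+)?)")


def parse_epoch_losses(log_text):
    """Return a list of (epoch, training_loss) pairs found in the log."""
    pairs = []
    current_epoch = None
    for raw in log_text.splitlines():
        line = raw.strip()
        epoch_match = EPOCH_RE.match(line)
        if epoch_match:
            current_epoch = int(epoch_match.group(1))
            continue
        loss_match = LOSS_RE.search(line)
        if loss_match and current_epoch is not None:
            pairs.append((current_epoch, float(loss_match.group(1))))
            current_epoch = None  # take one loss value per epoch header
    return pairs


def relative_changes(pairs):
    """Return (epoch, fractional change vs. the previous entry) pairs."""
    return [
        (e1, (l1 - l0) / l0)
        for (_, l0), (e1, l1) in zip(pairs, pairs[1:])
    ]


sample_log = """\
Epoch 147/5000
842/842 [==============================] - 326s 387ms/step - loss: 51.7914
Epoch 148/5000
842/842 [==============================] - 347s 412ms/step - loss: 51.5345
Validation loss: 23.49301586049126
"""

if __name__ == "__main__":
    losses = parse_epoch_losses(sample_log)
    for epoch, change in relative_changes(losses):
        print(f"epoch {epoch}: {change:+.2%} vs. previous epoch")
```

A change of well under one percent per epoch, as between epochs 147 and 148 here, suggests the loss is flattening. It says nothing about whether the absolute value is a problem, since the loss scale depends on how the loss terms are summed and weighted.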

This is normal for YOLOv4 training on the KITTI dataset.
