Errors in Training, 0 or NaN mAP, Low Loss, Tutorial Config

Complete KITTI label list for the Caltech Birds dataset, padded.

kitti_complete_list.txt (757.1 KB)

EDIT: Re-uploaded the complete list.
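For anyone rebuilding a similar list, here is a minimal sketch of converting CUB-200-2011 boxes to KITTI label lines. It assumes the dataset's own `images.txt` (`<image_id> <image_name>`) and `bounding_boxes.txt` (`<image_id> <x> <y> <width> <height>`) formats. The helper names and the `pad_left`/`pad_top` offsets are hypothetical: padding only on the right/bottom leaves box coordinates unchanged, while left/top padding would shift them. This is not the script used to produce the attached list.

```python
def cub_box_to_kitti(x, y, width, height, class_name="bird", pad_left=0.0, pad_top=0.0):
    """Convert one CUB box (top-left x, y, width, height) to a 15-field KITTI line.

    pad_left/pad_top are hypothetical offsets for images padded on the left or top.
    """
    x1 = x + pad_left
    y1 = y + pad_top
    x2 = x1 + width
    y2 = y1 + height
    # KITTI fields: type, truncated, occluded, alpha, bbox (x1 y1 x2 y2),
    # dimensions (h w l), location (x y z), rotation_y.
    # For 2D detection only the class name and box are typically used; the rest are zeros.
    fields = [class_name, "0.00", "0", "0.00",
              f"{x1:.2f}", f"{y1:.2f}", f"{x2:.2f}", f"{y2:.2f}"]
    fields += ["0.00"] * 7
    return " ".join(fields)


def cub_to_kitti(images_txt, boxes_txt):
    """Map each image path to its KITTI line, given the text of images.txt and bounding_boxes.txt."""
    names = {}
    for line in images_txt.strip().splitlines():
        image_id, name = line.split()
        names[image_id] = name
    labels = {}
    for line in boxes_txt.strip().splitlines():
        image_id, x, y, w, h = line.split()
        labels[names[image_id]] = cub_box_to_kitti(float(x), float(y), float(w), float(h))
    return labels


if __name__ == "__main__":
    # Made-up sample lines in the CUB file formats, not real dataset values.
    images_txt = "1 001.Black_footed_Albatross/example.jpg"
    boxes_txt = "1 60.0 27.0 325.0 304.0"
    for path, label in cub_to_kitti(images_txt, boxes_txt).items():
        print(path, "->", label)
```

Each line would then be written to a `.txt` file named after its image (for example `example.txt` for `example.jpg`), which is the usual KITTI layout. Whether `x2` should be `x + width` or `x + width - 1` depends on the pixel convention you assume.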

Can you share the exact address for downloading the training images?
I want to reproduce your issue on my side.

@Morganh
http://www.vision.caltech.edu/visipedia/CUB-200-2011.html

Thanks. Did you download this tar file? https://drive.google.com/file/d/1hbzc_P1FuxMkcabkgn9ZKinBwW683j45/view

That is the one I am using.

Thanks for the info.


You can try a lower batch size to see if it helps.


First epoch (batch size 1):

Epoch 1/80

Validation cost: 0.001583
Mean average_precision (in %): 0.3687

class name    average precision (in %)
bird          0.368713

Higher batch size:

Validation cost: 0.002070
Mean average_precision (in %): 0.0090

class name    average precision (in %)
bird          0.00901658

Batch size 1: training is very slow and the mAP drops to 0.

Epoch 21/80

Validation cost: 0.000940
Mean average_precision (in %): 0.0000

class name    average precision (in %)
bird          0

Median Inference Time: 0.040929
2021-01-17 09:50:21,505 [INFO] /usr/local/lib/python3.6/dist-packages/modulus/hooks/task_progress_monitor_hook.pyc: Epoch 21/80: loss: 0.27364 Time taken: 0:18:25.144697 ETA: 18:06:43.537148
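The ETA in the log line above is consistent with a simple estimate: remaining epochs times the duration of the last epoch. A quick check, assuming that is roughly how the progress hook computes it (the `estimate_eta` helper is hypothetical, not part of the toolkit):

```python
from datetime import timedelta


def estimate_eta(current_epoch, total_epochs, epoch_time):
    """Hypothetical helper: remaining epochs multiplied by the last epoch's duration."""
    return (total_epochs - current_epoch) * epoch_time


# "Time taken: 0:18:25.144697" from the Epoch 21/80 log line.
epoch_time = timedelta(minutes=18, seconds=25, microseconds=144697)
print(estimate_eta(21, 80, epoch_time))  # 18:06:43.537123
```

The log reports 18:06:43.537148, within a fraction of a millisecond of this estimate, so at batch size 1 the remaining 59 epochs would take roughly 18 more hours.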

I can reproduce the 0 mAP. It seems that the loss does not converge.
I will run more experiments to find out what is going wrong.
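For context on why the score can be exactly 0: the evaluation reports a per-class average precision and their mean, and with only the bird class the two are identical. Below is a minimal sketch of one common AP definition (all-point interpolation, as in PASCAL VOC). DetectNet_v2's evaluator may sample the precision/recall curve differently, so treat this as illustrative rather than its implementation. If no detection ever matches a ground-truth box, recall never rises above 0 and AP is 0.

```python
def average_precision(recalls, precisions):
    """All-point interpolated AP from a precision/recall curve.

    recalls must be sorted in increasing order (detections ranked by confidence).
    """
    # Sentinel points so the curve spans recall 0..1.
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Interpolate: precision at recall r is the max precision at any recall >= r.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Area under the resulting step-shaped curve.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class):
    """Mean of per-class APs; with a single class this is just that class's AP."""
    return sum(ap_per_class.values()) / len(ap_per_class)


if __name__ == "__main__":
    # No detection matches any ground-truth box.
    print(average_precision([], []))  # 0.0
    # Half the birds found at precision 1.0, all of them at precision 0.5.
    ap = average_precision([0.5, 1.0], [1.0, 0.5])
    print(ap, mean_average_precision({"bird": ap}))  # 0.75 0.75
```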


@Morganh

Thanks Morganh. Looking forward to learning what you find.

Please use the spec below to train.
It uses a resnet18 backbone, batch size 4, and 120 epochs in total.
Train_bird.txt (3.1 KB)
It should fix your issue.

By the 30th epoch, the mAP is already 85%.

The final mAP is 92.9396%.

Epoch 120/120

Validation cost: 0.000202
Mean average_precision (in %): 92.9396

class name    average precision (in %)
bird          92.9396


@Morganh

I started training with the spec you linked. I will post results when I have finished.

/task_progress_monitor_hook.pyc: Epoch 0/120: loss: 0.10980 Time taken: 0:00:00 ETA: 0:00:00

Final Output of Training:

Epoch 120/120

Validation cost: 0.000215
Mean average_precision (in %): 91.2699

class name    average precision (in %)
bird          91.2699

Median Inference Time: 0.010165

@Morganh

Here is a sample DeepStream output from DetectNet_v2 with a ResNet-18 backbone:

Another sample: