Detectnet_v2 resnet50 network training peaks in the first 20 epochs

Please provide the following information when requesting support.

• Hardware (RTX 3080)
• Network Type (Detectnet_v2)
• TLT Version v3.22.05-tf1.15.5-py3
detectnet_v2_train_resnet50_kitti.txt (4.2 KB)


I have run numerous experiments with a 2-class custom dataset and have fine-tuned the maximum learning rate, soft start, and coverage threshold. In every experiment the training achieves its best results within 20 epochs at most, and extending training beyond that point lowers performance. I have also found that starting annealing as late as possible (0.95) keeps performance higher throughout training.
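For context, the soft start and annealing points I am tuning live in the soft_start_annealing_schedule block of the DetectNet_v2 training spec. The learning-rate values below are illustrative placeholders, not the exact numbers from my spec file:

```
training_config {
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.1    # fraction of training spent ramping up to max LR
      annealing: 0.95    # fraction of training at which LR decay begins
    }
  }
}
```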

This seems to run contrary to the examples in the documentation, where training experiments of 60-100 epochs are mentioned.

Do you have any suggestions as to how I can extend training and improve the mAP? I currently have a best result of 92.6%, achieved on the 20th (final) epoch. An earlier experiment achieved around 70% mAP at 30 epochs, and by 80 epochs it was down to around 55.6%. This is what prompted me to focus on the earlier epochs.

Can you share full training log?

detectnet_v2_train_resnet50_kitti.txt (4.2 KB)

I am not currently using pruning, hence L2 regularization.
I tried an image size of 1216 x 480 as well; it does not have any significant effect.

Besides the spec file, could you share full training log?

I had to rerun the training, but it gave a pretty similar result (91.3% at epoch 19 this time vs. 92.6% at epoch 20 last time).
detectnet_v2_resnet50-20 epoch.pdf (215.0 KB)

I think you are running experiments similar to your old topic, Detectnet_v2(resnet50) low accuracy on 2 class dataset. For your training images and labels, I remember that you labeled the whole image as damage or healthy. If that is the case, you could actually train a classification network instead of a detection network.
If you decide to use a detection network, I suggest you assign the “damage” class only where the area really is damaged, and the “healthy” class only where it really is healthy.

Some time ago I tried running some experiments with data labelled for classification, i.e. the whole frame. The results were terrible.


I attach two examples of my data and labelling. I think this is already what you are suggesting, isn’t it?

100349.txt (80 Bytes)


000016.txt (76 Bytes)

Please confirm that I have this correct.

Your label is
damage 0.00 0 0.00 2184.0 372.0 3174.0 1005.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00

But your image is 1920x746.

Could you double-check that you uploaded the correct image?
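If helpful, a quick check like the following can catch this kind of mismatch. This is a hypothetical helper, not part of the TAO tools; it parses one KITTI-format label line and verifies the bounding box fits inside the stated image dimensions:

```python
def bbox_fits_image(kitti_line: str, img_w: int, img_h: int) -> bool:
    """Return True if the KITTI label's bounding box lies within the image.

    KITTI columns: class, truncation, occlusion, alpha,
    then the bbox as xmin, ymin, xmax, ymax (zero-indexed fields 4-7).
    """
    fields = kitti_line.split()
    xmin, ymin, xmax, ymax = (float(v) for v in fields[4:8])
    return 0 <= xmin < xmax <= img_w and 0 <= ymin < ymax <= img_h

label = "damage 0.00 0 0.00 2184.0 372.0 3174.0 1005.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
print(bbox_fits_image(label, 1920, 746))   # -> False: the box overruns the uploaded image
print(bbox_fits_image(label, 3648, 1417))  # -> True: the box fits the full-resolution frame
```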

We had another exchange like this recently.
My guess is that the forum API downsizes larger images to HD-based pixel dimensions, i.e. 1920 x *:

The full frame dimensions for both are 3648 x 1417 and the ‘damage’ label is a smaller rectangle contained within 100349.jpg.
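For what it's worth, the arithmetic supports that resize guess: scaling a 3648 x 1417 frame down to a width of 1920 while preserving aspect ratio gives a height of about 746, which matches the dimensions you saw.

```python
orig_w, orig_h = 3648, 1417   # full-resolution frame
target_w = 1920               # suspected forum download width
scaled_h = round(orig_h * target_w / orig_w)
print(target_w, scaled_h)     # -> 1920 746
```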

For using a detection network, I suggest you label the damaged area in an image.
For example, label the red area below.

Then run training to detect only one class. The model's job is to find any area that is damaged.

That is exactly what I have already done:

The label confirms this:
100349.txt (80 Bytes)

> Then run training to detect only one class. The model's job is to find any area that is damaged.

So, do I simply delete all mention of the “healthy” class from
“detectnet_v2_train_resnet50_kitti.txt” and rerun training?

Yes, you can. Additionally, if an image has no damage area, please delete that file.
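A small script along these lines could automate that pruning. The file-naming conventions here (one `.txt` KITTI label per `.jpg` image, matched by stem) are assumptions about your dataset layout:

```python
from pathlib import Path

def prune_undamaged(label_dir: str, image_dir: str, image_ext: str = ".jpg") -> list:
    """Delete label/image pairs whose KITTI label contains no 'damage' object.

    Returns the stems of the removed samples.
    """
    removed = []
    for label_path in Path(label_dir).glob("*.txt"):
        lines = label_path.read_text().splitlines()
        has_damage = any(line.split()[0] == "damage" for line in lines if line.strip())
        if not has_damage:
            image_path = Path(image_dir) / (label_path.stem + image_ext)
            label_path.unlink()
            image_path.unlink(missing_ok=True)
            removed.append(label_path.stem)
    return removed
```

Running this on copies of the label and image folders (rather than the originals) would be safer, since the deletions are irreversible.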

Okay, thanks for that, that means ‘deleting’ half the database.
I will run more experiments.
I can confirm that the first experiment produced a better mAP for damage. The main aim will be for performance to improve over more epochs.

I am following up on this to say that through a combination of the larger image size of 1216 x 480 pixels and slightly reducing the soft start in the training_config from the example given in the documentation, a 50-epoch experiment yielded 95.1 mAP at epoch 14, 93.3 at epoch 30, and 93.1 at epoch 43.
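For anyone reproducing this, the image-size change goes in the preprocessing section of the spec's augmentation_config; both dimensions must stay multiples of 16 for DetectNet_v2:

```
augmentation_config {
  preprocessing {
    output_image_width: 1216
    output_image_height: 480
    output_image_channel: 3
  }
}
```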

Experiments of 60 epochs have so far yielded nothing better than 92.4, though this was achieved at epoch 50 of 60. I guess this counts as a solution, just not quite as definitive as one might have hoped for.

Many thanks for your help.
