Detectnet_v2 resnet50 network training peaks in the first 20 epochs

Please provide the following information when requesting support.

• Hardware (RTX 3080)
• Network Type (Detectnet_v2)
• TLT Version v3.22.05-tf1.15.5-py3
detectnet_v2_train_resnet50_kitti.txt (4.2 KB)


I have run numerous experiments with a 2-class custom dataset and have fine-tuned the maximum learning rate, soft start, and coverage threshold. In every experiment the training achieves its best results within 20 epochs at most, and extending training beyond that point lowers performance. I have also found that starting annealing as late as possible (0.95) keeps performance higher throughout training.
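For context, the soft start and annealing points I am tuning live in the soft_start_annealing_schedule block of the DetectNet_v2 training spec. The learning-rate values below are illustrative placeholders, not the exact numbers from my spec file:

```
training_config {
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.1    # fraction of training spent ramping up to max LR
      annealing: 0.95    # fraction of training at which LR decay begins
    }
  }
}
```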

This seems to run contrary to the examples in the documentation, where training experiments of 60-100 epochs are mentioned.

Do you have any suggestions as to how I can extend training and improve the mAP? I currently have a best result of 92.6%, achieved on the 20th (final) epoch. An earlier experiment achieved around 70% mAP at 30 epochs, and by 80 epochs it was down to around 55.6%. This is what prompted me to focus on the earlier epochs.

Can you share full training log?

detectnet_v2_train_resnet50_kitti.txt (4.2 KB)

I am not currently using pruning, hence L2 regularization.
I tried an image size of 1216 x 480 as well; it does not have any significant effect.

Besides the spec file, could you share full training log?

I had to rerun the training, but it gave a pretty similar result (91.3% at epoch 19 this time vs. 92.6% at epoch 20 last time).
detectnet_v2_resnet50-20 epoch.pdf (215.0 KB)

I think you are running experiments similar to your old topic, Detectnet_v2(resnet50) low accuracy on 2 class dataset. For your training images and labels, I remember that you labeled the whole image as damage or healthy. If that is the case, you could actually train a classification network instead of a detection network.
If you decide to use a detection network, I suggest you assign the “damage” class only where the area really is damaged, and the “healthy” class only where it really is healthy.

Some time ago I tried running some experiments with data labelled for classification, i.e. the whole frame. The results were terrible.


I attach two examples of my data and labelling. I think this is already what you are suggesting, isn’t it?

100349.txt (80 Bytes)


000016.txt (76 Bytes)

Please confirm that I have this correct.

Your label is
damage 0.00 0 0.00 2184.0 372.0 3174.0 1005.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00

But your image is 1920x746.

Could you double-check that you uploaded the correct image?
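If helpful, a quick check like the following can catch this kind of mismatch. This is a hypothetical helper, not part of the TAO tools; it parses one KITTI-format label line and verifies the bounding box fits inside the stated image dimensions:

```python
def bbox_fits_image(kitti_line: str, img_w: int, img_h: int) -> bool:
    """Return True if the KITTI label's bounding box lies within the image.

    KITTI columns: class, truncation, occlusion, alpha,
    then the bbox as xmin, ymin, xmax, ymax (zero-indexed fields 4-7).
    """
    fields = kitti_line.split()
    xmin, ymin, xmax, ymax = (float(v) for v in fields[4:8])
    return 0 <= xmin < xmax <= img_w and 0 <= ymin < ymax <= img_h

label = "damage 0.00 0 0.00 2184.0 372.0 3174.0 1005.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
print(bbox_fits_image(label, 1920, 746))   # -> False: the box overruns the uploaded image
print(bbox_fits_image(label, 3648, 1417))  # -> True: the box fits the full-resolution frame
```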

We had another exchange like this recently.
My guess is that the forum API downsizes larger images to HD-based pixel dimensions, i.e. 1920 x *:

The full frame dimensions for both are 3648 x 1417 and the ‘damage’ label is a smaller rectangle contained within 100349.jpg.
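For what it's worth, the arithmetic supports that resize guess: scaling a 3648 x 1417 frame down to a width of 1920 while preserving aspect ratio gives a height of about 746, which matches the dimensions you saw.

```python
orig_w, orig_h = 3648, 1417   # full-resolution frame
target_w = 1920               # suspected forum download width
scaled_h = round(orig_h * target_w / orig_w)
print(target_w, scaled_h)     # -> 1920 746
```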

For using a detection network, I suggest you label the damaged area in an image.
For example, label the red area below.

Then run training to detect only one class. The model's job is to find any area that is damaged.

That is exactly what I have already done:

The label confirms this:
100349.txt (80 Bytes)

> Then run training to detect only one class. The model's job is to find any area that is damaged.

So, do I simply delete all mention of the “healthy” class from
“detectnet_v2_train_resnet50_kitti.txt” and rerun training?

Yes, you can. Additionally, if an image has no damage area, please delete that file.
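A small script along these lines could automate that pruning. The file-naming conventions here (one `.txt` KITTI label per `.jpg` image, matched by stem) are assumptions about your dataset layout:

```python
from pathlib import Path

def prune_undamaged(label_dir: str, image_dir: str, image_ext: str = ".jpg") -> list:
    """Delete label/image pairs whose KITTI label contains no 'damage' object.

    Returns the stems of the removed samples.
    """
    removed = []
    for label_path in Path(label_dir).glob("*.txt"):
        lines = label_path.read_text().splitlines()
        has_damage = any(line.split()[0] == "damage" for line in lines if line.strip())
        if not has_damage:
            image_path = Path(image_dir) / (label_path.stem + image_ext)
            label_path.unlink()
            image_path.unlink(missing_ok=True)
            removed.append(label_path.stem)
    return removed
```

Running this on copies of the label and image folders (rather than the originals) would be safer, since the deletions are irreversible.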

Okay, thanks for that, that means ‘deleting’ half the database.
I will run more experiments.
I can confirm that the first experiment produced a better mAP for damage. The main aim will be for performance to improve over more epochs.

I am following up on this to say that through a combination of the larger image size of 1216 x 480 pixels and slightly reducing the soft start in the training_config from the example given in the documentation, a 50-epoch experiment yielded 95.1 mAP at epoch 14, 93.3 at epoch 30, and 93.1 at epoch 43.
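For anyone reproducing this, the image-size change goes in the preprocessing section of the spec's augmentation_config; both dimensions must stay multiples of 16 for DetectNet_v2:

```
augmentation_config {
  preprocessing {
    output_image_width: 1216
    output_image_height: 480
    output_image_channel: 3
  }
}
```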

Experiments of 60 epochs have so far yielded nothing better than 92.4, though this was achieved at epoch 50 of 60. I guess this counts as a solution, just not quite as definitive as one might have hoped for.

Many thanks for your help.
