I am training a two-class classification model and training does not converge very well.
The training data size is 17,000 samples per class.
How can I improve training?
training.log (2.0 KB) classification_retrain_spec.log (1.1 KB)
Please consider decreasing the batch size.
Also, please consider changing the lr_config to:
```
lr_config {
  #scheduler: "step"
  #learning_rate: 0.006
  #step_size: 10
  #gamma: 0.1
  scheduler: "soft_anneal"
  learning_rate: 0.05
  soft_start: 0.056
  annealing_points: "0.3, 0.6, 0.8"
  annealing_divider: 10
}
```
May I know what the differences are between the two configurations? The new configuration works very well for my data.
For "step", it implements the step learning rate annealing schedule according to the progress of the training. The scheduler adjusts the learning rate of the experiment in steps at regular intervals.
The learning rate is reduced at every step.
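To illustrate, here is a minimal sketch of how a step schedule of this kind behaves. The function name and the epoch-based stepping are assumptions for illustration only, not TAO's internal implementation:

```python
# Hypothetical sketch of a step learning-rate schedule (not TAO's internal code).
def step_lr(epoch, learning_rate=0.006, step_size=10, gamma=0.1):
    """Multiply the base learning rate by gamma once every step_size epochs."""
    return learning_rate * (gamma ** (epoch // step_size))

for epoch in (0, 9, 10, 20, 30):
    print(epoch, step_lr(epoch))  # 0.006, 0.006, 0.0006, 6e-05, 6e-06
```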
For "soft_anneal", this learning rate scheduler adjusts the learning rate in the following phases:
Phase 1: 0.0 <= progress < soft_start: starting from start_lr, linearly increase the learning rate to base_lr.
Phase 2: at every annealing point, divide the learning rate by the annealing divider.
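A minimal sketch of these two phases, assuming progress is the fraction of training completed; start_lr here is an assumed initial value, since the exact starting learning rate is not given in the description above:

```python
# Hypothetical sketch of the soft_anneal schedule described above
# (not TAO's internal code; start_lr is an assumed initial value).
def soft_anneal_lr(progress, base_lr=0.05, soft_start=0.056, start_lr=0.005,
                   annealing_points=(0.3, 0.6, 0.8), annealing_divider=10):
    """progress is the fraction of training completed, in [0.0, 1.0]."""
    if progress < soft_start:
        # Phase 1: linear ramp from start_lr up to base_lr.
        return start_lr + (base_lr - start_lr) * (progress / soft_start)
    lr = base_lr
    # Phase 2: divide the learning rate at each annealing point passed.
    for point in annealing_points:
        if progress >= point:
            lr /= annealing_divider
    return lr

for p in (0.0, 0.056, 0.3, 0.6, 0.8):
    print(p, soft_anneal_lr(p))  # 0.005, 0.05, 0.005, 0.0005, 5e-05
```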