Learning rate scheduler for detectnet_v2

Is there a way to get/calculate the learning rate at a particular epoch when training a detectnet_v2 model? Also, how exactly is the learning rate scheduled?

For monitoring the learning rate, this feature is not available yet.

More details about the learning rate:

Only soft_start_annealing_schedule with these nested parameters is supported.

min_learning_rate: the minimum learning rate seen during the entire experiment
max_learning_rate: the maximum learning rate seen during the entire experiment
soft_start: the progress point at which warm-up ends and the learning rate reaches max_learning_rate (expressed as a fraction of total training progress, between 0 and 1)
annealing: the progress point at which the learning rate starts annealing from max_learning_rate back down to min_learning_rate
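
In the detectnet_v2 training spec, these parameters typically sit under learning_rate inside training_config, roughly as in the snippet below (the values shown are only illustrative, not recommendations):

    learning_rate {
      soft_start_annealing_schedule {
        min_learning_rate: 5e-6
        max_learning_rate: 5e-4
        soft_start: 0.1
        annealing: 0.7
      }
    }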

Hi @Morganh,
How does this schedule work? What exactly does soft-start annealing mean?
Can we calculate the learning rate at the end of each epoch using these parameters?

Also, what exactly do these parameters (soft_start, warm-up, etc.) mean?

The learning rate scheduler adjusts the learning rate in the following 3 phases:
Phase 1: 0.0 <= progress < soft_start:
Starting from min_lr, exponentially increase the learning rate to base_lr.
Phase 2: soft_start <= progress < annealing_start:
Maintain the learning rate at base_lr.
Phase 3: annealing_start <= progress <= 1.0:
Starting from base_lr, exponentially decay the learning rate to min_lr.

base_lr: Maximum learning rate
soft_start: The progress at which learning rate achieves base_lr when starting from min_lr
annealing_start: The progress at which learning rate starts to drop from base_lr to min_lr
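
To make this concrete, here is a minimal Python sketch of how such a three-phase schedule can be evaluated at any point in training, e.g. at the end of each epoch with progress = current_epoch / total_epochs. The warm-up and decay are written as a log-linear (exponential) interpolation between min_lr and base_lr; that interpolation, and the function name lr_at_progress, are only illustrative assumptions, since the exact formula used internally by TLT has not been published.

    import math

    def lr_at_progress(progress, min_lr, base_lr, soft_start, annealing_start):
        # Sketch of a soft-start annealing schedule.
        # progress is the fraction of training completed, in [0, 1];
        # assumes 0 < soft_start < annealing_start < 1.
        log_min, log_base = math.log(min_lr), math.log(base_lr)
        if progress < soft_start:
            # Phase 1: exponential ramp from min_lr up to base_lr.
            t = progress / soft_start
            return math.exp(log_min + t * (log_base - log_min))
        if progress < annealing_start:
            # Phase 2: hold at base_lr.
            return base_lr
        # Phase 3: exponential decay from base_lr back down to min_lr.
        t = (1.0 - progress) / (1.0 - annealing_start)
        return math.exp(log_min + t * (log_base - log_min))

    # Example: learning rate at the end of each epoch of a 100-epoch run.
    for epoch in range(1, 101):
        lr = lr_at_progress(epoch / 100.0, min_lr=5e-6, base_lr=5e-4,
                            soft_start=0.1, annealing_start=0.7)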

So, I need to train a model for 100 epochs using the NVIDIA DetectNet_v2 pretrained model as the base. Training in one stretch is not feasible for my current setup, and retraining from checkpoints is not supported, so I wish to divide the training into batches of 10 epochs for a total of 100 epochs, with the trained model from the previous batch as the base for the next retraining. I also want to adjust the learning rate accordingly. Is there a way to do so, or is there a suggested way to perform this retraining?

The next TLT release will support retraining from checkpoints.
For your case, if you have a .tlt model that has already been trained for 10 epochs, you can set it as the pre-trained model and trigger retraining…
Unfortunately, in the current 1.0.1 version there is a problem with loading a pre-trained model during retraining: Retraining with pretrained tlt models - #27 by Morganh
It will also be fixed in the next release.

Yeah, I read that too. When is the next TLT release scheduled? And is there any way to get/calculate the learning rate based on the schedule?

Not sure, but it will be released soon.
As I mentioned, the learning rate is your max_lr during phase 2.

Phase 2: soft_start <= progress < annealing_start:
Maintain the learning rate at base_lr

Hi @Morganh, I understood the phase 2 part, but what about the other phases? Can you give the exact exponential increase and decay functions by which the learning rate is changed in phase 1 and phase 3?

Sorry, the formula is not available for release yet.

Hi @Morganh,
Does that mean there is no reliable way to use pre-existing models?

No, I mean that for phase 1 and phase 3, the exact formula has not been released yet.
For soft-start annealing, there are actually only 4 parameters needed. You can set:

  1. soft_start
  2. annealing_start
  3. max_lr
  4. min_lr