How to continue training for additional epochs?

I tried to train a model for additional epochs by using the weights from a previous TLT training run (i.e., in my training spec file I set training_config.pretrained_model_file to $OUTPUT/weights/model.tlt). The training does not appear to take advantage of the previous run and seems to start from scratch. I base this on the reported mAP values, which are quite low and remain below the mAP reported at the end of the initial training.
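For reference, the setting described above would look roughly like this in the spec file. This is only a sketch: the field name and path are the ones quoted above, the rest of the training_config block is omitted, and whether $OUTPUT is expanded inside the spec file is an assumption.

```
training_config {
  # Weights saved by the earlier training run (path as quoted above)
  pretrained_model_file: "$OUTPUT/weights/model.tlt"
  # ... remaining training_config fields unchanged ...
}
```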

I don’t see any instructions in the TLT documentation on how to do this, or on whether it is even possible to pick up where you left off and train a model for more epochs. What I’ve tried so far doesn’t work as expected. Can someone comment on how to do this with TLT?

Thanks in advance for any suggestions or insight.

Apparently this isn’t supported yet: https://devtalk.nvidia.com/default/topic/1064169/transfer-learning-toolkit/resume-training-from-saved-model-step-in-detectnet_v2/

Yes, currently we don’t support resuming training from a checkpoint.