The NVIDIA forums don’t allow zip files, only jpeg, pdf, logzz types. What error do you get when you click the OneDrive link? Here is a Google Drive link with the same file:
I’ll test out that soft_start parameter this afternoon, although in other training frameworks that use a similar learning-rate scheduler, we still don’t see a change this drastic in model performance at the beginning of retraining.
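For readers unfamiliar with the soft_start idea mentioned above, here is a minimal sketch of a generic soft-start/annealing learning-rate schedule. It only illustrates why the learning rate (and possibly early metrics) can look different in the first part of a run. The function name, the parameter names, the exponential interpolation, and the example values are assumptions made for illustration; this is not taken from TLT's source and may differ from what TLT actually does.

```python
def soft_start_annealing_lr(progress, min_lr, max_lr, soft_start, annealing):
    """Hypothetical helper: learning rate at a training progress in [0, 1].

    Assumed behavior (not confirmed against TLT):
      - progress in [0, soft_start): ramp up from min_lr to max_lr
      - progress in [soft_start, annealing): hold at max_lr
      - progress in [annealing, 1]: decay from max_lr back to min_lr
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    if soft_start > 0.0 and progress < soft_start:
        t = progress / soft_start  # 0 -> 1 over the warm-up phase
        return min_lr * (max_lr / min_lr) ** t
    if progress < annealing or annealing >= 1.0:
        return max_lr
    t = (progress - annealing) / (1.0 - annealing)  # 0 -> 1 over the decay phase
    return max_lr * (min_lr / max_lr) ** t


# Example values are placeholders, not recommendations.
for epoch in (0, 5, 20, 100, 150, 200):
    lr = soft_start_annealing_lr(epoch / 200, 5e-6, 5e-4, 0.1, 0.7)
    print(f"epoch {epoch:3d}: lr = {lr:.2e}")
```

Under these assumptions, the first 10% of a run trains at a reduced learning rate, which would slow early progress but would not by itself explain a loaded model starting from near-zero performance.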
I also disagree with your statement that my graph “proves that pretrained model helps”. You state that at epoch 50 the pre-trained model is higher than no-pretrained model, which is true, but at epochs 80 and 85, the pre-trained model is LOWER than the no-pretrained model. I think these fluctuations are merely a result of random weight initialization.
I cannot access the Google Drive file either. Never mind, it is not essential to check it right now.
Regarding the pre-trained model not loading, I am syncing with the internal team about this behavior. I will post an update if there are any findings.
Hi harryhsl8c,
To update my previous comment: please set load_graph to true during retraining.
For both unpruned and pruned models, if the entire model needs to be loaded during retraining, load_graph must be set to true.
Prior to posting on these forums, I already tried multiple times with load_graph = true and with load_graph = false to see if there was a difference. There was not.
@Morganh I performed another 200 epochs of training using my pretrained tlt model as the pretrained_model_file, this time with load_graph = true. The results are the same and have been added to the graph. I also attempted to use load_graph = true with the .h5 NGC model, which results in an error. There is no point in setting load_graph = true for the scenario where I use no model for pretrained_model_file.
Yesterday I was able to have a 1:1 discussion with Subhashree Radhakrishnan during the NGC “Meet with the Experts” session on the Transfer Learning Toolkit. I shared this issue and sent her a link to this page. She was unsure why TLT was behaving like this and said she would look into it.
Can we un-mark this forum post as Solved? Unless this bug is specific to me, which we haven’t proved yet, I don’t believe the original question has been sufficiently answered.
Hi harryhsl8c,
Sorry for the late reply. After checking, unfortunately we found that there is an issue that affects detectnet_v2 only: it cannot load the pretrained model correctly during retraining. This issue will be fixed in the next tlt release.
We appreciate your hard work and contribution.