The SSD training saves the model weights after every epoch, and with each file at ~100 MB the required disk space grows quickly. Is there any way to change that?
I would prefer to only keep the best model weights based on the mAP validation metric.
For each epoch, SSD training logs the validation mAP, loss, and other metrics to an ssd_training_log_resnet18.csv file. I do use that file to prune the model files I do not want to keep, but that feels backwards to me. The SSD training is implemented in TensorFlow, so it should be fairly simple to support different ModelCheckpoint strategies. I would like this to be a configurable option for all TAO training jobs in future TAO versions.
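To make the request concrete: in plain Keras, the behaviour asked for corresponds to `tf.keras.callbacks.ModelCheckpoint(filepath, monitor=<metric>, save_best_only=True, mode="max")`. Whether TAO exposes anything like this is not established in this thread. The sketch below only models that keep-best decision rule in plain Python; the class `BestOnlyCheckpointPolicy` is hypothetical and not part of TAO.

```python
import math
from typing import Optional


class BestOnlyCheckpointPolicy:
    """Hypothetical helper: decide which checkpoint to drop so that only the
    checkpoint with the best monitored metric (e.g. validation mAP) survives.

    Mirrors the idea of Keras' ModelCheckpoint(save_best_only=True, mode="max").
    """

    def __init__(self) -> None:
        self.best_metric = -math.inf
        self.best_epoch: Optional[int] = None

    def update(self, epoch: int, metric: float) -> Optional[int]:
        """Register the checkpoint for `epoch` and return the epoch whose
        checkpoint is now obsolete (None if nothing should be deleted)."""
        if metric > self.best_metric:
            # Strict improvement: the new checkpoint replaces the old best.
            previous = self.best_epoch
            self.best_metric = metric
            self.best_epoch = epoch
            return previous
        # No improvement: the new checkpoint itself can be discarded.
        return epoch
```

Ties keep the older checkpoint, because only a strictly higher mAP counts as an improvement. Feeding it epochs with mAP 0.41, 0.55, 0.52, 0.60 leaves only epoch 4's checkpoint in the end.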
Is there a platform where I can submit a feature request to the TAO backlog?
Yes, you can delete the extra .tlt model files based on the mAP results in the ssd_training_log_resnet18.csv file.
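That pruning step can be scripted. A minimal sketch, assuming (check these against your own files): the CSV has an `epoch` column and a validation mAP column (the name `mAP` is a hypothetical default), checkpoint names end in `epoch_<N>.tlt`, and the log's epoch numbers match those in the file names (depending on how epochs are counted they may be off by one). It defaults to a dry run that only lists what it would delete.

```python
import csv
import re
from pathlib import Path
from typing import Optional

# Hypothetical file-name pattern; adjust to your actual checkpoint names.
EPOCH_RE = re.compile(r"epoch_(\d+)\.tlt$")


def best_epoch_from_log(log_path: Path, metric_col: str = "mAP") -> Optional[int]:
    """Return the epoch with the highest value in `metric_col` (hypothetical
    column name), or None if no row has a usable value."""
    best_epoch, best_value = None, float("-inf")
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            raw = (row.get(metric_col) or "").strip()
            try:
                value = float(raw)
            except ValueError:
                continue  # e.g. epochs without validation leave the cell empty
            if value > best_value:
                best_epoch, best_value = int(row["epoch"]), value
    return best_epoch


def prune_checkpoints(ckpt_dir: Path, keep_epoch: Optional[int],
                      dry_run: bool = True) -> list[Path]:
    """Remove every *.tlt file in ckpt_dir whose epoch differs from keep_epoch.

    Returns the affected paths; with dry_run=True nothing is deleted.
    """
    if keep_epoch is None:
        raise ValueError("no best epoch found; refusing to prune")
    removed = []
    for path in sorted(ckpt_dir.glob("*.tlt")):
        match = EPOCH_RE.search(path.name)
        if match is None or int(match.group(1)) == keep_epoch:
            continue  # unknown naming or the best checkpoint: keep it
        if not dry_run:
            path.unlink()
        removed.append(path)
    return removed
```

For example, `prune_checkpoints(Path("weights"), best_epoch_from_log(Path("ssd_training_log_resnet18.csv")))` lists the files it would remove (`weights` is a placeholder directory); pass `dry_run=False` once the list looks right.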
As for this feature request, I will sync with the internal team.
Currently, this forum is the platform for that. I can also ask the internal team whether there is a channel for end users to submit feature requests.