Re_identification_net in TAO 5.3.0 does not generate validation metrics during training

According to the source code, validation metrics should be generated every checkpoint_interval epochs (link). However, when training the re-identification model, no validation metrics are generated.
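For reference, here is a minimal sketch of the schedule I would expect. It assumes that epochs are counted from 1 and that validation runs whenever the epoch number is a multiple of checkpoint_interval. The function name and this exact rule are my own assumptions for illustration, not taken from the TAO source.

```python
def expected_validation_epochs(num_epochs: int, checkpoint_interval: int) -> list:
    """Return the 1-indexed epochs at which validation is expected to run.

    Assumption (hypothetical): validation fires whenever the epoch number
    is an exact multiple of checkpoint_interval.
    """
    if checkpoint_interval <= 0:
        raise ValueError("checkpoint_interval must be positive")
    return [epoch for epoch in range(1, num_epochs + 1) if epoch % checkpoint_interval == 0]


# With checkpoint_interval=1, every epoch should produce validation metrics.
print(expected_validation_epochs(num_epochs=10, checkpoint_interval=1))
# With checkpoint_interval=3, only epochs 3, 6 and 9 would.
print(expected_validation_epochs(num_epochs=10, checkpoint_interval=3))
```

In my runs, none of these epochs produce validation metrics.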

It turns out that if we tweak the training and validation batch sizes, certain value combinations (which are specific to a given dataset) cause validation metrics to be generated at the end of every epoch. However, it is completely unclear how to achieve this predictably.
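One mechanism that could make batch size matter is the per-epoch batch count. With `drop_last=True`, a loader yields `num_samples // batch_size` batches, which drops to zero once the batch size exceeds the dataset size, and a training loop may skip validation when the validation loader is empty. This is only a hypothesis to check, not a confirmed diagnosis of the TAO behavior. The helper names and the sample count below are hypothetical; the arithmetic matches how PyTorch's `DataLoader` computes its length with a standard batch sampler.

```python
import math


def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Number of batches a loader yields per epoch (PyTorch DataLoader arithmetic)."""
    if drop_last:
        # Incomplete trailing batch is discarded.
        return num_samples // batch_size
    # Incomplete trailing batch is kept.
    return math.ceil(num_samples / batch_size)


def validation_would_run(num_val_samples: int, val_batch_size: int, drop_last: bool = True) -> bool:
    """Hypothesis: validation is skipped when the validation loader yields no batches."""
    return batches_per_epoch(num_val_samples, val_batch_size, drop_last) > 0


# Hypothetical validation set size, not from the original report.
num_val_samples = 300
for val_bs in (64, 128, 256, 512):
    print(val_bs, batches_per_epoch(num_val_samples, val_bs, drop_last=True),
          validation_would_run(num_val_samples, val_bs))
```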

Could you please let me know how to configure training so that validation metrics are generated at the end of each training epoch, or every checkpoint_interval epochs?

Could you please refer to "Re_identification_net in TAO 5.3.0 checkpoint_interval configuration is not respected (no checkpoints/ missed checkpoints) - #4 by Morganh" and retry? Thanks.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.
