UNet training progress counter frozen after ~18,000 steps

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

Hi,
When you run training inside the 5.0 Docker container, please change lines 199 to 207 of /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/unet/scripts/train.py as follows.
This will fix the issue.

    # Initialize env for AMP training
    if params.use_amp:
        os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'
        # Enable automatic loss scaling
        os.environ['TF_ENABLE_AUTO_MIXED_PRECISION_LOSS_SCALING'] = '1'
    else:
        # Explicitly disable AMP when use_amp is False
        os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '0'
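If you want to check what the `else` branch does outside the container, here is a minimal sketch that mirrors the patched logic on a plain dict instead of the real process environment. `set_amp_env` is a hypothetical helper written for illustration, not part of TAO. It only shows that when `use_amp` is False, a `TF_ENABLE_AUTO_MIXED_PRECISION=1` that is already present gets overwritten with `0`, whereas an `if` without the `else` would leave it in place.

```python
import os
from typing import MutableMapping


def set_amp_env(use_amp: bool, env: MutableMapping[str, str] = os.environ) -> None:
    """Hypothetical helper mirroring the patched train.py AMP logic on any mapping."""
    if use_amp:
        env["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"
        # Enable automatic loss scaling
        env["TF_ENABLE_AUTO_MIXED_PRECISION_LOSS_SCALING"] = "1"
    else:
        # Overwrite any existing value so AMP is explicitly off
        env["TF_ENABLE_AUTO_MIXED_PRECISION"] = "0"


if __name__ == "__main__":
    # Simulate an environment where AMP was already switched on
    fake_env = {"TF_ENABLE_AUTO_MIXED_PRECISION": "1"}
    set_amp_env(use_amp=False, env=fake_env)
    print(fake_env["TF_ENABLE_AUTO_MIXED_PRECISION"])  # prints 0
```

Passing a dict keeps the check side-effect free; calling it with the default argument would modify `os.environ` exactly as the patched script does.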

Thanks.