Training checkpoints more frequently


Currently my model only saves checkpoints when the validation metric reaches a new "best". Is there a way to make it save checkpoints more frequently than just at bests, e.g. every X epochs?


Thanks for your interest in Clara Train SDK. Please note we have recently released Clara Train V4.0, which is based on MONAI and uses PyTorch. Please check out the notebooks to get you started: clara-train-examples/PyTorch/NoteBooks at master · NVIDIA/clara-train-examples · GitHub

To answer your question: yes, in V4 you can do this with the CheckpointSaver handler, whose save_interval argument writes a checkpoint every N epochs in addition to the best-metric saves. See this example: clara-train-examples/config_train_Unet.json at master · NVIDIA/clara-train-examples · GitHub

```json
{
  "name": "CheckpointSaver",
  "rank": 0,
  "args": {
    "save_dir": "{MMAR_CKPT_DIR}",
    "save_dict": ["model", "optimizer", "lr_scheduler"],
    "save_final": true,
    "save_interval": 5
  }
}
```
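If it helps to see the idea outside the config, here is a minimal sketch of the same "save on new best AND every N epochs" logic in a plain Python training loop. This is not MONAI's CheckpointSaver implementation, just an illustration; `train` and the pickle-based checkpoint writer are hypothetical stand-ins (a real PyTorch loop would use `torch.save` on the model/optimizer state dicts).

```python
import os
import pickle

def train(num_epochs, save_interval, save_dir, metrics):
    """Toy loop: checkpoint on a new best metric AND every `save_interval` epochs.

    `metrics` stands in for per-epoch validation results; returns the list of
    epochs at which a checkpoint was written.
    """
    os.makedirs(save_dir, exist_ok=True)
    best = float("-inf")
    saved_epochs = []
    for epoch in range(1, num_epochs + 1):
        metric = metrics[epoch - 1]  # placeholder for a real validation pass
        is_best = metric > best
        if is_best:
            best = metric
        # Save on a new best OR on the periodic interval (like save_interval=5)
        if is_best or epoch % save_interval == 0:
            path = os.path.join(save_dir, f"ckpt_epoch_{epoch}.pkl")
            with open(path, "wb") as f:
                # stand-in for torch.save({"model": ..., "optimizer": ...}, path)
                pickle.dump({"epoch": epoch, "metric": metric}, f)
            saved_epochs.append(epoch)
    return saved_epochs
```

For example, with metrics that only improve at epochs 1 and 3 and save_interval=5 over 10 epochs, checkpoints land at epochs 1, 3, 5, and 10.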

Hope this helps