Stop the training when training loss reaches a tolerance value

Currently, only these stopping criteria are supported:

     - metric: 'l2_relative_error_u'
     - min_delta: 0.1
     - patience: 5000
     - mode: 'min'
     - freq: 2000
     - strict: true

I want to stop the training when the training loss goes below a certain tolerance. In simple terms:

while training:
    if training_loss < tol:
        break

I am using a bare-metal installation of NVIDIA Modulus, so I can edit the source code if a slight modification is needed to achieve this. I can see that a simple break is implemented in the source to stop the training when the stopping criterion is met or when the maximum number of training iterations is reached.

I want to add the if condition here at the start of each iteration.
How do I access the training loss? Is it a part of the dictionary losses?

I also need to save the iteration number at which the training loss meets this criterion.

Hi @prakhar_sharma

Yes, losses is a dictionary of loss values computed here. You can add any logic involving your losses after that point to exit the training loop (for example, set an exit flag, or set step = self.max_steps + 1 to break the loop).

(For details on what that loss dictionary contains, you can see the trainer iterating over the losses for logging here.)
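
As a rough illustration of the idea, here is a minimal sketch in plain Python. It is not the actual Modulus trainer code: train, compute_losses, fake_losses, and tol are hypothetical names made up for this example, and the loss values are plain floats. In the real trainer the dictionary values are most likely tensors, so you would probably need to convert them to Python numbers before comparing. The sketch also assumes the quantity you care about is the sum of all entries in losses.

```python
def train(compute_losses, max_steps, tol):
    """Toy training loop that stops once the summed loss drops below tol.

    compute_losses is a hypothetical callable (step -> dict of name -> float)
    standing in for the real loss computation.
    Returns the step at which the tolerance was first met, or None.
    """
    stop_step = None  # iteration at which the tolerance was first met
    step = 0
    while step < max_steps:
        losses = compute_losses(step)
        # Assumption: the total training loss is the sum of all entries.
        total_loss = sum(float(v) for v in losses.values())
        if total_loss < tol:
            stop_step = step  # remember where the criterion was met
            break
        step += 1
    return stop_step


def fake_losses(step):
    # Stand-in for a real training step: each loss decays as 0.5 / (step + 1)
    return {"pde": 0.5 / (step + 1), "bc": 0.5 / (step + 1)}


stop_step = train(fake_losses, max_steps=1000, tol=0.015)
print(stop_step)  # 66
```

If the tolerance is never reached within max_steps, the function returns None, so you can tell an early stop apart from a run that simply exhausted its iterations.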

