Problem with using Criterion Based Stopping

Hi,

I have been running Modulus with a fixed number of training steps. I recently found that Modulus also supports criterion-based stopping, so I tried adding it to the YAML file:

defaults:
...
  
stop_criterion:
  - metric: 'l2_relative_error_u'
  - min_delta: 0.1
  - patience: 5000
  - mode: 'min'
  - freq: 2000
  - strict: true

I got the error:

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
root@8ce88cf24d88:/modulus_projects/flat_plate_2209_p0# python fp_p0_refine_nondim.py
In 'config_p0_0.05': Validation error while composing config:
Merge error: list is not a subclass of DefaultStopCriterion. value: [{'metric': 'l2_relative_error_u'}, {'min_delta': 0.1}, {'patience': 5000}, {'mode': 'min'}, {'freq': 2000}, {'strict': True}]
    full_key:
    object_type=DefaultModulusConfig

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

So I removed the "-" markers and Modulus started running. However, it soon failed again with a different error:

[14:10:40] - [step:          0] record monitor time:  3.397e-01s
[14:10:42] - [step:          0] saved checkpoint to outputs/fp_p0_refine_nondim/network_checkpoint_flow
[14:10:42] - [step:          0] loss:  1.055e+00
Error executing job with overrides: []
Traceback (most recent call last):
  File "fp_p0_refine_nondim.py", line 472, in run
    slv.solve()
  File "/opt/conda/lib/python3.8/site-packages/modulus-22.9-py3.8.egg/modulus/solver/solver.py", line 159, in solve
    self._train_loop(sigterm_handler)
  File "/opt/conda/lib/python3.8/site-packages/modulus-22.9-py3.8.egg/modulus/trainer.py", line 670, in _train_loop
    stop_training = self._check_stopping_criterion(loss, losses, step)
  File "/opt/conda/lib/python3.8/site-packages/modulus-22.9-py3.8.egg/modulus/trainer.py", line 370, in _check_stopping_criterion
    stop_training = self.stop_criterion.evaluate(criterion_metric_dict)
  File "/opt/conda/lib/python3.8/site-packages/modulus-22.9-py3.8.egg/modulus/utils/training/stop_criterion.py", line 74, in evaluate
    self.check_frequencies(metric_dict)
AttributeError: 'StopCriterion' object has no attribute 'check_frequencies'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

So what’s the problem?

1. Is my YAML correct?
2. What else should I change to make it work?

Thanks!

Hi @tsltaywb

This seems to be a bug. Can you edit stop_criterion.py in your Modulus source code and change lines 73-75 to:

if self.check_freqs:
    self._check_frequencies(metric_dict)
score = self._get_score(metric_dict, self.target_key)

(Note the leading underscore on the method names.)

Please let me know if this runs. This Python file should be at /opt/conda/lib/python3.8/site-packages/modulus-22.9-py3.8.egg/modulus/utils/training/stop_criterion.py on your machine based on the error message. Thanks for the report!
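
For anyone hitting the same traceback, here is a minimal sketch of why the call fails. ToyStopCriterion is a hypothetical stand-in and not the Modulus class; it only assumes, as the fix above implies, that the helper is defined with a leading underscore while the caller uses the name without it. The method bodies are placeholders.

```python
class ToyStopCriterion:
    """Hypothetical stand-in; not the real Modulus StopCriterion."""

    def __init__(self, check_freqs=True):
        self.check_freqs = check_freqs

    def _check_frequencies(self, metric_dict):
        # Placeholder body; the real method's logic is not shown in this thread.
        return True

    def evaluate_buggy(self, metric_dict):
        if self.check_freqs:
            # No method with this exact name exists, so Python raises AttributeError.
            self.check_frequencies(metric_dict)
        return False

    def evaluate_fixed(self, metric_dict):
        if self.check_freqs:
            # The leading underscore matches the method defined above.
            self._check_frequencies(metric_dict)
        return False


criterion = ToyStopCriterion()
metrics = {"l2_relative_error_u": 0.5}

try:
    criterion.evaluate_buggy(metrics)
    buggy_error = None
except AttributeError as exc:
    buggy_error = str(exc)

fixed_result = criterion.evaluate_fixed(metrics)
print(buggy_error)
print(fixed_result)
```

The buggy variant fails on the first call that reaches the frequency check, which fits the traceback above appearing right after step 0.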

Ok, just tried it. Looks like it’s working using:

stop_criterion:
  metric: 'l2_relative_error_u'
  min_delta: 0.1
  patience: 5000
  mode: 'min'
  freq: 2000
  strict: true

Thanks for fixing the bug!
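
A note on the first error, in case it helps others: each leading "-" starts a YAML list item, so the dashed form parses to a list of one-key mappings, which is exactly the value Hydra printed in the merge error. The sketch below uses plain Python literals rather than a YAML parser, with the list copied from that error message, to show the two shapes side by side.

```python
# Shape produced by the dashed YAML, copied from the Hydra error message.
dashed_form = [
    {"metric": "l2_relative_error_u"},
    {"min_delta": 0.1},
    {"patience": 5000},
    {"mode": "min"},
    {"freq": 2000},
    {"strict": True},
]

# Shape produced by the dash-free YAML: a single mapping holding all six keys.
merged = {}
for item in dashed_form:
    merged.update(item)

print(type(dashed_form).__name__)
print(merged)
```

The config schema expects the single-mapping shape, which is why dropping the dashes resolved the validation error.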
