Hi,
I have been running Modulus by specifying the number of training steps, but I recently found out that Modulus also supports criterion-based stopping. I tried to add it to my YAML file:
defaults:
...
stop_criterion:
- metric: 'l2_relative_error_u'
- min_delta: 0.1
- patience: 5000
- mode: 'min'
- freq: 2000
- strict: true
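The parameters in this block look like Keras-style early stopping. Under that reading, the stopper checks `metric` every `freq` steps. A value counts as an improvement only if it beats the best value so far by more than `min_delta`, in the direction given by `mode`. Training stops after `patience` steps without an improvement. The sketch below is a generic illustration under that assumption. It is not Modulus's StopCriterion implementation. The class name `EarlyStopSketch` is hypothetical, `strict` is omitted, and treating `min_delta` as an absolute threshold is also an assumption.

```python
import math
from dataclasses import dataclass, field


@dataclass
class EarlyStopSketch:
    """Hypothetical stand-in for a criterion-based stopper (not Modulus's class)."""
    min_delta: float = 0.1   # assumed: absolute improvement threshold
    patience: int = 5000     # assumed: measured in training steps
    mode: str = "min"        # "min": lower is better, "max": higher is better
    freq: int = 2000         # only evaluate on steps that are multiples of freq
    best: float = field(default=math.inf, init=False)
    best_step: int = field(default=0, init=False)

    def __post_init__(self):
        if self.mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        if self.mode == "max":
            self.best = -math.inf

    def improved(self, value):
        # An improvement must beat the best value by more than min_delta.
        if self.mode == "min":
            return value < self.best - self.min_delta
        return value > self.best + self.min_delta

    def evaluate(self, step, value):
        """Return True when training should stop."""
        if step % self.freq != 0:
            return False
        if self.improved(value):
            self.best, self.best_step = value, step
            return False
        return step - self.best_step >= self.patience


stopper = EarlyStopSketch(min_delta=0.1, patience=5000, mode="min", freq=2000)
for step, err in [(0, 1.0), (2000, 0.5), (4000, 0.45), (6000, 0.44), (8000, 0.43)]:
    if stopper.evaluate(step, err):
        print(f"stop at step {step}")  # prints "stop at step 8000"
        break
```

In this run, the best value is set at step 2000 (0.5). Later values never drop below 0.4, so the stop fires once 5000 steps have passed without an improvement.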
I got the error:
root@8ce88cf24d88:/modulus_projects/flat_plate_2209_p0# python fp_p0_refine_nondim.py
In 'config_p0_0.05': Validation error while composing config:
Merge error: list is not a subclass of DefaultStopCriterion. value: [{'metric': 'l2_relative_error_u'}, {'min_delta': 0.1}, {'patience': 5000}, {'mode': 'min'}, {'freq': 2000}, {'strict': True}]
full_key:
object_type=DefaultModulusConfig
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
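The merge error shows that the dashed YAML was parsed into a list of one-key dictionaries. The stop_criterion node is typed as DefaultStopCriterion, which presumably expects a single mapping whose keys match its fields. The short Python sketch below contrasts the two shapes, using the value copied from the error message. The helper name `flatten_single_key_items` is hypothetical.

```python
# Value copied from the Hydra merge error: each "- key: value" line became
# a separate one-key dict inside a list.
parsed_with_dashes = [
    {"metric": "l2_relative_error_u"},
    {"min_delta": 0.1},
    {"patience": 5000},
    {"mode": "min"},
    {"freq": 2000},
    {"strict": True},
]

# Shape produced by plain "key: value" lines (no dashes): one mapping.
expected_mapping = {
    "metric": "l2_relative_error_u",
    "min_delta": 0.1,
    "patience": 5000,
    "mode": "min",
    "freq": 2000,
    "strict": True,
}


def flatten_single_key_items(items):
    """Hypothetical helper: merge a list of one-key dicts into one dict."""
    merged = {}
    for item in items:
        if len(item) != 1:
            raise ValueError(f"expected a one-key dict, got {item!r}")
        key, value = next(iter(item.items()))
        if key in merged:
            raise ValueError(f"duplicate key {key!r}")
        merged[key] = value
    return merged


print(type(parsed_with_dashes).__name__)  # list, which the typed node rejected
print(flatten_single_key_items(parsed_with_dashes) == expected_mapping)  # True
```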
So I removed the “-” characters, and Modulus started running. However, it then crashes right after step 0:
[14:10:40] - [step: 0] record monitor time: 3.397e-01s
[14:10:42] - [step: 0] saved checkpoint to outputs/fp_p0_refine_nondim/network_checkpoint_flow
[14:10:42] - [step: 0] loss: 1.055e+00
Error executing job with overrides: []
Traceback (most recent call last):
  File "fp_p0_refine_nondim.py", line 472, in run
    slv.solve()
  File "/opt/conda/lib/python3.8/site-packages/modulus-22.9-py3.8.egg/modulus/solver/solver.py", line 159, in solve
    self._train_loop(sigterm_handler)
  File "/opt/conda/lib/python3.8/site-packages/modulus-22.9-py3.8.egg/modulus/trainer.py", line 670, in _train_loop
    stop_training = self._check_stopping_criterion(loss, losses, step)
  File "/opt/conda/lib/python3.8/site-packages/modulus-22.9-py3.8.egg/modulus/trainer.py", line 370, in _check_stopping_criterion
    stop_training = self.stop_criterion.evaluate(criterion_metric_dict)
  File "/opt/conda/lib/python3.8/site-packages/modulus-22.9-py3.8.egg/modulus/utils/training/stop_criterion.py", line 74, in evaluate
    self.check_frequencies(metric_dict)
AttributeError: 'StopCriterion' object has no attribute 'check_frequencies'
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
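The traceback shows `evaluate()` in `modulus/utils/training/stop_criterion.py` calling `self.check_frequencies`, which the `StopCriterion` object does not have. A quick way to check what the installed class actually provides is to list its frequency-related attribute names. The sketch below is a minimal diagnostic. The helper `list_matching_attrs` is hypothetical, and the module path and class name are taken from the traceback.

```python
import importlib


def list_matching_attrs(module_name, class_name, substring):
    """Hypothetical helper: sorted attribute names of a class containing substring."""
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    return sorted(name for name in dir(cls) if substring in name)


# Module path and class name come from the traceback. If "check_frequencies"
# is missing from the printed list, evaluate() calls a method that the
# installed class does not define.
try:
    print(list_matching_attrs("modulus.utils.training.stop_criterion",
                              "StopCriterion", "freq"))
except ImportError:
    print("modulus is not importable in this Python environment")
```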
So what’s the problem?
1. Is my YAML correct?
2. What else should I change to make it work?
Thanks!