Using BFGS as optimizer and receiving error - KeyError: 'max_iter'

Hello,

I am trying to use the BFGS optimizer instead of Adam. Each time I try to use this optimizer I get the following error:

"envs/modulus/lib/python3.9/site-packages/torch/optim/lbfgs.py", line 298, in step
    max_iter = group['max_iter']
KeyError: 'max_iter'
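
For reference, torch.optim.LBFGS reads max_iter out of its parameter group inside step(), so this error means the optimizer's parameter group was built (or later overwritten) without LBFGS's defaults. The sketch below is a minimal standalone way to reproduce the same traceback, by loading state saved from Adam into LBFGS; it only illustrates the mechanism and is not necessarily what Modulus does internally:

import torch

model = torch.nn.Linear(2, 1)

# Normal construction merges the LBFGS defaults (max_iter, etc.) into the group.
adam = torch.optim.Adam(model.parameters())
lbfgs = torch.optim.LBFGS(model.parameters())

# Loading state saved from a different optimizer replaces the group's
# hyperparameters, so 'max_iter' disappears from the parameter group ...
lbfgs.load_state_dict(adam.state_dict())

def closure():
    lbfgs.zero_grad()
    loss = model(torch.randn(4, 2)).pow(2).mean()
    loss.backward()
    return loss

lbfgs.step(closure)  # ... and this line fails with KeyError: 'max_iter'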

Currently my config file has the following entries:

defaults :
  - modulus_default
  - arch:
      - modified_fourier
  - scheduler: tf_exponential_lr
  - loss: sum
  - _self_
  - optimizer: bfgs

#optimizer:
#  max_iter: 20000

scheduler:
  decay_rate: 0.95
  decay_steps: 5000

save_filetypes : "vtk,npz"

training:
  rec_results_freq: 1000
  max_steps : 500000 # Previous example required 500,000 epochs to run

batch_size:
  upper_lower_BC: 400
  Waveguide_port: 400
  open_BC: 400
  Interior: 4000
  RHS: 400

Per this forum post I have also tried putting optimizer: bfgs above _self_:

defaults :
  - modulus_default
  - arch:
      - modified_fourier
  - scheduler: tf_exponential_lr
  - loss: sum
  - optimizer: bfgs
  - _self_

#optimizer:
#  max_iter: 20000

scheduler:
  decay_rate: 0.95
  decay_steps: 5000

save_filetypes : "vtk,npz"

training:
  rec_results_freq: 1000
  max_steps : 500000 # Previous example required 500,000 epochs to run

batch_size:
  upper_lower_BC: 400
  Waveguide_port: 400
  open_BC: 400
  Interior: 4000
  RHS: 400

Currently I have max_iter commented out in the optimizer section, but I have also tried setting it.

Thank you for your help.

Hi @tstone

Place the optimizer setting in your defaults list above _self_. This is a minor detail of the Hydra config.

Some users have encountered a similar problem when this ordering is incorrect.
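
The position matters because Hydra merges the defaults list in order, with later entries overriding earlier ones on conflicting keys. A minimal OmegaConf sketch of that precedence rule (hypothetical values, not Modulus's actual groups):

from omegaconf import OmegaConf

# Hypothetical optimizer group and primary-config override, just to show
# that whatever is merged later wins on conflicting keys.
optimizer_default = OmegaConf.create({"optimizer": {"lr": 1.0, "max_iter": 1000}})
primary_config = OmegaConf.create({"optimizer": {"lr": 0.5}})

# optimizer listed above _self_: your config refines the group's defaults.
print(OmegaConf.merge(optimizer_default, primary_config).optimizer)
# -> {'lr': 0.5, 'max_iter': 1000}

# _self_ listed above optimizer: the group is merged after your overrides.
print(OmegaConf.merge(primary_config, optimizer_default).optimizer)
# -> {'lr': 1.0, 'max_iter': 1000}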

Hello,

Thank you for your response. In my original post I did link the bug report you referred to, and I tried putting optimizer: bfgs above _self_. After playing around with the config file, I found that BFGS works if I use a fully connected NN, but my setup uses a modified Fourier network, which doesn't seem to work with the BFGS optimizer.
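
For what it's worth, a quick standalone check suggests plain PyTorch LBFGS handles a Fourier-feature style network fine outside of Modulus, so the problem may be in how the optimizer's parameter groups get assembled rather than in the architecture itself. The toy network below is just a hypothetical stand-in, not Modulus's modified_fourier arch:

import torch
import torch.nn as nn

# Toy Fourier-feature network (a rough stand-in, not Modulus's arch).
class FourierNet(nn.Module):
    def __init__(self, in_dim=2, features=32):
        super().__init__()
        self.register_buffer("B", torch.randn(in_dim, features))
        self.mlp = nn.Sequential(
            nn.Linear(2 * features, 64), nn.Tanh(), nn.Linear(64, 1)
        )

    def forward(self, x):
        z = x @ self.B
        return self.mlp(torch.cat([torch.sin(z), torch.cos(z)], dim=-1))

model = FourierNet()
opt = torch.optim.LBFGS(model.parameters(), max_iter=20)
x = torch.randn(128, 2)

def closure():
    opt.zero_grad()
    loss = model(x).pow(2).mean()
    loss.backward()
    return loss

opt.step(closure)  # completes without a KeyError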

Hi @tstone

Hmmm, that’s interesting. Does it give the same error you originally posted with the Modified Fourier network?

Yes, my original network is modified fourier. The error that I listed is the same.

Hi @tstone

What version of Modulus Sym are you using? I just tested the LDC example with this config:

defaults :
  - modulus_default
  - arch:
      - modified_fourier
  - scheduler: tf_exponential_lr
  - optimizer: bfgs
  - loss: sum
  - _self_
scheduler:
  decay_rate: 0.95
  decay_steps: 4000

training:
  rec_validation_freq: 1000
  rec_inference_freq: 2000
  rec_monitor_freq: 1000
  rec_constraint_freq: 2000
  max_steps : 10000

batch_size:
  TopWall: 1000
  NoSlip: 1000
  Interior: 4000

graph:
  func_arch: true

using an install of Modulus Sym from the main branch of the GitHub repo, and it was able to get into the training loop fine. Can you try the LDC problem with this config and see if you get the same error?

I am using Modulus version 22.09.

Hi,
I'm following up on the same problem. I can confirm that the LDC example runs fine, but when using the modified Fourier network, the same KeyError occurs.
I'm using a miniconda environment and Python 3.8 (because there are CMake issues when installing nvidia-modulus.sym; please correct me if there have been any updates). nvidia-modulus and nvidia-modulus.sym are installed from pip, showing versions 0.1.0 for modulus and 1.0.0 for modulus.sym.
Please let me know if you need additional information.
Thanks!

Hi @yifannie,

Thanks for reporting this. Can you quickly try the most recent modulus.sym and open a bug issue on the repo, with the error log, so we can track this? Thank you!