Running my Modulus code fails with the following error as soon as CUDA graph building starts.
[code]
[06:15:56] - attempting to restore from: outputs/Battery
[06:15:56] - Success loading optimizer: outputs/Battery/optim_checkpoint.0.pth
[06:15:56] - Success loading model: outputs/Battery/battery_network.0.pth
[06:15:57] - [step: 0] record constraint batch time: 4.146e-01s
[06:15:57] - [step: 0] saved checkpoint to outputs/Battery
[06:15:57] - [step: 0] loss: 2.148e+01
[06:16:09] - Attempting cuda graph building, this may take a bit...
Error executing job with overrides: []
Traceback (most recent call last):
File "/modulus/modulus/trainer.py", line 728, in _cuda_graph_training_step
self.loss_static, self.losses_static = self.compute_gradients(
File "/modulus/modulus/trainer.py", line 54, in adam_compute_gradients
losses_minibatch = self.compute_losses(step)
File "/modulus/modulus/solver/solver.py", line 52, in compute_losses
return self.domain.compute_losses(step)
File "/modulus/modulus/domain/domain.py", line 133, in compute_losses
constraint.forward()
File "/modulus/modulus/domain/constraint/continuous.py", line 116, in forward
self._output_vars = self.model(self._input_vars)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1186, in _call_impl
return forward_call(*input, **kwargs)
File "/modulus/modulus/graph.py", line 220, in forward
outvar.update(e(outvar))
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1186, in _call_impl
return forward_call(*input, **kwargs)
File "/modulus/modulus/utils/sympy/torch_printer.py", line 274, in forward
output = self.torch_expr(args)
File "<lambdifygenerated-7>", line 3, in _lambdifygenerated
return (-3.85e-11*sqrt(c)*sqrt(c_s)*sqrt(28606 - c_s)*(-2.71828**(-Phi_1 + Phi_2) + 2.71828**(Phi_1 - Phi_2)) + j_n)
File "/opt/conda/lib/python3.8/site-packages/torch/_tensor.py", line 32, in wrapped
return f(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/_tensor.py", line 671, in __rpow__
return torch.tensor(other, dtype=dtype, device=self.device) ** self
RuntimeError: CUDA error: operation not permitted when stream is capturing
[/code]
The culprit seems to be the constraint corresponding to the equation
(-3.85e-11*sqrt(c)*sqrt(c_s)*sqrt(28606 - c_s)*(-2.71828**(-Phi_1 + Phi_2) + 2.71828**(Phi_1 - Phi_2)) + j_n)
as the last frames of the traceback show. What are the potential causes of this issue? Is it possible that the exponential terms are too large for the gradients to be computed?
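To look at the magnitude question on its own, here is a minimal sketch in plain Python (not Modulus code). The function names and the sample values are hypothetical, made up for illustration. Only the formula is copied from the traceback. The second function rewrites the difference of exponentials as 2*sinh(Phi_1 - Phi_2), which is the same expression up to the rounding of 2.71828.

```python
import math

FLOAT32_MAX = 3.4028234664e38  # largest finite float32 value
C_S_MAX = 28606.0              # constant taken from the equation above


def residual_pow(c, c_s, phi_1, phi_2, j_n):
    """Residual written as in the traceback, with 2.71828 ** (...)."""
    d = phi_1 - phi_2
    pref = -3.85e-11 * math.sqrt(c) * math.sqrt(c_s) * math.sqrt(C_S_MAX - c_s)
    return pref * (-2.71828 ** (-d) + 2.71828 ** d) + j_n


def residual_sinh(c, c_s, phi_1, phi_2, j_n):
    """Same residual using exp(d) - exp(-d) = 2 * sinh(d)."""
    d = phi_1 - phi_2
    pref = -3.85e-11 * math.sqrt(c) * math.sqrt(c_s) * math.sqrt(C_S_MAX - c_s)
    return pref * 2.0 * math.sinh(d) + j_n


def exp_fits_float32(d):
    """True if exp(|d|) stays below the float32 maximum (|d| < ~88.7)."""
    return abs(d) < math.log(FLOAT32_MAX)


# Hypothetical sample point, not taken from my actual training data.
sample = dict(c=1000.0, c_s=14000.0, phi_1=0.5, phi_2=0.1, j_n=0.0)
print(residual_pow(**sample), residual_sinh(**sample))
print(exp_fits_float32(sample["phi_1"] - sample["phi_2"]))
```

With potentials of order one, the exponent is far from the float32 limit of about 88.7. The traceback also ends in `__rpow__`, which builds a new tensor from the Python float 2.71828, so the failure may have more to do with that tensor creation during graph capture than with the size of the exponentials. I have not confirmed this.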