Hi, I have been able to get the Helmholtz and chip_2d examples to work. However, I am getting an error when running the ldc and ldc_zeroEq models. I saw some answers on this forum, but they did not help in my case. The error is as follows:
modulus-sym/examples/ldc# python3 ldc_2d.py
/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See the Hydra documentation on changes to the job's runtime working directory for more information.
ret = run_job(
[01:56:29] - JitManager: {'_enabled': False, '_arch_mode': <JitArchMode.ONLY_ACTIVATION: 1>, '_use_nvfuser': True, '_autograd_nodes': False}
[01:56:29] - GraphManager: {'_func_arch': False, '_debug': False, '_func_arch_allow_partial_hessian': True}
[01:56:34] - attempting to restore from: outputs/ldc_2d
[01:56:34] - optimizer checkpoint not found
[01:56:34] - model flow_network.0.pth not found
Error executing job with overrides:
Traceback (most recent call last):
File "ldc_2d.py", line 136, in run
slv.solve()
File "/usr/local/lib/python3.8/dist-packages/modulus/sym/solver/solver.py", line 173, in solve
self._train_loop(sigterm_handler)
File "/usr/local/lib/python3.8/dist-packages/modulus/sym/trainer.py", line 535, in _train_loop
loss, losses = self._cuda_graph_training_step(step)
File "/usr/local/lib/python3.8/dist-packages/modulus/sym/trainer.py", line 716, in _cuda_graph_training_step
self.loss_static, self.losses_static = self.compute_gradients(
File "/usr/local/lib/python3.8/dist-packages/modulus/sym/trainer.py", line 68, in adam_compute_gradients
losses_minibatch = self.compute_losses(step)
File "/usr/local/lib/python3.8/dist-packages/modulus/sym/solver/solver.py", line 66, in compute_losses
return self.domain.compute_losses(step)
File "/usr/local/lib/python3.8/dist-packages/modulus/sym/domain/domain.py", line 147, in compute_losses
constraint.forward()
File "/usr/local/lib/python3.8/dist-packages/modulus/sym/domain/constraint/continuous.py", line 130, in forward
self._output_vars = self.model(self._input_vars)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/modulus/sym/graph.py", line 234, in forward
outvar.update(e(outvar))
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/modulus/sym/eq/derivatives.py", line 99, in forward
grad = gradient(var, grad_var)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/modulus/sym/eq/derivatives.py", line 38, in gradient
"""
grad_outputs: List[Optional[torch.Tensor]] = [torch.ones_like(y, device=y.device)]
grad = torch.autograd.grad(
~~~~~~~~~~~~~~~~~~~ <--- HERE
[
y,
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
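Following the two hints at the end of the message, I figure the next step is to rerun with launch blocking and the full Hydra traceback enabled, so the failing CUDA op gets reported at the correct line (this assumes the same `ldc_2d.py` entry point and working directory as above):

```shell
# CUDA_LAUNCH_BLOCKING=1 serializes kernel launches so the error surfaces
# at the call that actually failed; HYDRA_FULL_ERROR=1 makes Hydra print
# the complete stack trace instead of the abbreviated one.
CUDA_LAUNCH_BLOCKING=1 HYDRA_FULL_ERROR=1 python3 ldc_2d.py
```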