Hi,
I am having problems running the Modulus 22.09 Docker image in WSL, so I tried a bare-metal install instead. When I ran the helmholtz example, it ran briefly and then failed with the following error:
[11:12:04] - JitManager: {'_enabled': False, '_arch_mode': <JitArchMode.ONLY_ACTIVATION: 1>, '_use_nvfuser': True, '_autograd_nodes': False}
[11:12:04] - GraphManager: {'_func_arch': False, '_debug': False, '_func_arch_allow_partial_hessian': True}
[11:12:09] - attempting to restore from: outputs/helmholtz
[11:12:09] - Success loading optimizer: outputs/helmholtz/optim_checkpoint.0.pth
[11:12:09] - Success loading model: outputs/helmholtz/wave_network.0.pth
[11:12:10] - [step: 0] record constraint batch time: 9.617e-02s
[11:12:11] - [step: 0] record validators time: 1.156e+00s
[11:12:11] - [step: 0] saved checkpoint to outputs/helmholtz
[11:12:11] - [step: 0] loss: 9.899e+03
[11:12:13] - Attempting cuda graph building, this may take a bit...
Error executing job with overrides: []
Traceback (most recent call last):
  File "helmholtz.py", line 92, in run
    slv.solve()
  File "/home/user/modulus_22.09/lib/python3.8/site-packages/modulus-22.9-py3.8.egg/modulus/solver/solver.py", line 159, in solve
    self._train_loop(sigterm_handler)
  File "/home/user/modulus_22.09/lib/python3.8/site-packages/modulus-22.9-py3.8.egg/modulus/trainer.py", line 521, in _train_loop
    loss, losses = self._cuda_graph_training_step(step)
  File "/home/user/modulus_22.09/lib/python3.8/site-packages/modulus-22.9-py3.8.egg/modulus/trainer.py", line 724, in _cuda_graph_training_step
    self.g = torch.cuda.CUDAGraph()
  File "/home/user/modulus_22.09/lib/python3.8/site-packages/torch/cuda/graphs.py", line 50, in __init__
    super(CUDAGraph, self).__init__()
RuntimeError: CUDA graphs may only be used in Pytorch built with CUDA >= 11.0 and not yet supported on ROCM
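
The error message suggests that the PyTorch build in this environment was not compiled against CUDA 11.0 or newer (for example a CPU-only wheel, or one built for an older CUDA). A minimal sketch for checking this is below. The helper names `supports_cuda_graphs` and `describe_torch_build` are made up for illustration, and the check only mirrors the condition stated in the error text; it is not taken from the Modulus or PyTorch source.

```python
import importlib.util


def supports_cuda_graphs(cuda_version, hip_version=None):
    """Mirror the condition in the error text: built with CUDA >= 11.0, not ROCm.

    cuda_version: value of torch.version.cuda (None for CPU-only builds).
    hip_version:  value of torch.version.hip (None unless it is a ROCm build).
    """
    if hip_version is not None or cuda_version is None:
        return False
    major = int(cuda_version.split(".")[0])
    return major >= 11


def describe_torch_build():
    """Collect build details of the installed PyTorch, or None if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    cuda_version = torch.version.cuda
    hip_version = getattr(torch.version, "hip", None)
    return {
        "torch_version": torch.__version__,
        "built_with_cuda": cuda_version,
        "built_with_rocm": hip_version,
        "cuda_available": torch.cuda.is_available(),
        "cuda_graphs_plausible": supports_cuda_graphs(cuda_version, hip_version),
    }


if __name__ == "__main__":
    info = describe_torch_build()
    if info is None:
        print("PyTorch is not installed in this environment")
    else:
        for key, value in info.items():
            print(f"{key}: {value}")
```

If `built_with_cuda` prints `None` or a version below 11, the installed wheel itself would be the likely cause, independent of the driver that WSL exposes.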
So how can I solve this error?
Thanks.