I am new to Modulus and am running the LDC case. The outputs folder is created, but when I run the case I hit the following issue related to the validation data:
[00:48:58] - JIT using the NVFuser TorchScript backend
[00:48:58] - JitManager: {'_enabled': True, '_arch_mode': <JitArchMode.ONLY_ACTIVATION: 1>, '_use_nvfuser': True, '_autograd_nodes': False}
[00:48:58] - GraphManager: {'_func_arch': False, '_debug': False, '_func_arch_allow_partial_hessian': True}
Error executing job with overrides:
ValueError: could not convert string to float: 'oid sha256:4c68adf2b0a04c53f0abd4d3920f3fec618669399638dd5ece84785f474d1fa6'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "ldc_2d.py", line 82, in run
openfoam_var = csv_to_dict(
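For reference, that ValueError usually means the validation CSV under the example's openfoam/ directory is still a Git LFS pointer file rather than the actual data, which is why csv_to_dict finds the text "oid sha256:..." where it expects numbers. A minimal check, assuming the standard LDC example layout (the path below may differ in your clone):

from pathlib import Path

# A Git LFS pointer is a short text file beginning with a "version" line and
# containing the "oid sha256:..." string from the ValueError instead of CSV data.
csv_path = Path("openfoam/cavity_uniformVel0.csv")  # assumed path from the LDC example
head = csv_path.read_text(errors="ignore")[:200]

if head.startswith("version https://git-lfs.github.com"):
    print("Still an LFS pointer; run `git lfs install` and `git lfs pull` in the repo, then retry.")
else:
    print("File looks like real CSV data, first line:", head.splitlines()[0])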
Hello, so I had previously installed git-lfs (git-lfs/3.2.0 (GitHub; linux amd64; go 1.18.2)) and re-cloned the examples directory. This time I am getting a different error, shown below. Is this mainly because of the installed PyTorch version?
/examples/ldc# python3 ldc_2d.py
[12:24:27] - JIT using the NVFuser TorchScript backend
[12:24:27] - Disabling JIT because functorch does not work with it.
[12:24:27] - JitManager: {'_enabled': False, '_arch_mode': <JitArchMode.ONLY_ACTIVATION: 1>, '_use_nvfuser': True, '_autograd_nodes': False}
[12:24:27] - GraphManager: {'_func_arch': True, '_debug': False, '_func_arch_allow_partial_hessian': True}
[12:24:31] - Arch Node: flow_network has been converted to a FuncArch node.
[12:24:31] - Installed PyTorch version 1.13.0+cu116 is not TorchScript supported in Modulus. Version 1.13.0a0+d321be6 is officially supported.
[12:24:31] - attempting to restore from: outputs/ldc_2d
[12:24:31] - optimizer checkpoint not found
[12:24:31] - model flow_network.0.pth not found
Error executing job with overrides:
It is actually a very long message; I have copy-pasted part of it below:
Error executing job with overrides:
Traceback (most recent call last):
File "ldc_2d.py", line 116, in run
slv.solve()
File "/usr/local/lib/python3.8/dist-packages/modulus-22.9-py3.8.egg/modulus/solver/solver.py", line 159, in solve
self._train_loop(sigterm_handler)
File "/usr/local/lib/python3.8/dist-packages/modulus-22.9-py3.8.egg/modulus/trainer.py", line 593, in _train_loop
self._record_constraints()
File "/usr/local/lib/python3.8/dist-packages/modulus-22.9-py3.8.egg/modulus/trainer.py", line 275, in _record_constraints
self.record_constraints()
File "/usr/local/lib/python3.8/dist-packages/modulus-22.9-py3.8.egg/modulus/solver/solver.py", line 116, in record_constraints
self.domain.rec_constraints(self.network_dir)
File "/usr/local/lib/python3.8/dist-packages/modulus-22.9-py3.8.egg/modulus/domain/domain.py", line 45, in rec_constraints
constraint.save_batch(constraint_data_dir + key)
File "/usr/local/lib/python3.8/dist-packages/modulus-22.9-py3.8.egg/modulus/domain/constraint/continuous.py", line 60, in save_batch
pred_outvar = modl(invar)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/functorch/_src/eager_transforms.py", line 113, in _autograd_grad
grad_inputs = torch.autograd.grad(diff_outputs, inputs, grad_outputs,
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py", line 300, in grad
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Thanks. This seems to be an issue with the gradient calculations. Please try turning off functorch for the gradient calculations by adding the following to your config.yaml file for this problem:
graph:
  func_arch: false
If that does not work, I would then also try shutting off CUDA graphs by setting cuda_graphs: false in the same file.
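If you want to confirm the flags were actually picked up, one option is to print the resolved config at the top of run(). This is just a sketch, assuming the usual Modulus 22.09 example layout with conf/config.yaml next to ldc_2d.py:

import modulus
from modulus.hydra import ModulusConfig
from omegaconf import OmegaConf

@modulus.main(config_path="conf", config_name="config")
def run(cfg: ModulusConfig) -> None:
    # Dump the resolved Hydra config; check that graph.func_arch and
    # cuda_graphs show the values you set before the solver is built.
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    run()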
Is this an error or just a warning? Typically that message is only a warning that TorchScript JIT is not being used, and JIT gives little to no performance gain most of the time anyway. You can also silence it by turning JIT off in your config with jit: false.
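Also, since the underlying failure is "CUDA error: unknown error", it may be worth a quick sanity check of the CUDA setup itself, independent of Modulus; a minimal sketch:

import torch

print(torch.__version__)          # e.g. 1.13.0+cu116 from your log
print(torch.version.cuda)         # CUDA toolkit version PyTorch was built against
print(torch.cuda.is_available())  # should be True
if torch.cuda.is_available():
    x = torch.ones(8, device="cuda")
    print((x * 2).sum().item())   # trivial kernel launch; should print 16.0

If that small script also fails, the problem is more likely the driver or container setup than the example or the config.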