Can I use RTX8000?

I want to use NVIDIA Modulus 22.09 with Docker on Ubuntu 22.04, but I get an error.

root@50a7161248f2:/examples/examples/three_fin_2d# python heat_sink.py
/opt/conda/lib/python3.8/site-packages/hydra/_internal/callbacks.py:26: UserWarning: Callback ModulusCallback.on_job_start raised RuntimeError: Running CUDA fuser is only supported on CUDA builds.
warnings.warn(
[02:57:43] - Arch Node: heat_network has been converted to a FuncArch node.
[02:57:49] - Arch Node: flow_network has been converted to a FuncArch node.
[02:57:50] - Arch Node: heat_network has been converted to a FuncArch node.
[02:57:51] - Arch Node: flow_network has been converted to a FuncArch node.
[02:57:51] - attempting to restore from: outputs/heat_sink
[02:57:51] - optimizer checkpoint not found
[02:57:51] - model flow_network.0.pth not found
[02:57:51] - model heat_network.0.pth not found
Error executing job with overrides:
Traceback (most recent call last):
  File "heat_sink.py", line 275, in run
    slv.solve()
  File "/modulus/modulus/solver/solver.py", line 159, in solve
    self._train_loop(sigterm_handler)
  File "/modulus/modulus/trainer.py", line 521, in _train_loop
    loss, losses = self._cuda_graph_training_step(step)
  File "/modulus/modulus/trainer.py", line 694, in _cuda_graph_training_step
    self.warmup_stream = torch.cuda.Stream()
  File "/opt/conda/lib/python3.8/site-packages/torch/cuda/streams.py", line 34, in __new__
    return super(Stream, cls).__new__(cls, priority=priority, **kwargs)
RuntimeError: CUDA error: no CUDA-capable device is detected
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

The driver version is above 515, but it still fails.

$ nvidia-smi
Sat Dec 17 12:00:12 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 8000     Off  | 00000000:D8:00.0 Off |                  Off |
| 33%   31C    P8    11W / 260W |      5MiB / 49152MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1857      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
WARNING: infoROM is corrupted at gpu 0000:D8:00.0

Can Modulus run on an RTX 8000?

Hi @con2

It seems this error is occurring with CUDA graphs. We don't currently test Modulus on the RTX 8000, so unfortunately I don't have a complete solution (however, we have tested it fine on other Quadro cards).
I would try turning off CUDA graphs in your config.yaml file:

cuda_graphs: False

Does the baseline Helmholtz example work for you?
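
Also, as a quick sanity check that's independent of Modulus (plain PyTorch calls only, nothing Modulus-specific), you could confirm inside the container that PyTorch can see the GPU at all, since the traceback reports "no CUDA-capable device is detected":

import torch

# If this prints False / 0, the container itself cannot see the GPU
# (e.g. it was not started with GPU access), and the
# "no CUDA-capable device is detected" error would happen regardless
# of the CUDA-graphs setting.
print("torch version  :", torch.__version__)
print("CUDA available :", torch.cuda.is_available())
print("device count   :", torch.cuda.device_count())
if torch.cuda.is_available():
    print("device 0 name  :", torch.cuda.get_device_name(0))
    # Creating a stream is exactly the call that fails in your traceback.
    print("stream created :", torch.cuda.Stream())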

Thanks for the advice.
Results are as follows.

helmholtz.py: Works

root@9d811d51116e:/examples/examples/helmholtz# python helmholtz.py
[23:50:39] - JIT using the NVFuser TorchScript backend
[23:50:39] - JitManager: {'_enabled': True, '_arch_mode': <JitArchMode.ONLY_ACTIVATION: 1>, '_use_nvfuser': True, '_autograd_nodes': False}
[23:50:39] - GraphManager: {'_func_arch': False, '_debug': False, '_func_arch_allow_partial_hessian': True}
[23:50:43] - attempting to restore from: outputs/helmholtz
[23:50:43] - optimizer checkpoint not found
[23:50:43] - model wave_network.0.pth not found
~~~
[00:06:43] - [step:      19900] loss:  1.077e-02, time/iteration:  4.483e+01 ms
[00:06:48] - [step:      20000] record constraint batch time:  3.775e-02s
[00:06:49] - [step:      20000] record validators time:  5.587e-01s
[00:06:49] - [step:      20000] saved checkpoint to outputs/helmholtz
[00:06:49] - [step:      20000] loss:  1.047e-02, time/iteration:  5.821e+01 ms
[00:06:49] - [step:      20000] reached maximum training steps, finished training!

heat_sink.py: Does not work
heat_sink.py with cuda_graphs: False: Works, but too slow

root@9d811d51116e:/examples/examples/three_fin_2d# python heat_sink.py
/opt/conda/lib/python3.8/site-packages/hydra/_internal/callbacks.py:26: UserWarning: Callback ModulusCallback.on_job_start raised RuntimeError: Running CUDA fuser is only supported on CUDA builds.
  warnings.warn(
[00:45:21] - Arch Node: heat_network has been converted to a FuncArch node.
[00:45:27] - Arch Node: flow_network has been converted to a FuncArch node.
[00:45:28] - Arch Node: heat_network has been converted to a FuncArch node.
[00:45:29] - Arch Node: flow_network has been converted to a FuncArch node.
[00:45:29] - attempting to restore from: outputs/heat_sink
[00:45:29] - optimizer checkpoint not found
[00:45:29] - model flow_network.0.pth not found
[00:45:29] - model heat_network.0.pth not found
[00:46:16] - [step:          0] record constraint batch time:  1.185e+01s
[00:46:17] - [step:          0] record validators time:  1.201e+00s
[00:46:17] - [step:          0] record monitor time:  1.111e-01s
[00:46:17] - [step:          0] saved checkpoint to outputs/heat_sink
[00:46:17] - [step:          0] loss:  1.812e+00
[01:43:23] - [step:        100] loss:  4.028e-01, time/iteration:  3.425e+04 ms
[02:37:05] - [step:        200] loss:  3.519e-01, time/iteration:  3.222e+04 ms

Second run of heat_sink.py with cuda_graphs: False:
Compared to the first run, the second run is faster.

root@27c5850e9313:/examples/examples/three_fin_2d# python heat_sink.py
[02:51:41] - JitManager: {'_enabled': False, '_arch_mode': <JitArchMode.ONLY_ACTIVATION: 1>, '_use_nvfuser': True, '_autograd_nodes': False}
[02:51:41] - GraphManager: {'_func_arch': False, '_debug': False, '_func_arch_allow_partial_hessian': True}
[02:51:51] - attempting to restore from: outputs/heat_sink
[02:51:51] - Success loading optimizer: outputs/heat_sink/optim_checkpoint.0.pth
[02:51:51] - Success loading model: outputs/heat_sink/flow_network.0.pth
[02:51:51] - Success loading model: outputs/heat_sink/heat_network.0.pth
[02:51:53] - [step:          0] record constraint batch time:  1.854e-01s
[02:51:53] - [step:          0] record validators time:  1.181e-01s
[02:51:53] - [step:          0] record monitor time:  3.515e-02s
[02:51:53] - [step:          0] saved checkpoint to outputs/heat_sink
[02:51:53] - [step:          0] loss:  1.266e+01
[02:52:20] - [step:        100] loss:  4.566e-01, time/iteration:  2.673e+02 ms
[02:52:47] - [step:        200] loss:  3.499e-01, time/iteration:  2.709e+02 ms
[02:53:14] - [step:        300] loss:  2.853e-01, time/iteration:  2.698e+02 ms
[02:53:41] - [step:        400] loss:  3.362e-01, time/iteration:  2.708e+02 ms
[02:54:08] - [step:        500] loss:  1.882e-01, time/iteration:  2.690e+02 ms
[02:54:35] - [step:        600] loss:  3.896e-01, time/iteration:  2.674e+02 ms
[02:55:02] - [step:        700] loss:  2.518e-01, time/iteration:  2.710e+02 ms
[02:55:29] - [step:        800] loss:  2.079e-01, time/iteration:  2.709e+02 ms

But it doesn’t converge.

[12:13:50] - [step:     125200] loss:  6.925e-02, time/iteration:  2.709e+02 ms
[12:14:17] - [step:     125300] loss:  6.975e-02, time/iteration:  2.697e+02 ms
[12:14:44] - [step:     125400] loss:  1.216e-01, time/iteration:  2.689e+02 ms
[12:15:11] - [step:     125500] loss:  7.264e-02, time/iteration:  2.687e+02 ms
[12:15:38] - [step:     125600] loss:  6.104e-02, time/iteration:  2.686e+02 ms
[12:16:04] - [step:     125700] loss:  7.468e-02, time/iteration:  2.675e+02 ms
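
To look at the trend more systematically than scanning the console, here is a rough sketch I use to pull the loss values out of a saved console log for plotting. It only assumes the "[step: N] loss: X" line format shown above; the log file name is just an example:

import re
import matplotlib.pyplot as plt

# Parse lines like "[step:     125700] loss:  7.468e-02, ..." from a
# saved console log and plot loss vs. step on a log scale.
pattern = re.compile(r"\[step:\s*(\d+)\]\s+loss:\s*([0-9.eE+-]+)")

steps, losses = [], []
with open("heat_sink_console.log") as f:  # hypothetical log file name
    for line in f:
        m = pattern.search(line)
        if m:
            steps.append(int(m.group(1)))
            losses.append(float(m.group(2)))

plt.semilogy(steps, losses)
plt.xlabel("step")
plt.ylabel("loss")
plt.title("heat_sink.py training loss (cuda_graphs: False)")
plt.savefig("heat_sink_loss.png")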