I want to use NVIDIA Modulus 22.09 with Docker on Ubuntu 22.04, but I get an error.
root@50a7161248f2:/examples/examples/three_fin_2d# python heat_sink.py
/opt/conda/lib/python3.8/site-packages/hydra/_internal/callbacks.py:26: UserWarning: Callback ModulusCallback.on_job_start raised RuntimeError: Running CUDA fuser is only supported on CUDA builds.
warnings.warn(
[02:57:43] - Arch Node: heat_network has been converted to a FuncArch node.
[02:57:49] - Arch Node: flow_network has been converted to a FuncArch node.
[02:57:50] - Arch Node: heat_network has been converted to a FuncArch node.
[02:57:51] - Arch Node: flow_network has been converted to a FuncArch node.
[02:57:51] - attempting to restore from: outputs/heat_sink
[02:57:51] - optimizer checkpoint not found
[02:57:51] - model flow_network.0.pth not found
[02:57:51] - model heat_network.0.pth not found
Error executing job with overrides:
Traceback (most recent call last):
  File "heat_sink.py", line 275, in run
    slv.solve()
  File "/modulus/modulus/solver/solver.py", line 159, in solve
    self._train_loop(sigterm_handler)
  File "/modulus/modulus/trainer.py", line 521, in _train_loop
    loss, losses = self._cuda_graph_training_step(step)
  File "/modulus/modulus/trainer.py", line 694, in _cuda_graph_training_step
    self.warmup_stream = torch.cuda.Stream()
  File "/opt/conda/lib/python3.8/site-packages/torch/cuda/streams.py", line 34, in __new__
    return super(Stream, cls).__new__(cls, priority=priority, **kwargs)
RuntimeError: CUDA error: no CUDA-capable device is detected
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
My driver is newer than 515 (it is 515.86.01), but I still can't run the example.
$ nvidia-smi
Sat Dec 17 12:00:12 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 8000     Off  | 00000000:D8:00.0 Off |                  Off |
| 33%   31C    P8    11W / 260W |      5MiB / 49152MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1857      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
WARNING: infoROM is corrupted at gpu 0000:D8:00.0
Can Modulus run on a Quadro RTX 8000?
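In case it helps with diagnosis, here is a small sketch that could be run inside the container to see whether the GPU is visible there at all. It uses only the standard library plus, if PyTorch is importable, `torch.cuda.is_available()` and `torch.cuda.device_count()`. The function name `gpu_visibility_report` and the report keys are my own and not part of Modulus. My assumption is that if it finds no `/dev/nvidia*` nodes, the problem is more likely how the container was started (for example without Docker's `--gpus` option) than the card itself.

```python
import glob
import shutil
import subprocess


def gpu_visibility_report():
    """Collect basic facts about whether this environment can see an NVIDIA GPU.

    Hypothetical helper for illustration; the keys are not a Modulus API.
    """
    report = {
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # Device nodes such as /dev/nvidia0 are normally present when a GPU
        # has been exposed to the container.
        "device_nodes": sorted(glob.glob("/dev/nvidia*")),
        "nvidia_smi_gpus": None,
        "torch_cuda_available": None,
        "torch_device_count": None,
    }

    if report["nvidia_smi_on_path"]:
        try:
            # `nvidia-smi -L` prints one line per GPU the driver can see.
            out = subprocess.run(
                ["nvidia-smi", "-L"],
                capture_output=True,
                text=True,
                timeout=30,
            )
            if out.returncode == 0:
                report["nvidia_smi_gpus"] = out.stdout.strip().splitlines()
            else:
                report["nvidia_smi_gpus"] = []
        except (OSError, subprocess.SubprocessError):
            report["nvidia_smi_gpus"] = []

    try:
        import torch  # only available where PyTorch is installed

        report["torch_cuda_available"] = torch.cuda.is_available()
        report["torch_device_count"] = torch.cuda.device_count()
    except ImportError:
        pass  # leave the torch fields as None

    return report


if __name__ == "__main__":
    for key, value in gpu_visibility_report().items():
        print(f"{key}: {value}")
```

If `torch_cuda_available` comes back `False` in the container while `nvidia-smi` works on the host, that would suggest the container rather than the driver or the RTX 8000 is the thing to look at.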