Hello all,
I’m hoping someone can help me with an error I’m hitting when trying to run Isaac Sim on a multi-GPU server. Specifically, startup fails with the following output:
[Info] [carb] Logging to file: /home/xxx/.nvidia-omniverse/logs/Kit/Isaac-Sim/2022.2/kit_20230311_102322.log
2023-03-11 10:23:22 [29ms] [Warning] [omni.ext.plugin] [ext: omni.drivesim.sensors.nv.lidar] Extensions config 'extension.toml' doesn't exist '/home/xxx/.local/share/ov/pkg/isaac_sim-2022.2.0/exts/omni.drivesim.sensors.nv.lidar' or '/home/xxx/.local/share/ov/pkg/isaac_sim-2022.2.0/exts/omni.drivesim.sensors.nv.lidar/config'
2023-03-11 10:23:22 [29ms] [Warning] [omni.ext.plugin] [ext: omni.drivesim.sensors.nv.radar] Extensions config 'extension.toml' doesn't exist '/home/xxx/.local/share/ov/pkg/isaac_sim-2022.2.0/exts/omni.drivesim.sensors.nv.radar' or '/home/xxx/.local/share/ov/pkg/isaac_sim-2022.2.0/exts/omni.drivesim.sensors.nv.radar/config'
[0.309s] [ext: omni.stats-0.0.0] startup
[0.360s] [ext: omni.rtx.shadercache-1.0.0] startup
[0.378s] [ext: omni.assets.plugins-0.0.0] startup
[0.380s] [ext: omni.gpu_foundation-0.0.0] startup
2023-03-11 10:23:22 [354ms] [Warning] [carb] FrameworkImpl::setDefaultPlugin(client: omni.gpu_foundation_factory.plugin, desc : [carb::graphics::Graphics v2.11], plugin : carb.graphics-vulkan.plugin) failed. Plugin selection is locked, because the interface was previously acquired by:
[0.389s] [ext: carb.windowing.plugins-1.0.0] startup
[0.400s] [ext: omni.kit.renderer.init-0.0.0] startup
|---------------------------------------------------------------------------------------------|
| Driver Version: 0 | Graphics API: Vulkan
|=============================================================================================|
| GPU | Name | Active | LDA | GPU Memory | Vendor-ID | LUID |
| | | | | | Device-ID | UUID |
|=============================================================================================|
| OS: Linux 5abaebe11826, Version: 5.4.0-132-generic
| XServer Vendor: The X.Org Foundation, XServer Version: 12008000 (1.20.8.0)
| Processor: AMD EPYC 7542 32-Core Processor | Cores: Unknown | Logical: 128
|---------------------------------------------------------------------------------------------|
| Total Memory (MB): 515833 | Free Memory: 265317
| Total Page/Swap (MB): 65535 | Free Page/Swap: 65535
|---------------------------------------------------------------------------------------------|
2023-03-11 10:23:22 [438ms] [Error] [gpu.foundation.plugin] No device could be created. Some known system issues:
- The driver is not installed properly and requires a clean re-install.
- Your GPUs do not support RayTracing: DXR or Vulkan ray_tracing, or hardware is excluded due to performance.
- The driver cannot enumerate any GPU: driver, display or a docker issue. For Vulkan, test it with Vulkaninfo tool from Vulkan SDK, instead of nvidia-smi.
- For Ubuntu, it requires server-xorg-core 1.20.7+ and a display to work without --no-window.
- For Linux dockers, the setup is not complete. Install the latest driver, xServer and NVIDIA container runtime.
|---------------------------------------------------------------------------------------------|
| Driver Version: 0 | Graphics API: Vulkan
|=============================================================================================|
| GPU | Name | Active | LDA | GPU Memory | Vendor-ID | LUID |
| | | | | | Device-ID | UUID |
|=============================================================================================|
| OS: Linux 5abaebe11826, Version: 5.4.0-132-generic
| XServer Vendor: The X.Org Foundation, XServer Version: 12008000 (1.20.8.0)
| Processor: AMD EPYC 7542 32-Core Processor | Cores: Unknown | Logical: 128
|---------------------------------------------------------------------------------------------|
| Total Memory (MB): 515833 | Free Memory: 265283
| Total Page/Swap (MB): 65535 | Free Page/Swap: 65535
|---------------------------------------------------------------------------------------------|
2023-03-11 10:23:22 [458ms] [Error] [gpu.foundation.plugin] No device could be created. Some known system issues:
- The driver is not installed properly and requires a clean re-install.
- Your GPUs do not support RayTracing: DXR or Vulkan ray_tracing, or hardware is excluded due to performance.
- The driver cannot enumerate any GPU: driver, display or a docker issue. For Vulkan, test it with Vulkaninfo tool from Vulkan SDK, instead of nvidia-smi.
- For Ubuntu, it requires server-xorg-core 1.20.7+ and a display to work without --no-window.
- For Linux dockers, the setup is not complete. Install the latest driver, xServer and NVIDIA container runtime.
2023-03-11 10:23:22 [458ms] [Error] [omni.gpu_foundation_factory.plugin] Failed to create GPU foundation devices for compatibilityMode!
However, I believe I have installed the necessary drivers correctly, and the output from nvidia-smi seems to indicate that all eight of my NVIDIA A40 GPUs are working properly. Here’s the output from nvidia-smi:
(base) xxx@5abae:~$ nvidia-smi
Sat Mar 11 10:47:30 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A40 On | 00000000:01:00.0 Off | 0 |
| 0% 44C P0 121W / 300W | 2515MiB / 46068MiB | 30% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A40 On | 00000000:25:00.0 Off | 0 |
| 0% 54C P0 85W / 300W | 3198MiB / 46068MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA A40 On | 00000000:41:00.0 Off | 0 |
| 0% 45C P0 108W / 300W | 2375MiB / 46068MiB | 19% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA A40 On | 00000000:61:00.0 Off | 0 |
| 0% 44C P0 84W / 300W | 3846MiB / 46068MiB | 4% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 NVIDIA A40 On | 00000000:81:00.0 Off | 0 |
| 0% 58C P0 193W / 300W | 4080MiB / 46068MiB | 48% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 NVIDIA A40 On | 00000000:A1:00.0 Off | 0 |
| 0% 56C P0 156W / 300W | 4080MiB / 46068MiB | 55% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 NVIDIA A40 On | 00000000:C1:00.0 Off | 0 |
| 0% 59C P0 153W / 300W | 4080MiB / 46068MiB | 53% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 NVIDIA A40 On | 00000000:E1:00.0 Off | 0 |
| 0% 60C P0 180W / 300W | 4080MiB / 46068MiB | 54% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 709398 C python 2513MiB |
| 1 N/A N/A 711881 C python 2779MiB |
| 1 N/A N/A 827543 G python 416MiB |
| 2 N/A N/A 728400 C python 2373MiB |
| 3 N/A N/A 711881 G python 118MiB |
| 3 N/A N/A 728400 G python 63MiB |
| 3 N/A N/A 827543 C python 3661MiB |
| 4 N/A N/A 827539 C python 3661MiB |
| 4 N/A N/A 827541 G python 416MiB |
| 5 N/A N/A 827540 C python 3661MiB |
| 5 N/A N/A 827542 G python 416MiB |
| 6 N/A N/A 827540 G python 416MiB |
| 6 N/A N/A 827541 C python 3661MiB |
| 7 N/A N/A 827539 G python 416MiB |
| 7 N/A N/A 827542 C python 3661MiB |
+-----------------------------------------------------------------------------+
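The log above recommends testing with vulkaninfo instead of nvidia-smi, and it reports "Driver Version: 0" with an empty GPU table, so it may be worth checking what Vulkan itself can enumerate. Below is a minimal sketch of such a check. It assumes the vulkaninfo tool is on PATH and accepts the --summary flag (true for recent Vulkan SDK builds, but not verified on this server), and that its output lists each device on a "deviceName = ..." line. The function names parse_device_names and vulkan_devices are made up for this illustration.

```python
import re
import shutil
import subprocess


def parse_device_names(vulkaninfo_text):
    """Return every deviceName value found in vulkaninfo output text."""
    # Assumed line shape: "<indent>deviceName   = NVIDIA A40"
    pattern = r"^[ \t]*deviceName[ \t]*=[ \t]*(.+?)[ \t]*$"
    return re.findall(pattern, vulkaninfo_text, flags=re.MULTILINE)


def vulkan_devices():
    """Run `vulkaninfo --summary`.

    Returns a list of device names, or None if the tool is missing or fails.
    """
    exe = shutil.which("vulkaninfo")
    if exe is None:
        return None
    result = subprocess.run([exe, "--summary"], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_device_names(result.stdout)


if __name__ == "__main__":
    devices = vulkan_devices()
    if devices is None:
        print("vulkaninfo is not installed or exited with an error")
    elif not devices:
        print("Vulkan enumerated no devices")
    else:
        for name in devices:
            print("Vulkan device:", name)
```

If this reports no devices, or only a software renderer, while nvidia-smi lists all eight A40s, that would point at the Vulkan side of the driver or container setup rather than at the GPUs themselves.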
I’m also having a problem connecting to the Omniverse localhost. Could that be related to this startup failure?
Could someone please help me troubleshoot this issue? I’d be happy to provide any additional information that might be helpful in resolving the problem.
Thank you in advance for your help!