Isaac Sim + orbit container on A100

Sorry for the naive questions, but I’m getting really confused and would appreciate your help.
Previously, I implemented, debugged, and tested RL algorithms using the Isaac Gym preview release on my local RTX server. To launch real experiments on our non-Ubuntu clusters, I had to use Docker + Singularity, and it worked like a charm.
Now that we are painfully switching to Isaac Sim + Orbit, I naively thought I could do the same, but it is more complicated, and so far it does not work.

Can you confirm that it is possible to run RL training on non-RTX GPU servers via Docker (and thus Singularity) WITHOUT having to install anything on the target nodes? If so, can it be as simple as it was in the old Isaac Gym days?
Also, do I really need Isaac Sim + Orbit to run simple RL training? A simpler setup might help, if one exists.

So far I have created an Isaac Sim + Orbit container that I am trying to run directly on an accessible non-RTX GPU server, without the Singularity layer for now. I get some driver errors (see the end of this message).

Any help/advice/feedback would be greatly appreciated.

Regards,

–Mike

Errors:
[INFO] Using python from: /orbit/_isaac_sim/python.sh
[Warning] [omni.isaac.kit.simulation_app] Modules: [‘omni.isaac.kit.app_framework’] were loaded before SimulationApp was started and might not be loaded correctly.
[Warning] [omni.isaac.kit.simulation_app] Please check to make sure no extra omniverse or pxr modules are imported before the call to SimulationApp(…)
Starting kit application with the following args: ['/orbit/_isaac_sim/exts/omni.isaac.kit/omni/isaac/kit/simulation_app.py', '/orbit/_isaac_sim/apps/omni.isaac.sim.python.kit', '--/app/tokens/exe-path=/orbit/_isaac_sim/kit', '--/persistent/app/viewport/displayOptions=3094', '--/rtx/materialDb/syncLoads=True', '--/rtx/hydra/materialSyncLoads=True--/omni.kit.plugin/syncUsdLoads=True', '--/app/renderer/resolution/width=1280', '--/app/renderer/resolution/height=720', '--/app/window/width=1440', '--/app/window/height=900', '--/renderer/multiGpu/enabled=False', '--/app/fastShutdown=True', '--ext-folder', '/orbit/_isaac_sim/exts', '--ext-folder', '/orbit/_isaac_sim/apps', '--/physics/cudaDevice=0', '--portable', '--no-window', '--allow-root']
Passing the following args to the base kit application:
[Warning] [omni.kit.app.plugin] No crash reporter present, dumps uploading isn’t available.
[Info] [carb] Logging to file: /isaac-sim/kit/logs/Kit/Isaac-Sim/2022.2/kit_20231012_092309.log
2023-10-12 09:23:09 [40ms] [Warning] [omni.ext.plugin] [ext: omni.sensors.nv.lidar] Extensions config ‘extension.toml’ doesn’t exist ‘/isaac-sim/exts/omni.sensors.nv.lidar’ or ‘/isaac-sim/exts/omni.sensors.nv.lidar/config’
2023-10-12 09:23:09 [41ms] [Warning] [omni.ext.plugin] [ext: omni.sensors.nv.radar] Extensions config ‘extension.toml’ doesn’t exist ‘/isaac-sim/exts/omni.sensors.nv.radar’ or ‘/isaac-sim/exts/omni.sensors.nv.radar/config’
[0.301s] [ext: omni.stats-0.0.0] startup
[0.333s] [ext: omni.rtx.shadercache-1.0.0] startup
[0.345s] [ext: omni.assets.plugins-0.0.0] startup
[0.347s] [ext: omni.gpu_foundation-0.0.0] startup
[0.358s] [ext: carb.windowing.plugins-1.0.0] startup
2023-10-12 09:23:09 [338ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:09 [338ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
[0.360s] [ext: omni.kit.renderer.init-0.0.0] startup
2023-10-12 09:23:09 [428ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:09 [428ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
2023-10-12 09:23:09 [429ms] [Error] [carb.glinterop.plugin] GLInteropContext::init: carb::windowing is not available
2023-10-12 09:23:09 [429ms] [Warning] [gpu.foundation.plugin] Skipping unsupported non-RTX GPU: Tesla P100-SXM2-16GB
2023-10-12 09:23:09 [429ms] [Warning] [gpu.foundation.plugin] Skipping unsupported non-RTX GPU: Tesla P100-SXM2-16GB
2023-10-12 09:23:09 [429ms] [Warning] [gpu.foundation.plugin] Skipping unsupported non-RTX GPU: Tesla P100-SXM2-16GB
2023-10-12 09:23:09 [429ms] [Warning] [gpu.foundation.plugin] Skipping unsupported non-RTX GPU: Tesla P100-SXM2-16GB
2023-10-12 09:23:09 [429ms] [Warning] [gpu.foundation.plugin] Skipping unsupported non-RTX GPU: Tesla P100-SXM2-16GB
2023-10-12 09:23:09 [429ms] [Warning] [gpu.foundation.plugin] Skipping unsupported non-RTX GPU: Tesla P100-SXM2-16GB
2023-10-12 09:23:09 [429ms] [Warning] [gpu.foundation.plugin] Skipping unsupported non-RTX GPU: Tesla P100-SXM2-16GB
2023-10-12 09:23:09 [429ms] [Warning] [gpu.foundation.plugin] Skipping unsupported non-RTX GPU: Tesla P100-SXM2-16GB

|---------------------------------------------------------------------------------------------|
| Driver Version: 470.129.06 | Graphics API: Vulkan
|=============================================================================================|
| GPU | Name | Active | LDA | GPU Memory | Vendor-ID | LUID |
| | | | | | Device-ID | UUID |
|---------------------------------------------------------------------------------------------|
| 0 | Tesla P100-SXM2-16GB | | | 16384 MB | 10de | 0 |
| | | | | | 15f9 | faf93056… |
|---------------------------------------------------------------------------------------------|
| 1 | Tesla P100-SXM2-16GB | | | 16384 MB | 10de | 0 |
| | | | | | 15f9 | a27ccd9f… |
|---------------------------------------------------------------------------------------------|
| 2 | Tesla P100-SXM2-16GB | | | 16384 MB | 10de | 0 |
| | | | | | 15f9 | 3f8b1bf9… |
|---------------------------------------------------------------------------------------------|
| 3 | Tesla P100-SXM2-16GB | | | 16384 MB | 10de | 0 |
| | | | | | 15f9 | 7291a297… |
|=============================================================================================|
| OS: Linux f65840ffa526, Version: 3.10.0-1160.71.1.el7.x86_64
| Processor: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz | Cores: Unknown | Logical: 32
|---------------------------------------------------------------------------------------------|
| Total Memory (MB): 515724 | Free Memory: 493726
| Total Page/Swap (MB): 65597 | Free Page/Swap: 65597
|---------------------------------------------------------------------------------------------|
2023-10-12 09:23:09 [429ms] [Error] [gpu.foundation.plugin] No device could be created. Some known system issues:

  • The driver is not installed properly and requires a clean re-install.
  • Your GPUs do not support RayTracing: DXR or Vulkan ray_tracing, or hardware is excluded due to performance.
  • The driver cannot enumerate any GPU: driver, display, TCC mode or a docker issue. For Vulkan, test it with Vulkaninfo tool from Vulkan SDK, instead of nvidia-smi.
  • For Ubuntu, it requires server-xorg-core 1.20.7+ and a display to work without --no-window.
  • For Linux dockers, the setup is not complete. Install the latest driver, xServer and NVIDIA container runtime.

2023-10-12 09:23:09 [437ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:09 [437ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
2023-10-12 09:23:09 [437ms] [Error] [carb.glinterop.plugin] GLInteropContext::init: carb::windowing is not available
2023-10-12 09:23:09 [438ms] [Warning] [carb.graphics-vulkan.plugin] No command queue family supports flags: 0x100, queue type: 3. No queues of this type will be created

|---------------------------------------------------------------------------------------------|
| Driver Version: 470.129.06 | Graphics API: Vulkan
|=============================================================================================|
| GPU | Name | Active | LDA | GPU Memory | Vendor-ID | LUID |
| | | | | | Device-ID | UUID |
|---------------------------------------------------------------------------------------------|
| 0 | Tesla P100-SXM2-16GB | Yes: 0 | | 16384 MB | 10de | 0 |
| | | | | | 15f9 | faf93056… |
|---------------------------------------------------------------------------------------------|
| 1 | Tesla P100-SXM2-16GB | | | 16384 MB | 10de | 0 |
| | | | | | 15f9 | a27ccd9f… |
|---------------------------------------------------------------------------------------------|
| 2 | Tesla P100-SXM2-16GB | | | 16384 MB | 10de | 0 |
| | | | | | 15f9 | 3f8b1bf9… |
|---------------------------------------------------------------------------------------------|
| 3 | Tesla P100-SXM2-16GB | | | 16384 MB | 10de | 0 |
| | | | | | 15f9 | 7291a297… |
|=============================================================================================|
| OS: Linux f65840ffa526, Version: 3.10.0-1160.71.1.el7.x86_64
| Processor: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz | Cores: Unknown | Logical: 32
|---------------------------------------------------------------------------------------------|
| Total Memory (MB): 515724 | Free Memory: 493725
| Total Page/Swap (MB): 65597 | Free Page/Swap: 65597
|---------------------------------------------------------------------------------------------|
2023-10-12 09:23:10 [580ms] [Warning] [omni.gpu_foundation_factory.plugin] RT-capable GPU not found, switching to compatibility mode
[0.616s] [ext: omni.kit.pipapi-0.0.0] startup
[0.632s] [ext: omni.kit.pip_archive-0.0.0] startup
[0.636s] [ext: omni.kit.loop-isaac-1.0.0] startup
[0.638s] [ext: omni.kit.async_engine-0.0.0] startup
[0.641s] [ext: omni.kit.test-0.0.0] startup
[0.824s] [ext: omni.usd.config-1.0.0] startup
[0.834s] [ext: omni.usd.libs-1.0.0] startup
[1.012s] [ext: omni.isaac.core_archive-2.0.1] startup
[1.034s] [ext: omni.pip.torch-1_13_1-0.1.4] startup
[1.093s] [ext: omni.isaac.ml_archive-1.1.0] startup
[1.094s] [ext: omni.client-0.1.1] startup
[1.122s] [ext: omni.appwindow-1.0.1] startup
2023-10-12 09:23:10 [1,103ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:10 [1,103ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
[1.129s] [ext: omni.kit.renderer.core-0.0.0] startup
2023-10-12 09:23:10 [1,112ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:10 [1,112ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
2023-10-12 09:23:10 [1,114ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:10 [1,114ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
[1.151s] [ext: omni.kit.renderer.capture-0.0.0] startup
[1.159s] [ext: omni.kit.renderer.imgui-0.0.0] startup
2023-10-12 09:23:10 [1,144ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:10 [1,144ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
2023-10-12 09:23:10 [1,145ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:10 [1,145ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
[…]
[4.983s] [ext: omni.warp-0.6.3] startup
Warp 0.6.3 initialized:
CUDA Toolkit: 11.5, Driver: 11.4
Devices:
“cpu” | x86_64
“cuda:0” | Tesla P100-SXM2-16GB (sm_60)
“cuda:1” | Tesla P100-SXM2-16GB (sm_60)
“cuda:2” | Tesla P100-SXM2-16GB (sm_60)
“cuda:3” | Tesla P100-SXM2-16GB (sm_60)
Kernel cache: /root/.cache/warp/0.6.3
2023-10-12 09:23:18 [9,368ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:18 [9,368ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
2023-10-12 09:23:18 [9,370ms] [Warning] [carb] [Plugin: libomni.structuredlog.plugin.so] Module /isaac-sim/kit/libomni.structuredlog.plugin.so remained loaded after unload request
2023-10-12 09:23:18 [9,372ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:18 [9,372ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
[9.393s] [ext: omni.replicator.composer-1.2.10] startup
[9.416s] [ext: omni.replicator.isaac-1.7.4] startup
[9.497s] [ext: omni.resourcemonitor-1.0.0] startup
[9.506s] [ext: omni.rtx.settings.core-0.5.8] startup
[9.523s] [ext: omni.isaac.franka-0.4.0] startup
[9.526s] [ext: omni.kit.viewport.rtx-104.0.0] startup
[9.527s] [ext: semantics.schema.editor-0.3.3] startup
[9.537s] [ext: omni.kit.widget.live-2.0.3] startup
2023-10-12 09:23:18 [9,522ms] [Warning] [omni.kit.widget.live.cache_state_menu] Unable to detect Omniverse Cache Server. Consider installing it for better IO performance.
[9.544s] [ext: omni.isaac.utils-0.2.4] startup
[9.552s] [ext: omni.isaac.kit-1.4.1] startup
[9.552s] [ext: omni.isaac.cortex-0.3.2] startup
[9.554s] [ext: omni.kit.window.stats-0.1.2] startup
[9.557s] [ext: omni.isaac.sim.python-2022.2.1] startup
[9.559s] Simulation App Starting
2023-10-12 09:23:19 [9,696ms] [Warning] [rtx.neuraylib.plugin] [CUDA:RENDER] 0.1 CUDA rend warn : CUDA module initialization failed.
2023-10-12 09:23:19 [9,696ms] [Warning] [rtx.neuraylib.plugin] [CUDA:RENDER] 0.1 CUDA rend warn : The version of your CUDA driver is 11.4, but 11.6 is the required minimum
2023-10-12 09:23:19 [9,696ms] [Warning] [rtx.neuraylib.plugin] [CUDA:RENDER] 0.1 CUDA rend warn : Please update your display driver (current version 470.129.6) (www.nvidia.com) to at least 510.73.05.
2023-10-12 09:23:19 [9,929ms] [Warning] [rtx.neuraylib.plugin] [IRAY:RENDER] 1.1 IRAY rend warn : Your NVIDIA driver supports CUDA version up to 11.4; iray photoreal requires CUDA version 11.6; iray photoreal can only run in CPU mode. Please update your NVIDIA driver (www.nvidia.com) to at least 510.73.05.
2023-10-12 09:23:19 [9,929ms] [Warning] [rtx.neuraylib.plugin] [IRAY:RENDER] 1.1 IRAY rend warn : There is no CUDA-capable GPU available to the iray photoreal renderer.
[13.731s] [ext: omni.isaac.sim.python-2022.2.1] shutdown
[13.731s] [ext: omni.isaac.cortex-0.3.2] shutdown
[13.931s] [ext: omni.isaac.franka-0.4.0] shutdown
[14.125s] [ext: omni.isaac.universal_robots-0.3.2] shutdown
[14.319s] [ext: omni.isaac.dofbot-0.3.0] shutdown
[14.513s] [ext: omni.isaac.manipulators-1.1.0] shutdown
[14.718s] [ext: omni.isaac.surface_gripper-0.4.0] shutdown
2023-10-12 09:23:24 [14,991ms] [Warning] [omni.ext._impl._internal] omni.isaac.surface_gripper-0.4.0 → <class ‘omni.isaac.surface_gripper.scripts.extension.Extension’>: extension object is still alive, something holds a reference on it. References: [“[0]:type: <class ‘method’>, id: 140326755711088”]
[15.207s] [ext: omni.isaac.wheeled_robots-0.6.3] shutdown
[15.414s] [ext: omni.isaac.surface_gripper-0.4.0] startup
[15.429s] [ext: omni.isaac.manipulators-1.1.0] startup
[15.432s] [ext: omni.isaac.dofbot-0.3.0] startup
[15.433s] [ext: omni.isaac.universal_robots-0.3.2] startup
[15.434s] [ext: omni.isaac.wheeled_robots-0.6.3] startup
[15.443s] [ext: omni.isaac.franka-0.4.0] startup
[15.444s] [ext: omni.isaac.cortex-0.3.2] startup
[15.445s] [ext: omni.isaac.sim.python-2022.2.1] startup
2023-10-12 09:23:26 [16,804ms] [Warning] [omni.hydra.rtx] HydraEngine rtx failed creating scene renderer.
[16.885s] app ready

Yes, probably.
Although there are others saying it’s possible using streaming on A100.
But it’s quite unclear to me so far.
Also, the fact that it was possible with Isaac Gym still makes me hope it is still the case but I’m less and less confident actually :-(

Hi @mike.niemaz - Yes, you can run Isaac Gym RL training in a cluster environment using a Docker container, but under certain conditions. First, the Docker container should include all the dependencies required for RL training, so you shouldn’t have to install anything on the target nodes. That’s one of the main advantages of using Docker.

About the GPU issue: as of now, Isaac Sim and Isaac Gym require NVIDIA graphics cards based on the rendering- and compute-focused Turing or Ampere architectures. This includes both RTX and A-series (previously Tesla) cards. Your Tesla P100 GPUs are based on the older Pascal architecture, which is why the log reports “Skipping unsupported non-RTX GPU: Tesla P100-SXM2-16GB” and then “No device could be created”.
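
To check a node quickly against the architecture rule described above, one option is to read the compute capability that Warp prints at startup (the `(sm_60)` entries in the log). The following is only a minimal sketch: the helper name `classify_devices` and the threshold `MIN_SM = 75` (the compute capability of Turing) are assumptions made for illustration, not part of Isaac Sim or Orbit.

```python
import re

# Assumption: "Turing or newer" is approximated as compute capability >= 7.5.
# Pascal cards such as the Tesla P100 report sm_60.
MIN_SM = 75

# Matches Warp device lines such as: "cuda:0" | Tesla P100-SXM2-16GB (sm_60)
DEVICE_RE = re.compile(r'"?(cuda:\d+)"?\s*\|\s*(.+?)\s*\(sm_(\d+)\)')


def classify_devices(log_text, min_sm=MIN_SM):
    """Return (device, name, sm, meets_minimum) for each CUDA device line."""
    results = []
    for line in log_text.splitlines():
        # Forum software often turns straight quotes into curly ones.
        line = line.replace("\u201c", '"').replace("\u201d", '"')
        match = DEVICE_RE.search(line)
        if match:
            sm = int(match.group(3))
            results.append((match.group(1), match.group(2), sm, sm >= min_sm))
    return results


sample_log = '''
"cpu" | x86_64
"cuda:0" | Tesla P100-SXM2-16GB (sm_60)
"cuda:1" | Tesla P100-SXM2-16GB (sm_60)
'''

for device, name, sm, ok in classify_devices(sample_log):
    status = "meets minimum" if ok else "below minimum"
    print(f"{device}: {name} (sm_{sm}) -> {status}")
```

Compute capability is only a proxy for the architecture. It does not say whether a card has ray-tracing hardware, so a pass here is a necessary condition rather than a guarantee that the renderer will start.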

Moreover, Isaac Gym has a hard dependency on the RTX renderer, so you can’t drop Isaac Sim + Orbit to run RL training. You can avoid using RTX GPU features, but the Isaac Sim + Orbit package still has to be installed.

For the NVIDIA driver issues in the Docker container, make sure the driver version inside the container matches the driver version on the host machine. Also make sure Docker uses NVIDIA’s runtime (nvidia-docker) so that the container can detect and use the GPUs on the host machine.
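
The log in the question reports driver 470.129.06 and says the renderer wants at least 510.73.05, so it can be worth checking the driver on a node before launching a job. Here is a minimal sketch of such a check: the function names are made up for illustration, and the default minimum is simply the value quoted in that log, not an official requirement for every Isaac Sim release.

```python
import subprocess


def parse_version(version):
    """Turn a dotted driver version such as '470.129.06' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_ok(current, minimum="510.73.05"):
    """True if `current` is at least `minimum` (default taken from the log above)."""
    return parse_version(current) >= parse_version(minimum)


def host_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU.

    Only works where nvidia-smi is on the PATH (on the host, or inside a
    container started with the NVIDIA runtime).
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()[0].strip()


# The driver version reported in the log from the question.
print(driver_ok("470.129.06"))
```

On a real node, `driver_ok(host_driver_version())` would run the same comparison against the installed driver.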

Regarding the GLFW initialization failure, this is actually expected, since the current version of Isaac Sim doesn’t support headless operation: a working display system is required even when running without a window. If you’re running on a server, a workaround is to use a virtual framebuffer such as xvfb to make the system think it has a display.

To resolve this issue, it seems you would need access to servers with Turing- or Ampere-architecture GPUs. If using servers with non-RTX GPUs is a hard requirement, you might unfortunately need to find a different RL environment, as Isaac Sim + Isaac Gym currently don’t support non-RTX GPUs.

Thank you for your detailed answer!
As I understand it, there might be a chance of getting my Docker container to run on an A100 (Ampere architecture) cluster. Is this correct? That would be awesome news!
