Sorry for the naive questions, but I'm getting really confused and need your help, please.
Previously, I implemented, debugged, and tested RL algorithms using Isaac Gym Preview on my local RTX server. To launch real experiments on our non-Ubuntu clusters, I had to use Docker + Singularity, and it worked like a charm.
Now that we are painfully switching to Isaac Sim + Orbit, I naively thought I could do the same, but it is more complicated; so far, it does not work.
Can you confirm that it is possible to run RL training on non-RTX GPU servers via Docker (and thus Singularity) WITHOUT having to install anything on the target nodes? If so, can it be as simple as it was in the old Isaac Gym days?
Also, do I really need Isaac Sim + Orbit to run simple RL training? A simpler setup might help, if one is available.
So far I have created an Isaac Sim + Orbit container that I am trying to run directly on an accessible non-RTX GPU server, i.e. without the Singularity layer yet. I get some driver errors (see the end of this message).
Any help/advice/feedback would be greatly appreciated.
Regards,
–Mike
Errors:
[INFO] Using python from: /orbit/_isaac_sim/python.sh
[Warning] [omni.isaac.kit.simulation_app] Modules: [‘omni.isaac.kit.app_framework’] were loaded before SimulationApp was started and might not be loaded correctly.
[Warning] [omni.isaac.kit.simulation_app] Please check to make sure no extra omniverse or pxr modules are imported before the call to SimulationApp(…)
Starting kit application with the following args: ['/orbit/_isaac_sim/exts/omni.isaac.kit/omni/isaac/kit/simulation_app.py', '/orbit/_isaac_sim/apps/omni.isaac.sim.python.kit', '--/app/tokens/exe-path=/orbit/_isaac_sim/kit', '--/persistent/app/viewport/displayOptions=3094', '--/rtx/materialDb/syncLoads=True', '--/rtx/hydra/materialSyncLoads=True--/omni.kit.plugin/syncUsdLoads=True', '--/app/renderer/resolution/width=1280', '--/app/renderer/resolution/height=720', '--/app/window/width=1440', '--/app/window/height=900', '--/renderer/multiGpu/enabled=False', '--/app/fastShutdown=True', '--ext-folder', '/orbit/_isaac_sim/exts', '--ext-folder', '/orbit/_isaac_sim/apps', '--/physics/cudaDevice=0', '--portable', '--no-window', '--allow-root']
Passing the following args to the base kit application:
[Warning] [omni.kit.app.plugin] No crash reporter present, dumps uploading isn’t available.
[Info] [carb] Logging to file: /isaac-sim/kit/logs/Kit/Isaac-Sim/2022.2/kit_20231012_092309.log
2023-10-12 09:23:09 [40ms] [Warning] [omni.ext.plugin] [ext: omni.sensors.nv.lidar] Extensions config ‘extension.toml’ doesn’t exist ‘/isaac-sim/exts/omni.sensors.nv.lidar’ or ‘/isaac-sim/exts/omni.sensors.nv.lidar/config’
2023-10-12 09:23:09 [41ms] [Warning] [omni.ext.plugin] [ext: omni.sensors.nv.radar] Extensions config ‘extension.toml’ doesn’t exist ‘/isaac-sim/exts/omni.sensors.nv.radar’ or ‘/isaac-sim/exts/omni.sensors.nv.radar/config’
[0.301s] [ext: omni.stats-0.0.0] startup
[0.333s] [ext: omni.rtx.shadercache-1.0.0] startup
[0.345s] [ext: omni.assets.plugins-0.0.0] startup
[0.347s] [ext: omni.gpu_foundation-0.0.0] startup
[0.358s] [ext: carb.windowing.plugins-1.0.0] startup
2023-10-12 09:23:09 [338ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:09 [338ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
[0.360s] [ext: omni.kit.renderer.init-0.0.0] startup
2023-10-12 09:23:09 [428ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:09 [428ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
2023-10-12 09:23:09 [429ms] [Error] [carb.glinterop.plugin] GLInteropContext::init: carb::windowing is not available
2023-10-12 09:23:09 [429ms] [Warning] [gpu.foundation.plugin] Skipping unsupported non-RTX GPU: Tesla P100-SXM2-16GB
2023-10-12 09:23:09 [429ms] [Warning] [gpu.foundation.plugin] Skipping unsupported non-RTX GPU: Tesla P100-SXM2-16GB
2023-10-12 09:23:09 [429ms] [Warning] [gpu.foundation.plugin] Skipping unsupported non-RTX GPU: Tesla P100-SXM2-16GB
2023-10-12 09:23:09 [429ms] [Warning] [gpu.foundation.plugin] Skipping unsupported non-RTX GPU: Tesla P100-SXM2-16GB
2023-10-12 09:23:09 [429ms] [Warning] [gpu.foundation.plugin] Skipping unsupported non-RTX GPU: Tesla P100-SXM2-16GB
2023-10-12 09:23:09 [429ms] [Warning] [gpu.foundation.plugin] Skipping unsupported non-RTX GPU: Tesla P100-SXM2-16GB
2023-10-12 09:23:09 [429ms] [Warning] [gpu.foundation.plugin] Skipping unsupported non-RTX GPU: Tesla P100-SXM2-16GB
2023-10-12 09:23:09 [429ms] [Warning] [gpu.foundation.plugin] Skipping unsupported non-RTX GPU: Tesla P100-SXM2-16GB
|---------------------------------------------------------------------------------------------|
| Driver Version: 470.129.06 | Graphics API: Vulkan
|=============================================================================================|
| GPU | Name | Active | LDA | GPU Memory | Vendor-ID | LUID |
| | | | | | Device-ID | UUID |
|---------------------------------------------------------------------------------------------|
| 0 | Tesla P100-SXM2-16GB | | | 16384 MB | 10de | 0 |
| | | | | | 15f9 | faf93056… |
|---------------------------------------------------------------------------------------------|
| 1 | Tesla P100-SXM2-16GB | | | 16384 MB | 10de | 0 |
| | | | | | 15f9 | a27ccd9f… |
|---------------------------------------------------------------------------------------------|
| 2 | Tesla P100-SXM2-16GB | | | 16384 MB | 10de | 0 |
| | | | | | 15f9 | 3f8b1bf9… |
|---------------------------------------------------------------------------------------------|
| 3 | Tesla P100-SXM2-16GB | | | 16384 MB | 10de | 0 |
| | | | | | 15f9 | 7291a297… |
|=============================================================================================|
| OS: Linux f65840ffa526, Version: 3.10.0-1160.71.1.el7.x86_64
| Processor: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz | Cores: Unknown | Logical: 32
|---------------------------------------------------------------------------------------------|
| Total Memory (MB): 515724 | Free Memory: 493726
| Total Page/Swap (MB): 65597 | Free Page/Swap: 65597
|---------------------------------------------------------------------------------------------|
2023-10-12 09:23:09 [429ms] [Error] [gpu.foundation.plugin] No device could be created. Some known system issues:
- The driver is not installed properly and requires a clean re-install.
- Your GPUs do not support RayTracing: DXR or Vulkan ray_tracing, or hardware is excluded due to performance.
- The driver cannot enumerate any GPU: driver, display, TCC mode or a docker issue. For Vulkan, test it with Vulkaninfo tool from Vulkan SDK, instead of nvidia-smi.
- For Ubuntu, it requires server-xorg-core 1.20.7+ and a display to work without --no-window.
- For Linux dockers, the setup is not complete. Install the latest driver, xServer and NVIDIA container runtime.
2023-10-12 09:23:09 [437ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:09 [437ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
2023-10-12 09:23:09 [437ms] [Error] [carb.glinterop.plugin] GLInteropContext::init: carb::windowing is not available
2023-10-12 09:23:09 [438ms] [Warning] [carb.graphics-vulkan.plugin] No command queue family supports flags: 0x100, queue type: 3. No queues of this type will be created
|---------------------------------------------------------------------------------------------|
| Driver Version: 470.129.06 | Graphics API: Vulkan
|=============================================================================================|
| GPU | Name | Active | LDA | GPU Memory | Vendor-ID | LUID |
| | | | | | Device-ID | UUID |
|---------------------------------------------------------------------------------------------|
| 0 | Tesla P100-SXM2-16GB | Yes: 0 | | 16384 MB | 10de | 0 |
| | | | | | 15f9 | faf93056… |
|---------------------------------------------------------------------------------------------|
| 1 | Tesla P100-SXM2-16GB | | | 16384 MB | 10de | 0 |
| | | | | | 15f9 | a27ccd9f… |
|---------------------------------------------------------------------------------------------|
| 2 | Tesla P100-SXM2-16GB | | | 16384 MB | 10de | 0 |
| | | | | | 15f9 | 3f8b1bf9… |
|---------------------------------------------------------------------------------------------|
| 3 | Tesla P100-SXM2-16GB | | | 16384 MB | 10de | 0 |
| | | | | | 15f9 | 7291a297… |
|=============================================================================================|
| OS: Linux f65840ffa526, Version: 3.10.0-1160.71.1.el7.x86_64
| Processor: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz | Cores: Unknown | Logical: 32
|---------------------------------------------------------------------------------------------|
| Total Memory (MB): 515724 | Free Memory: 493725
| Total Page/Swap (MB): 65597 | Free Page/Swap: 65597
|---------------------------------------------------------------------------------------------|
2023-10-12 09:23:10 [580ms] [Warning] [omni.gpu_foundation_factory.plugin] RT-capable GPU not found, switching to compatibility mode
[0.616s] [ext: omni.kit.pipapi-0.0.0] startup
[0.632s] [ext: omni.kit.pip_archive-0.0.0] startup
[0.636s] [ext: omni.kit.loop-isaac-1.0.0] startup
[0.638s] [ext: omni.kit.async_engine-0.0.0] startup
[0.641s] [ext: omni.kit.test-0.0.0] startup
[0.824s] [ext: omni.usd.config-1.0.0] startup
[0.834s] [ext: omni.usd.libs-1.0.0] startup
[1.012s] [ext: omni.isaac.core_archive-2.0.1] startup
[1.034s] [ext: omni.pip.torch-1_13_1-0.1.4] startup
[1.093s] [ext: omni.isaac.ml_archive-1.1.0] startup
[1.094s] [ext: omni.client-0.1.1] startup
[1.122s] [ext: omni.appwindow-1.0.1] startup
2023-10-12 09:23:10 [1,103ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:10 [1,103ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
[1.129s] [ext: omni.kit.renderer.core-0.0.0] startup
2023-10-12 09:23:10 [1,112ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:10 [1,112ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
2023-10-12 09:23:10 [1,114ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:10 [1,114ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
[1.151s] [ext: omni.kit.renderer.capture-0.0.0] startup
[1.159s] [ext: omni.kit.renderer.imgui-0.0.0] startup
2023-10-12 09:23:10 [1,144ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:10 [1,144ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
2023-10-12 09:23:10 [1,145ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:10 [1,145ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
[…]
[4.983s] [ext: omni.warp-0.6.3] startup
Warp 0.6.3 initialized:
CUDA Toolkit: 11.5, Driver: 11.4
Devices:
“cpu” | x86_64
“cuda:0” | Tesla P100-SXM2-16GB (sm_60)
“cuda:1” | Tesla P100-SXM2-16GB (sm_60)
“cuda:2” | Tesla P100-SXM2-16GB (sm_60)
“cuda:3” | Tesla P100-SXM2-16GB (sm_60)
Kernel cache: /root/.cache/warp/0.6.3
2023-10-12 09:23:18 [9,368ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:18 [9,368ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
2023-10-12 09:23:18 [9,370ms] [Warning] [carb] [Plugin: libomni.structuredlog.plugin.so] Module /isaac-sim/kit/libomni.structuredlog.plugin.so remained loaded after unload request
2023-10-12 09:23:18 [9,372ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-12 09:23:18 [9,372ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
[9.393s] [ext: omni.replicator.composer-1.2.10] startup
[9.416s] [ext: omni.replicator.isaac-1.7.4] startup
[9.497s] [ext: omni.resourcemonitor-1.0.0] startup
[9.506s] [ext: omni.rtx.settings.core-0.5.8] startup
[9.523s] [ext: omni.isaac.franka-0.4.0] startup
[9.526s] [ext: omni.kit.viewport.rtx-104.0.0] startup
[9.527s] [ext: semantics.schema.editor-0.3.3] startup
[9.537s] [ext: omni.kit.widget.live-2.0.3] startup
2023-10-12 09:23:18 [9,522ms] [Warning] [omni.kit.widget.live.cache_state_menu] Unable to detect Omniverse Cache Server. Consider installing it for better IO performance.
[9.544s] [ext: omni.isaac.utils-0.2.4] startup
[9.552s] [ext: omni.isaac.kit-1.4.1] startup
[9.552s] [ext: omni.isaac.cortex-0.3.2] startup
[9.554s] [ext: omni.kit.window.stats-0.1.2] startup
[9.557s] [ext: omni.isaac.sim.python-2022.2.1] startup
[9.559s] Simulation App Starting
2023-10-12 09:23:19 [9,696ms] [Warning] [rtx.neuraylib.plugin] [CUDA:RENDER] 0.1 CUDA rend warn : CUDA module initialization failed.
2023-10-12 09:23:19 [9,696ms] [Warning] [rtx.neuraylib.plugin] [CUDA:RENDER] 0.1 CUDA rend warn : The version of your CUDA driver is 11.4, but 11.6 is the required minimum
2023-10-12 09:23:19 [9,696ms] [Warning] [rtx.neuraylib.plugin] [CUDA:RENDER] 0.1 CUDA rend warn : Please update your display driver (current version 470.129.6) (www.nvidia.com) to at least 510.73.05.
2023-10-12 09:23:19 [9,929ms] [Warning] [rtx.neuraylib.plugin] [IRAY:RENDER] 1.1 IRAY rend warn : Your NVIDIA driver supports CUDA version up to 11.4; iray photoreal requires CUDA version 11.6; iray photoreal can only run in CPU mode. Please update your NVIDIA driver (www.nvidia.com) to at least 510.73.05.
2023-10-12 09:23:19 [9,929ms] [Warning] [rtx.neuraylib.plugin] [IRAY:RENDER] 1.1 IRAY rend warn : There is no CUDA-capable GPU available to the iray photoreal renderer.
[13.731s] [ext: omni.isaac.sim.python-2022.2.1] shutdown
[13.731s] [ext: omni.isaac.cortex-0.3.2] shutdown
[13.931s] [ext: omni.isaac.franka-0.4.0] shutdown
[14.125s] [ext: omni.isaac.universal_robots-0.3.2] shutdown
[14.319s] [ext: omni.isaac.dofbot-0.3.0] shutdown
[14.513s] [ext: omni.isaac.manipulators-1.1.0] shutdown
[14.718s] [ext: omni.isaac.surface_gripper-0.4.0] shutdown
2023-10-12 09:23:24 [14,991ms] [Warning] [omni.ext._impl._internal] omni.isaac.surface_gripper-0.4.0 → <class ‘omni.isaac.surface_gripper.scripts.extension.Extension’>: extension object is still alive, something holds a reference on it. References: [“[0]:type: <class ‘method’>, id: 140326755711088”]
[15.207s] [ext: omni.isaac.wheeled_robots-0.6.3] shutdown
[15.414s] [ext: omni.isaac.surface_gripper-0.4.0] startup
[15.429s] [ext: omni.isaac.manipulators-1.1.0] startup
[15.432s] [ext: omni.isaac.dofbot-0.3.0] startup
[15.433s] [ext: omni.isaac.universal_robots-0.3.2] startup
[15.434s] [ext: omni.isaac.wheeled_robots-0.6.3] startup
[15.443s] [ext: omni.isaac.franka-0.4.0] startup
[15.444s] [ext: omni.isaac.cortex-0.3.2] startup
[15.445s] [ext: omni.isaac.sim.python-2022.2.1] startup
2023-10-12 09:23:26 [16,804ms] [Warning] [omni.hydra.rtx] HydraEngine rtx failed creating scene renderer.
[16.885s] app ready
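
Side note on the driver warnings near the end of the log (current driver 470.129.06, at least 510.73.05 requested): below is a minimal sketch of a pre-flight check I could run on a node before starting the container. The helper names (`parse_version`, `driver_meets_minimum`, `query_driver_versions`, `report`) are my own and purely illustrative, and the minimum version is simply the one quoted in the log. It only covers the driver version; it says nothing about the "unsupported non-RTX GPU" messages, which I assume are a separate issue.

```python
import subprocess

# Minimum driver version quoted by the rtx.neuraylib warning in the log above.
REQUIRED_DRIVER = "510.73.05"


def parse_version(text):
    """Turn '470.129.06' into (470, 129, 6) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_minimum(current, required=REQUIRED_DRIVER):
    """Return True if the installed driver is at least the required one."""
    return parse_version(current) >= parse_version(required)


def query_driver_versions():
    """Ask nvidia-smi for one (gpu_name, driver_version) pair per GPU.

    Not called automatically: it needs nvidia-smi on the PATH.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    pairs = []
    for line in out.splitlines():
        if line.strip():
            # The driver version is the last comma-separated field.
            name, driver = line.rsplit(",", 1)
            pairs.append((name.strip(), driver.strip()))
    return pairs


def report(gpus):
    """Build one human-readable status line per (name, driver) pair."""
    lines = []
    for name, driver in gpus:
        status = "ok" if driver_meets_minimum(driver) else "too old"
        lines.append(f"{name}: driver {driver} ({status}, need >= {REQUIRED_DRIVER})")
    return lines
```

On a GPU node this would be used as `print("\n".join(report(query_driver_versions())))`; with the values from my log, `report([("Tesla P100-SXM2-16GB", "470.129.06")])` flags the driver as too old.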