Hi,
I'm trying to get OmniIsaacGymEnvs to work in a Docker container. I have an internal GPU and an eGPU, and the only way I could get Isaac to use the eGPU was to run it in a Docker container. I seem to be able to train a policy in headless mode, but I get a segmentation fault when I run training without the headless argument.
```
root@user:/workspace/omniisaacgymenvs/omniisaacgymenvs# /isaac-sim/python.sh scripts/rlgames_train.py task=Ant
/isaac-sim/extscache/omni.pip.torch-2_0_1-2.0.2+105.1.lx64/torch-2-0-1/torch/utils/tensorboard/__init__.py:4: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if not hasattr(tensorboard, "__version__") or LooseVersion(
Starting kit application with the following args: ['/isaac-sim/exts/omni.isaac.kit/omni/isaac/kit/simulation_app.py', '/isaac-sim/apps/omni.isaac.sim.python.gym.kit', '--/app/tokens/exe-path=/isaac-sim/kit', '--/persistent/app/viewport/displayOptions=3094', '--/rtx/materialDb/syncLoads=True', '--/rtx/hydra/materialSyncLoads=True', '--/omni.kit.plugin/syncUsdLoads=True', '--/app/renderer/resolution/width=1280', '--/app/renderer/resolution/height=720', '--/app/window/width=1440', '--/app/window/height=900', '--/renderer/multiGpu/enabled=True', '--/app/fastShutdown=True', '--ext-folder', '/isaac-sim/exts', '--ext-folder', '/isaac-sim/apps', '--/physics/cudaDevice=0', '--portable', '--allow-root']
Passing the following args to the base kit application: ['task=Ant']
[Info] [carb] Logging to file: /isaac-sim/kit/logs/Kit/omni.isaac.sim.python.gym/2023.1/kit_20231201_112240.log
2023-12-01 11:22:40 [0ms] [Warning] [omni.kit.app.plugin] No crash reporter present, dumps uploading isn't available.
2023-12-01 11:22:40 [9ms] [Warning] [omni.ext.plugin] [ext: omni.kit.converter.cad-200.0.0-rc.4+105.0] Built using kit version: 105.0. Current version: 105.1. It is considered compatible, but building with a newer version is recommended.
2023-12-01 11:22:40 [9ms] [Warning] [omni.ext.plugin] [ext: omni.kit.converter.cad_core-200.0.0-rc.3+105.0.lx64.r.cp310] Built using kit version: 105.0. Current version: 105.1. It is considered compatible, but building with a newer version is recommended.
2023-12-01 11:22:40 [10ms] [Warning] [omni.ext.plugin] [ext: omni.kit.sequencer.core-103.4.1+105.0] Built using kit version: 105.0. Current version: 105.1. It is considered compatible, but building with a newer version is recommended.
2023-12-01 11:22:40 [10ms] [Warning] [omni.ext.plugin] [ext: omni.kit.sequencer.usd-103.4.2+105.0] Built using kit version: 105.0. Current version: 105.1. It is considered compatible, but building with a newer version is recommended.
2023-12-01 11:22:40 [10ms] [Warning] [omni.ext.plugin] [ext: omni.kit.widget.timeline-105.0.1+105.0] Built using kit version: 105.0. Current version: 105.1. It is considered compatible, but building with a newer version is recommended.
2023-12-01 11:22:40 [11ms] [Warning] [omni.ext.plugin] [ext: omni.kit.window.sequencer-103.4.2-dev.3+105.0] Built using kit version: 105.0. Current version: 105.1. It is considered compatible, but building with a newer version is recommended.
2023-12-01 11:22:40 [11ms] [Warning] [omni.ext.plugin] [ext: omni.paint.brush.attributes-1.3.1+105.0] Built using kit version: 105.0. Current version: 105.1. It is considered compatible, but building with a newer version is recommended.
2023-12-01 11:22:40 [13ms] [Warning] [omni.ext.plugin] [ext: omni.usd.schema.sequence-2.3.0+105.0.lx64.r.cp310] Built using kit version: 105.0. Current version: 105.1. It is considered compatible, but building with a newer version is recommended.
[0.042s] [ext: omni.kit.async_engine-0.0.0] startup
[0.387s] [ext: omni.assets.plugins-0.0.0] startup
[0.389s] [ext: omni.stats-0.0.0] startup
[0.390s] [ext: omni.client-1.0.1] startup
[0.396s] [ext: omni.gpu_foundation-0.0.0] startup
[0.403s] [ext: omni.rtx.shadercache.vulkan-1.0.0] startup
[0.405s] [ext: carb.windowing.plugins-1.0.0] startup
2023-12-01 11:22:40 [397ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-12-01 11:22:40 [397ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.4]) (impl: carb.windowing-glfw.plugin)
[0.406s] [ext: omni.kit.renderer.init-0.0.0] startup
2023-12-01 11:22:40 [431ms] [Warning] [omni.platforminfo.plugin] failed to open the default display. Can't verify X Server version.
2023-12-01 11:22:41 [1,222ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-12-01 11:22:41 [1,222ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.4]) (impl: carb.windowing-glfw.plugin)
|---------------------------------------------------------------------------------------------|
| Driver Version: 525.147.05 | Graphics API: Vulkan
|=============================================================================================|
| GPU | Name | Active | LDA | GPU Memory | Vendor-ID | LUID |
| | | | | | Device-ID | UUID |
| | | | | | Bus-ID | |
|---------------------------------------------------------------------------------------------|
| 0 | NVIDIA GeForce RTX 3060 | Yes: 0 | | 12534 MB | 10de | 0 |
| | | | | | 2504 | 8dc4d4d3… |
| | | | | | a | |
|=============================================================================================|
| OS: 22.04.3 LTS (Jammy Jellyfish) ubuntu, Version: 22.04.3, Kernel: 5.15.0-89-generic
| Processor: Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz | Cores: 8 | Logical: 16
|---------------------------------------------------------------------------------------------|
| Total Memory (MB): 31724 | Free Memory: 25759
| Total Page/Swap (MB): 2047 | Free Page/Swap: 2047
|---------------------------------------------------------------------------------------------|
[1.745s] [ext: omni.kit.pipapi-0.0.0] startup
[1.747s] [ext: omni.kit.pip_archive-0.0.0] startup
[1.747s] [ext: omni.pip.compute-1.2.0] startup
[1.748s] [ext: omni.pip.torch-2_0_1-2.0.2] startup
[1.773s] [ext: omni.pip.cloud-1.0.1] startup
[1.775s] [ext: omni.isaac.core_archive-2.2.1] startup
[1.775s] [ext: omni.kit.telemetry-0.5.0] startup
[1.793s] [ext: omni.isaac.ml_archive-1.1.3] startup
[1.793s] [ext: omni.mtlx-0.1.0] startup
[1.794s] [ext: omni.usd.config-1.0.3] startup
[1.798s] [ext: omni.gpucompute.plugins-0.0.0] startup
[1.799s] [ext: omni.usd.libs-1.0.0] startup
[1.916s] [ext: omni.kit.loop-isaac-1.1.0] startup
[1.917s] [ext: omni.kit.test-0.0.0] startup
[1.918s] [ext: omni.usd.schema.omnigraph-1.0.0] startup
[2.034s] [ext: omni.usd.schema.physics-0.0.0] startup
[2.065s] [ext: omni.usd.schema.audio-0.0.0] startup
[2.069s] [ext: omni.usd.schema.physx-0.0.0] startup
[2.097s] [ext: omni.usd.schema.semantics-0.0.0] startup
[2.106s] [ext: omni.usd.schema.anim-0.0.0] startup
[2.125s] [ext: omni.usd.schema.omniscripting-1.0.0] startup
[2.131s] [ext: omni.usd.schema.geospatial-0.0.0] startup
[2.134s] [ext: omni.appwindow-1.1.5] startup
2023-12-01 11:22:42 [2,126ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-12-01 11:22:42 [2,126ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.4]) (impl: carb.windowing-glfw.plugin)
2023-12-01 11:22:42 [2,126ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-12-01 11:22:42 [2,126ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.4]) (impl: carb.windowing-glfw.plugin)
2023-12-01 11:22:42 [2,127ms] [Error] [omni.appwindow.plugin] Failed to acquire IWindowing interface
[2.137s] [ext: omni.kit.renderer.core-0.0.0] startup
Fatal Python error: Segmentation fault
Current thread 0x00007f0a2493fb80 (most recent call first):
File "/isaac-sim/exts/omni.isaac.kit/omni/isaac/kit/simulation_app.py", line 303 in _start_app
File "/isaac-sim/exts/omni.isaac.kit/omni/isaac/kit/simulation_app.py", line 192 in __init__
File "/isaac-sim/exts/omni.isaac.gym/omni/isaac/gym/vec_env/vec_env_base.py", line 56 in __init__
File "/workspace/omniisaacgymenvs/omniisaacgymenvs/scripts/rlgames_train.py", line 98 in parse_hydra_configs
File "/isaac-sim/kit/python/lib/python3.10/site-packages/hydra/core/utils.py", line 186 in run_job
File "/isaac-sim/kit/python/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 119 in run
File "/isaac-sim/kit/python/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458 in
File "/isaac-sim/kit/python/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220 in run_and_report
File "/isaac-sim/kit/python/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457 in _run_app
File "/isaac-sim/kit/python/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394 in _run_hydra
File "/isaac-sim/kit/python/lib/python3.10/site-packages/hydra/main.py", line 94 in decorated_main
File "/workspace/omniisaacgymenvs/omniisaacgymenvs/scripts/rlgames_train.py", line 150 in
Extension modules: yaml._yaml, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, google.protobuf.pyext._message, psutil._psutil_linux, psutil._psutil_posix (total: 24)
/isaac-sim/python.sh: line 41: 407 Segmentation fault (core dumped) $python_exe "$@" $args
There was an error running python
root@user:/workspace/omniisaacgymenvs/omniisaacgymenvs#
```
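The log above reports "failed to open the default display" and repeated "GLFW initialization failed" warnings just before the crash, which suggests that no X display was reachable from inside the container. Here is a minimal sketch of a pre-flight check that could be run inside the container before a non-headless launch. The function name `display_problems` is hypothetical, and the check assumes a standard X11 setup where `DISPLAY` and an X authority file (`XAUTHORITY`, or `~/.Xauthority` by default) are what a window needs. It does not prove the segfault has this cause.

```python
import os
from pathlib import Path


def display_problems(env=None, home=None):
    """Return a list of problems that would likely stop an X11 window from opening.

    Hypothetical helper: it only checks for DISPLAY and an X authority file,
    which is an assumption about what the container is missing.
    """
    env = os.environ if env is None else env
    home = Path(home) if home is not None else Path.home()
    problems = []
    # Without DISPLAY, X11 clients have no default display to connect to.
    if not env.get("DISPLAY"):
        problems.append("DISPLAY is not set")
    # XAUTHORITY overrides the default ~/.Xauthority location when it is set.
    xauth = Path(env.get("XAUTHORITY") or home / ".Xauthority")
    if not xauth.is_file():
        problems.append(f"X authority file not found: {xauth}")
    return problems
```

Calling `display_problems()` with no arguments inspects the current environment. An empty list means both items are present, not that the window will necessarily open.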
Running nvidia-smi inside the container returns the following:
```
Fri Dec 1 11:25:35 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:0A:00.0 Off | N/A |
| 48% 41C P0 36W / 170W | 0MiB / 12288MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
```
Do you have any ideas on how to fix this?
Edit: I added
-v $HOME/.Xauthority:/root/.Xauthority \
-e DISPLAY \
to the docker run command and tried running it on the internal GPU, and it seems to work (it starts, but then runs out of memory on that GPU). However, when running it on the eGPU I get the following:
2023-12-01 11:40:18 [455ms] [Error] [carb.graphics-vulkan.plugin] VkResult: ERROR_INITIALIZATION_FAILED
2023-12-01 11:40:18 [455ms] [Error] [carb.graphics-vulkan.plugin] vkEnumeratePhysicalDevices failed. No physical device is found.
2023-12-01 11:40:18 [455ms] [Error] [carb.graphics-vulkan.plugin] No physical device is found.
2023-12-01 11:40:18 [456ms] [Error] [gpu.foundation.plugin] carb::graphics::createInstance failed.
2023-12-01 11:40:19 [1,396ms] [Error] [carb.graphics-vulkan.plugin] VkResult: ERROR_INITIALIZATION_FAILED
2023-12-01 11:40:19 [1,396ms] [Error] [carb.graphics-vulkan.plugin] vkEnumeratePhysicalDevices failed. No physical device is found.
2023-12-01 11:40:19 [1,396ms] [Error] [carb.graphics-vulkan.plugin] No physical device is found.
2023-12-01 11:40:19 [1,397ms] [Error] [gpu.foundation.plugin] carb::graphics::createInstance failed.
2023-12-01 11:40:20 [2,308ms] [Error] [omni.gpu_foundation_factory.plugin] Failed to create GPU foundation devices for compatibilityMode!
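Since the two runs fail with different messages, a small script that counts the relevant error signatures in a Kit log can make them easier to compare. Here is a minimal sketch; the names `SIGNATURES` and `triage` are hypothetical, the substrings are copied from the logs in this post, and the short readings attached to them are my assumptions rather than confirmed diagnoses.

```python
# Substrings copied from the logs in this post, each mapped to a tentative reading.
# The readings are assumptions, not confirmed diagnoses.
SIGNATURES = {
    "GLFW initialization failed": "no window system reachable from the container",
    "failed to open the default display": "DISPLAY unset or X server not reachable",
    "vkEnumeratePhysicalDevices failed": "Vulkan found no physical device",
}


def triage(log_text):
    """Count how many lines of log_text contain each known signature."""
    hits = {}
    for line in log_text.splitlines():
        for needle in SIGNATURES:
            if needle in line:
                hits[needle] = hits.get(needle, 0) + 1
    return hits


def report(log_text):
    """Print one summary line per signature found in log_text."""
    for needle, count in triage(log_text).items():
        print(f"{count}x {needle!r}: possibly {SIGNATURES[needle]}")
```

For example, `report(open(path).read())` could be pointed at the file named in the "Logging to file:" line of a run, with `path` set to that file.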