Isaac Sim Replicator crashes

Hi,

I’ve managed to get Isaac Sim running on a remote workstation with 8 A6000s. It mostly seems to work; however, I haven’t been able to use Replicator for anything because it keeps crashing.

As you can see here running the GUI is fine:

However, when I try to run any of the Replicator examples I get a segfault. For instance, any of the commands from "5. Replicator Composer" in the Omniverse Robotics documentation produces:

[0.113s] [ext: omni.stats-0.0.0] startup
[0.138s] [ext: omni.gpu_foundation-0.0.0] startup
2022-06-17 21:21:40 [134ms] [Warning] [carb] FrameworkImpl::setDefaultPlugin(client: omni.gpu_foundation_factory.plugin, desc : [carb::graphics::Graphics v2.5], plugin : carb.graphics-vulkan.plugin) failed. Plugin selection is locked, because the interface was previously acquired by: 
[0.149s] [ext: carb.windowing.plugins-1.0.0] startup
[0.160s] [ext: omni.assets.plugins-0.0.0] startup
[0.162s] [ext: omni.kit.renderer.init-0.0.0] startup

|---------------------------------------------------------------------------------------------|
| Driver Version: 470.129.6     | Graphics API: Vulkan
|=============================================================================================|
| GPU | Name                             | Active | LDA | GPU Memory | Vendor-ID | LUID       |
|     |                                  |        |     |            | Device-ID | UUID       |
|---------------------------------------------------------------------------------------------|
| 0   | NVIDIA RTX A6000                 | Yes: 0 |     | 49386   MB | 10de      | 0          |
|     |                                  |        |     |            | 2230      | d80cff8e.. |
|=============================================================================================|
| OS: Linux anton, Version: 5.13.0-48-generic
| XServer Vendor: The X.Org Foundation, XServer Version: 12014000 (1.20.14.0)
| Processor: AMD EPYC 7513 32-Core Processor                 | Cores: Unknown | Logical: 128
|---------------------------------------------------------------------------------------------|
| Total Memory (MB): 515816 | Free Memory: 424150
| Total Page/Swap (MB): 2047 | Free Page/Swap: 2047
|---------------------------------------------------------------------------------------------|
2022-06-17 21:21:41 [1,278ms] [Error] [carb.glinterop.plugin] OpenGL Interop is not available. Upgrade your driver to latest for this feature.
2022-06-17 21:21:41 [1,278ms] [Warning] [gpu.foundation.plugin] Realm: no OpenGL interop context.
2022-06-17 21:21:42 [2,440ms] [Warning] [carb.cudainterop.plugin] On Linux only, CUDA and the display driver does not support IOMMU-enabled bare-metal PCIe peer to peer memory copy.
However, CUDA and the display driver does support IOMMU via VM pass through. As a consequence, users on Linux,
when running on a native bare metal system, should disable the IOMMU. The IOMMU should be enabled and the VFIO driver
be used as a PCIe pass through for virtual machines.
[2.465s] [ext: omni.kit.pipapi-0.0.0] startup
[2.476s] [ext: omni.kit.pip_archive-0.0.0] startup
[2.487s] [ext: omni.isaac.core_archive-0.3.0] startup
[2.506s] [ext: omni.usd.config-1.0.0] startup
[2.508s] [ext: omni.usd.libs-1.0.0] startup
[2.642s] [ext: omni.isaac.ml_archive-0.1.0] startup
[2.706s] [ext: omni.kit.loop-isaac-0.1.0] startup
[2.707s] [ext: omni.kit.async_engine-0.0.0] startup
[2.708s] [ext: omni.appwindow-1.0.0] startup
[2.715s] [ext: omni.client-0.1.0] startup
[2.720s] [ext: omni.kit.test-0.0.0] startup
[2.761s] [ext: omni.kit.renderer.core-0.0.0] startup
[2.914s] [ext: omni.ui-2.10.3] startup
[2.931s] [ext: carb.audio-0.1.0] startup
[2.955s] [ext: omni.kit.mainwindow-0.0.0] startup
[2.957s] [ext: omni.uiaudio-1.0.0] startup
[2.958s] [ext: omni.kit.uiapp-0.0.0] startup
[2.958s] [ext: omni.usd.schema.physics-1.0.0] startup
[3.009s] [ext: omni.usd.schema.audio-0.0.0] startup
[3.016s] [ext: omni.usd.schema.semantics-0.0.0] startup
[3.028s] [ext: omni.usd.schema.omnigraph-1.0.0] startup
[3.040s] [ext: omni.usd.schema.anim-0.0.0] startup
[3.076s] [ext: omni.kit.commands-1.2.2] startup
[3.082s] [ext: omni.timeline-1.0.2] startup
[3.085s] [ext: omni.hydra.scene_delegate-0.2.0] startup
[3.090s] [ext: omni.kit.audiodeviceenum-1.0.0] startup
[3.091s] [ext: omni.usd-1.5.3] startup
[3.143s] [ext: omni.kit.asset_converter-1.2.30] startup
[3.164s] [ext: omni.usd.schema.isaac-0.2.0] startup
[3.204s] [ext: omni.usd.schema.physx-0.0.0] startup
[3.232s] [ext: omni.kit.search_core-1.0.2] startup
[3.233s] [ext: omni.renderer-rtx-0.0.0] startup
[3.235s] [ext: omni.kit.widget.graph-1.4.2] startup
[3.244s] [ext: omni.kit.widget.filebrowser-2.2.26] startup
[3.281s] [ext: omni.kit.window.popup_dialog-2.0.7] startup
[3.286s] [ext: omni.mdl.neuraylib-0.1.0] startup
[3.290s] [ext: omni.kit.widget.path_field-2.0.3] startup
[3.291s] [ext: omni.kit.widget.versioning-1.3.8] startup
[3.293s] [ext: omni.kit.notification_manager-1.0.5] startup
[3.299s] [ext: omni.kit.widget.browser_bar-2.0.3] startup
[3.301s] [ext: omni.kit.menu.utils-1.2.11] startup
[3.311s] [ext: omni.kit.window.filepicker-2.4.29] startup
OmniAssetFileFormat
[3.360s] [ext: omni.mdl-0.1.0] startup
[3.380s] [ext: omni.kit.menu.create-1.0.2] startup
[3.381s] [ext: omni.kit.window.file_exporter-1.0.4] startup
[3.382s] [ext: omni.kit.window.drop_support-1.0.0] startup
[3.383s] [ext: omni.kit.material.library-1.3.10] startup
[3.385s] [ext: omni.kit.window.property-1.6.3] startup
[3.387s] [ext: omni.kit.context_menu-1.3.9] startup
[3.391s] [ext: omni.kit.window.file_importer-1.0.4] startup
[3.391s] [ext: omni.kit.stage_templates-1.1.2] startup
[3.393s] [ext: omni.kit.widget.stage-2.6.15] startup
[3.397s] [ext: omni.kit.window.file-1.3.16] startup
[3.399s] [ext: omni.debugdraw-0.1.0] startup
[3.404s] [ext: omni.kit.window.content_browser-2.4.28] startup
[3.422s] [ext: omni.kit.widget.prompt-1.0.1] startup
[3.423s] [ext: omni.kit.property.usd-3.14.8] startup
[3.452s] [ext: omni.hydra.engine.stats-1.0.0] startup
[3.457s] [ext: omni.kit.widget.settings-1.0.0] startup
[3.458s] [ext: omni.graph.tools-1.3.5] startup
[3.481s] [ext: omni.graph.core-2.27.0] startup
[3.484s] [ext: omni.hydra.rtx-0.1.0] startup
[3.493s] [ext: omni.kit.viewport.legacy_gizmos-1.0.0] startup
[3.495s] [ext: omni.ui_query-1.1.1] startup
[3.497s] [ext: omni.graph-1.22.1] startup
[3.544s] [ext: omni.kit.window.viewport-0.0.0] startup
2022-06-17 21:21:44 [3,669ms] [Error] [rtx.neuraylib.plugin] [DYNLIB:IO]   0.1   DYNLIB io   error: ${HOME}.local/share/ov/pkg/isaac_sim-2022.1.0/kit/python/bin/libs/iray/libnvindex.so: cannot open shared object file: No such file or directory
2022-06-17 21:21:44 [3,669ms] [Error] [rtx.neuraylib.plugin] [INDEX:MAIN]   0.1   INDEX  main error: Failed to load ${HOME}.local/share/ov/pkg/isaac_sim-2022.1.0/kit/python/bin/libs/iray/libnvindex.so
[4.766s] [ext: omni.kit.window.preferences-1.2.1] startup
[4.834s] [ext: omni.kit.ui_test-1.2.0] startup
[4.837s] [ext: omni.graph.ui-1.6.1] startup
[4.858s] [ext: omni.kvdb-0.0.0] startup
[4.861s] [ext: omni.kit.widget.searchfield-1.0.6] startup
[4.863s] [ext: omni.convexdecomposition-1.4.12] startup
[4.866s] [ext: omni.graph.action-1.17.0] startup
[4.877s] [ext: omni.localcache-0.0.0] startup
[4.880s] [ext: omni.usdphysics-1.4.12] startup
[4.882s] [ext: omni.graph.scriptnode-0.5.0] startup
[4.884s] [ext: omni.physx-1.4.12-5.1] startup
[4.904s] [ext: omni.kit.usd_undo-0.1.0] startup
[4.907s] [ext: omni.graph.nodes-1.25.0] startup
[4.924s] [ext: omni.physx.commands-1.4.12-5.1] startup
[4.928s] [ext: omni.syntheticdata-0.2.1] startup
[4.948s] [ext: omni.physx.ui-1.4.12-5.1] startup
[5.007s] [ext: omni.warp-0.2.1] startup
Warp initialized:
   Version: 0.2.1
   Using CUDA device: NVIDIA RTX A6000
   Using CPU compiler: /usr/bin/g++
[6.672s] [ext: omni.kit.renderer.capture-0.0.0] startup
[6.676s] [ext: omni.kit.property.material-1.8.5] startup
[6.679s] [ext: omni.physx.demos-1.4.12-5.1] startup
[6.682s] [ext: omni.physics.tensors-0.1.0] startup
[6.688s] [ext: omni.kit.property.physx-0.1.0] startup
2022-06-17 21:21:47 [6,744ms] [Warning] [omni.physx.plugin] Deprecated: getSimulationEventStream is deprecated, please use getSimulationEventStreamV2
[6.759s] [ext: omni.kit.window.toolbar-1.2.4] startup
[6.765s] [ext: omni.physx.tensors-0.1.0] startup
[6.785s] [ext: omni.physx.vehicle-1.4.12-5.1] startup
[6.795s] [ext: omni.physx.tests-1.4.12-5.1] startup
[6.834s] [ext: omni.kit.numpy.common-0.1.0] startup
[6.837s] [ext: omni.physx.camera-1.4.12-5.1] startup
[6.844s] [ext: omni.physx.cct-1.4.12-5.1] startup
[6.888s] [ext: omni.isaac.version-1.0.0] startup
[6.889s] [ext: omni.isaac.dynamic_control-1.0.0] startup
[6.898s] [ext: omni.physx.bundle-1.4.12-5.1] startup
[6.898s] [ext: omni.kit.primitive.mesh-1.0.0] startup
[6.902s] [ext: omni.command.usd-1.0.1] startup
[6.905s] [ext: omni.isaac.core-1.15.1] startup
[6.990s] [ext: omni.replicator.core-1.2.0] startup
[7.172s] [ext: omni.kit.window.extensions-1.1.0] startup
[7.177s] [ext: omni.isaac.core_nodes-0.9.0] startup
[7.187s] [ext: omni.isaac.ui-0.2.1] startup
[7.189s] [ext: omni.kit.window.script_editor-1.6.2] startup
[7.194s] [ext: omni.isaac.wheeled_robots-0.5.4] startup
[7.205s] [ext: omni.kit.menu.common-1.0.0] startup
[7.206s] [ext: omni.kit.graph.delegate.default-1.0.15] startup
[7.208s] [ext: omni.kit.graph.delegate.modern-1.6.0] startup
[7.210s] [ext: omni.kit.widget.zoombar-1.0.3] startup
[7.211s] [ext: omni.kit.graph.editor.core-1.3.3] startup
[7.213s] [ext: omni.kit.widget.stage_icons-1.0.2] startup
[7.215s] [ext: omni.kit.browser.core-2.0.12] startup
[7.219s] [ext: omni.kit.graph.widget.variables-2.0.2] startup
[7.221s] [ext: omni.kit.window.stage-2.3.7] startup
[7.224s] [ext: omni.kit.browser.folder.core-1.1.13] startup
[7.226s] [ext: omni.graph.window.core-1.22.1] startup
[7.232s] [ext: omni.isaac.lula-1.1.0] startup
[7.245s] [ext: omni.graph.instancing-1.1.4] startup
[7.250s] [ext: omni.graph.window.action-1.3.8] startup
[7.251s] [ext: omni.graph.tutorials-1.1.2] startup
[7.264s] [ext: omni.rtx.window.settings-0.6.1] startup
[7.270s] [ext: omni.isaac.motion_planning-0.2.0] startup
[7.277s] [ext: omni.graph.bundle.action-1.0.0] startup
[7.277s] [ext: omni.rtx.settings.core-0.5.5] startup
[7.282s] [ext: omni.isaac.motion_generation-3.1.2] startup
[7.286s] [ext: omni.isaac.kit-0.1.9] startup
[7.286s] [ext: omni.isaac.debug_draw-0.1.2] startup
[7.291s] [ext: omni.kit.selection-0.1.0] startup
[7.292s] [ext: omni.isaac.franka-0.0.0] startup
[7.293s] [ext: omni.kit.widget.layers-1.5.17] startup
[7.305s] [ext: omni.kit.menu.edit-1.0.6] startup
[7.307s] [ext: omni.isaac.isaac_sensor-0.3.4] startup
2022-06-17 21:21:47 [7,298ms] [Warning] [omni.physx.plugin] Deprecated: getSimulationEventStream is deprecated, please use getSimulationEventStreamV2
[7.313s] [ext: omni.kit.widget.live-0.1.0] startup
[7.316s] [ext: omni.isaac.surface_gripper-0.1.2] startup
[7.320s] [ext: omni.kit.property.layer-1.1.2] startup
[7.322s] [ext: omni.isaac.range_sensor-0.4.2] startup
[7.346s] [ext: omni.graph.window.generic-1.3.8] startup
[7.347s] [ext: omni.isaac.utils-0.1.11] startup
[7.350s] [ext: omni.isaac.universal_robots-0.2.1] startup
[7.351s] [ext: omni.kit.property.audio-1.0.5] startup
[7.353s] [ext: omni.kit.property.skel-1.0.1] startup
[7.354s] [ext: omni.kit.property.render-1.1.0] startup
[7.355s] [ext: omni.kit.property.camera-1.0.3] startup
[7.356s] [ext: omni.kit.property.geometry-1.2.0] startup
[7.358s] [ext: omni.kit.property.light-1.0.5] startup
[7.359s] [ext: omni.kit.property.transform-1.0.2] startup
[7.362s] [ext: omni.isaac.occupancy_map-0.2.4] startup
[7.378s] [ext: omni.ui.scene-1.4.6] startup
[7.385s] [ext: omni.kit.window.console-0.2.0] startup
[7.389s] [ext: omni.kit.window.status_bar-0.1.1] startup
[7.393s] [ext: omni.kit.property.bundle-1.2.4] startup
[7.394s] [ext: omni.kit.menu.file-1.0.8] startup
[7.396s] [ext: omni.kit.manipulator.viewport-1.0.6] startup
[7.398s] [ext: omni.isaac.urdf-0.2.2] startup
[7.424s] [ext: omni.isaac.dofbot-0.2.0] startup
[7.424s] [ext: omni.kit.window.title-1.1.1] startup
[7.425s] [ext: omni.kit.profiler.window-1.4.4] startup
[7.431s] [ext: omni.graph.visualization.nodes-1.1.1] startup
[7.435s] [ext: omni.isaac.synthetic_utils-0.2.1] startup
[7.442s] [ext: semantics.schema.editor-0.2.2] startup
[7.445s] [ext: omni.isaac.sim.python-2022.1.0] startup
[7.446s] Simulation App Starting
[8.401s] app ready
[11.068s] Simulation App Startup Complete
2022-06-17 21:21:51 [11,168ms] [Warning] [carb.flatcache.plugin] Type tag does not have a corresponding USD type

Camera intrinsics
- width, height: 1920, 1080
- focal_length: 40.0
- horiz_aperture, vert_aperture: 20.95, 15.29
- horiz_fov, vert_fov: 29.36, 21.64
- focal_x, focal_y: 3665.0, 2825.23
- proj_mat: 
 [[-3.82  0.    0.    0.  ]
 [ 0.    6.79  0.    0.  ]
 [ 0.    0.    1.    1.  ]
 [ 0.    0.    1.    0.  ]]


Camera intrinsics
- width, height: 1920, 1080
- focal_length: 40.0
- horiz_aperture, vert_aperture: 20.95, 15.29
- horiz_fov, vert_fov: 29.36, 21.64
- focal_x, focal_y: 3665.0, 2825.23
- proj_mat: 
 [[-3.82  0.    0.    0.  ]
 [ 0.    6.79  0.    0.  ]
 [ 0.    0.    1.    1.  ]
 [ 0.    0.    1.    0.  ]]

Segmentation fault (core dumped)

Nvidia SMI drivers:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06   Driver Version: 470.129.06   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+

I’ve tried turning off multi-GPU, switching renderers, and running both headless and non-headless, but nothing seems to work. Any advice on how I can solve this or debug further?

I was also able to run it in a debugger; however, the trace is of limited use without debug symbols:

* thread #1, name = 'python3', stop reason = signal SIGSEGV: invalid address (fault address: 0x68)
  * frame #0: 0x00007ffd701d526a libcuda.so.1`___lldb_unnamed_symbol4891 + 250
    frame #1: 0x00007ffd700b3e9d libcuda.so.1`___lldb_unnamed_symbol2172 + 1837
    frame #2: 0x00007ffd7013e61e libcuda.so.1`___lldb_unnamed_symbol3246 + 94
    frame #3: 0x00007fea2051397c libomni.syntheticdata.plugin.so`___lldb_unnamed_symbol1903 + 220
    frame #4: 0x00007fea205526d1 libomni.syntheticdata.plugin.so`___lldb_unnamed_symbol2765 + 433
    frame #5: 0x00007fea204cb47a libomni.syntheticdata.plugin.so`___lldb_unnamed_symbol1278 + 1002
    frame #6: 0x00007fea204cd6b0 libomni.syntheticdata.plugin.so`___lldb_unnamed_symbol1280 + 3536
    frame #7: 0x00007fea204cd895 libomni.syntheticdata.plugin.so`___lldb_unnamed_symbol1281 + 101
    frame #8: 0x00007ff659784586 libomni.graph.core.plugin.so`___lldb_unnamed_symbol2203 + 614
    frame #9: 0x00007ff65979ff20 libomni.graph.core.plugin.so`___lldb_unnamed_symbol2437 + 144
    frame #10: 0x00007ff6598559f4 libomni.graph.core.plugin.so`___lldb_unnamed_symbol3930 + 932
    frame #11: 0x00007ff6597a016f libomni.graph.core.plugin.so`___lldb_unnamed_symbol2437 + 735
    frame #12: 0x00007ff659841f89 libomni.graph.core.plugin.so`___lldb_unnamed_symbol3667 + 441
    frame #13: 0x00007ff6b9fc3cab libomni.stageupdate.plugin.so`___lldb_unnamed_symbol206 + 235
    frame #14: 0x00007ff6ba72ace1 libomni.usd.so`___lldb_unnamed_symbol4448 + 3169
    frame #15: 0x00007ff6ba72bbad libomni.usd.so`___lldb_unnamed_symbol4449 + 157
    frame #16: 0x00007ff6ba6d641f libomni.usd.so`___lldb_unnamed_symbol3832 + 31
    frame #17: 0x00007ffff1bd7a63 libcarb.events.plugin.so`___lldb_unnamed_symbol631 + 499
    frame #18: 0x00007ff88ebd659c libomni.kit.loop-isaac.plugin.so`___lldb_unnamed_symbol184 + 892
    frame #19: 0x00007ffff37673e3 libomni.kit.app.plugin.so`___lldb_unnamed_symbol926 + 115
    frame #20: 0x00007ffff37b3817 libomni.kit.app.plugin.so`___lldb_unnamed_symbol1389 + 103
    frame #21: 0x00007ffff3a16580 _app.cpython-37m-x86_64-linux-gnu.so`___lldb_unnamed_symbol971 + 96
    frame #22: 0x00007ffff3a17257 _app.cpython-37m-x86_64-linux-gnu.so`___lldb_unnamed_symbol973 + 2679
    frame #23: 0x00007ffff77c0578 libpython3.7m.so.1.0`_PyMethodDef_RawFastCallKeywords(method=0x0000000000bb6f60, self=<unavailable>, args=<unavailable>, nargs=1, kwnames=0x0000000000000000) at call.c:693
    frame #24: 0x00007ffff77c0645 libpython3.7m.so.1.0`_PyCFunction_FastCallKeywords(func=0x00007ffff40d2870, args=<unavailable>, nargs=<unavailable>, kwnames=<unavailable>) at call.c:732
    frame #25: 0x00007ffff7797b70 libpython3.7m.so.1.0`_PyEval_EvalFrameDefault at ceval.c:4619
    frame #26: 0x00007ffff7797b03 libpython3.7m.so.1.0`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3093
    frame #27: 0x00007ffff778ef4f libpython3.7m.so.1.0`function_code_fastcall(co=<unavailable>, args=<unavailable>, nargs=1, globals=<unavailable>) at call.c:283
    frame #28: 0x00007ffff7797e96 libpython3.7m.so.1.0`_PyEval_EvalFrameDefault at ceval.c:4616
    frame #29: 0x00007ffff7797e91 libpython3.7m.so.1.0`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3110
    frame #30: 0x00007ffff778ef4f libpython3.7m.so.1.0`function_code_fastcall(co=<unavailable>, args=<unavailable>, nargs=0, globals=<unavailable>) at call.c:283
    frame #31: 0x00007ffff7796907 libpython3.7m.so.1.0`_PyEval_EvalFrameDefault at ceval.c:4616
    frame #32: 0x00007ffff7796902 libpython3.7m.so.1.0`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3124
    frame #33: 0x00007ffff78abd4e libpython3.7m.so.1.0`_PyEval_EvalCodeWithName(_co=<unavailable>, globals=<unavailable>, locals=<unavailable>, args=0x0000000000000000, argcount=0, kwnames=0x0000000000000000, kwargs=0x0000000000000000, kwcount=0, kwstep=2, defs=0x0000000000000000, defcount=0, kwdefs=0x0000000000000000, closure=0x0000000000000000, name=0x0000000000000000, qualname=0x0000000000000000) at ceval.c:3930
    frame #34: 0x00007ffff78abe2e libpython3.7m.so.1.0`PyEval_EvalCodeEx(_co=<unavailable>, globals=<unavailable>, locals=<unavailable>, args=0x0000000000000000, argcount=0, kws=0x0000000000000000, kwcount=0, defs=0x0000000000000000, defcount=0, kwdefs=0x0000000000000000, closure=0x0000000000000000) at ceval.c:3959
    frame #35: 0x00007ffff78abe5b libpython3.7m.so.1.0`PyEval_EvalCode(co=<unavailable>, globals=<unavailable>, locals=<unavailable>) at ceval.c:524
    frame #36: 0x00007ffff78e015a libpython3.7m.so.1.0`PyRun_FileExFlags at pythonrun.c:1037
    frame #37: 0x00007ffff78e012a libpython3.7m.so.1.0`PyRun_FileExFlags(fp=<unavailable>, filename_str=<unavailable>, start=<unavailable>, globals=0x00007ffff6c95f50, locals=0x00007ffff6c95f50, closeit=1, flags=0x00007fffffff237c) at pythonrun.c:990
    frame #38: 0x00007ffff78e02af libpython3.7m.so.1.0`PyRun_SimpleFileExFlags(fp=0x0000000000949150, filename=<unavailable>, closeit=1, flags=0x00007fffffff237c) at pythonrun.c:429
    frame #39: 0x00007ffff79049b6 libpython3.7m.so.1.0`pymain_main at main.c:456
    frame #40: 0x00007ffff7904938 libpython3.7m.so.1.0`pymain_main at main.c:1646
    frame #41: 0x00007ffff7904853 libpython3.7m.so.1.0`pymain_main at main.c:2907
    frame #42: 0x00007ffff7904588 libpython3.7m.so.1.0`pymain_main(pymain=0x00007fffffff2450) at main.c:3068
    frame #43: 0x00007ffff7904be9 libpython3.7m.so.1.0`_Py_UnixMain(argc=<unavailable>, argv=<unavailable>) at main.c:3103
    frame #44: 0x00007ffff732b083 libc.so.6`__libc_start_main(main=(python3`main), argc=2, argv=0x00007fffffff25a8, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffff2598) at libc-start.c:308:16
    frame #45: 0x000000000040418e python3`_start + 41

Thanks

Update, still doesn’t work.

I’ve updated the drivers to:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.48.07    Driver Version: 515.48.07    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+

but it still crashes. Investigating further, I found a simple way to reproduce it (on my machine, at least):

from omni.isaac.kit import SimulationApp
import os
import carb

kit = SimulationApp(launch_config= {"renderer": "PathTracing", "headless": False,
             "width": 1028, "height": 1028, "num_frames": 1})

import omni.replicator.core as rep

camera = rep.create.camera()
render_product = rep.create.render_product(camera, (1024, 1024))
writer = rep.WriterRegistry.get("BasicWriter")

writer.initialize(output_dir="/tmp",
        rgb=True,
        bounding_box_2d_tight=False,
        bounding_box_2d_loose=False,
        semantic_segmentation=False,
        instance_segmentation=False,
        distance_to_camera=False,
        distance_to_image_plane=False,
        bounding_box_3d=False,
        occlusion=False,
        normals=False,
        motion_vectors=False,
        # Only camera params as it doesn't require GPU interop.
        camera_params=True
    )

writer.attach([render_product])
rep.orchestrator.run()
while True:
    kit.update()

kit.close()

The above is enough to cause the crash. Note that every synthetic data output that requires the GPU to write back to the host is set to False, other than RGB. However, if I then set

rgb=False,

It runs fine and I can see the scene correctly rendered in the attached window.
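
To bisect which output triggers the crash, it can help to flip one flag at a time rather than editing the call by hand. The following is only a sketch: `basic_writer_kwargs` and `BASIC_WRITER_FLAGS` are hypothetical names, and the flag list is simply the set of options used in the script above, not a statement of everything BasicWriter accepts.

```python
# Output flags copied from the repro script above (assumption: this is not
# necessarily the complete set of options that BasicWriter supports).
BASIC_WRITER_FLAGS = (
    "rgb",
    "bounding_box_2d_tight",
    "bounding_box_2d_loose",
    "semantic_segmentation",
    "instance_segmentation",
    "distance_to_camera",
    "distance_to_image_plane",
    "bounding_box_3d",
    "occlusion",
    "normals",
    "motion_vectors",
    "camera_params",
)


def basic_writer_kwargs(output_dir, enabled):
    """Build keyword arguments with only the outputs in `enabled` set to True."""
    enabled = set(enabled)
    unknown = enabled - set(BASIC_WRITER_FLAGS)
    if unknown:
        raise ValueError(f"unknown output flags: {sorted(unknown)}")
    kwargs = {flag: flag in enabled for flag in BASIC_WRITER_FLAGS}
    kwargs["output_dir"] = output_dir
    return kwargs


if __name__ == "__main__":
    # Print one configuration per flag; each could be passed to
    # writer.initialize(**kwargs) in a separate run of the repro script.
    for flag in BASIC_WRITER_FLAGS:
        print(flag, basic_writer_kwargs("/tmp", {flag}))
```

Running the repro once per flag, with everything else disabled, should show whether the crash is specific to RGB or shared by every output that is copied back from the GPU.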

Additionally, loading any model, navigating to Synthetic Data -> Synthetic Data Recorder, and clicking any of the buttons there produces the same crash.

I’m able to prototype synthetic data setups and configs using just the GUI, but unfortunately this is blocking our whole evaluation of Isaac Sim. Synthetic data is currently our only use case, and these issues are preventing us from testing it end to end, so any help would be appreciated.

I also cannot run Replicator successfully. You can check my environment in the topic below:

[Error] [omni.syntheticdata.plugin] CUDA error 801: cudaErrorNotSupported - operation not supported)

The problem may be related to functions supported in different versions of CUDA. You have tried CUDA 11.4 and 11.7, while I tried CUDA 11.6. I will try CUDA 11.5 later and see how it goes.

One more thing: you may need to run nvcc -V to check your actual CUDA version. The CUDA version reported by nvidia-smi is only the latest CUDA version that the driver can support.
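
As a rough sketch of that check, the script below prints both numbers side by side. The function names are hypothetical, and the regular expressions assume the usual banner formats ("Driver Version: ... CUDA Version: ..." from nvidia-smi and "release X.Y" from nvcc -V); if either tool is not on the PATH, the corresponding value is reported as None.

```python
import re
import subprocess


def parse_nvidia_smi_header(text):
    """Return (driver_version, max_supported_cuda) from the nvidia-smi banner."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text
    )
    return (match.group(1), match.group(2)) if match else None


def parse_nvcc_release(text):
    """Return the toolkit version string from `nvcc -V` output, e.g. '11.6'."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def run(cmd):
    """Run a command and return its stdout, or '' if the tool is missing."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return ""
    return result.stdout


if __name__ == "__main__":
    print("driver / max supported CUDA:", parse_nvidia_smi_header(run(["nvidia-smi"])))
    print("installed toolkit (nvcc):   ", parse_nvcc_release(run(["nvcc", "-V"])))
```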

@zhengzj I was able to work around this issue, sort of. I downloaded the Docker image and launched a container locally using --gpus 1 instead of --gpus all (which still crashes), and I’m able to produce data OK using only one GPU.

Isaac Sim | NVIDIA NGC

Investigating further, the Docker image isn’t needed. Just setting

"multi_gpu": False, "active_gpu": 1,

is enough. Unfortunately this leaves the other seven GPUs idle, so it isn’t a great solution. Oddly, only GPU 1 currently works (!?). If this is a GPU configuration issue, some guidance from the NVIDIA team on how multi-GPU should be configured for Isaac Sim would be appreciated.

OK, I figured out the other GPUs as well:

gpu_to_use = 6

import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_to_use)

....

kit = SimulationApp({...., "multi_gpu": False, "active_gpu": gpu_to_use})

This seems to work for any GPU other than GPU 0. It looks like the root issue is that the synthetic data post-processing code (I believe SdRenderVarToRawArray) tries to access GPU memory in CUDA from a different GPU. I’m not sure how to work around it more generally without changes to that code, but this unblocks me for now, I think.
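
For reuse across scripts, the same workaround can be wrapped in a small helper. This is a sketch under the assumptions of this thread only: `pin_single_gpu` is a hypothetical name, and the only launch-config keys it touches are the "multi_gpu" and "active_gpu" settings shown above. It has to be called before SimulationApp is created, so that the environment variables are in place before CUDA is initialised.

```python
import os


def pin_single_gpu(gpu_index, base_config=None, environ=os.environ):
    """Restrict CUDA to one device and return a matching launch config.

    Sets the two environment variables from the workaround and returns a copy
    of `base_config` with multi-GPU disabled and `active_gpu` set to the same
    index.
    """
    environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    environ["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    config = dict(base_config or {})
    config.update({"multi_gpu": False, "active_gpu": gpu_index})
    return config


if __name__ == "__main__":
    launch_config = pin_single_gpu(6, {"headless": True})
    print(launch_config)
    # Inside the Isaac Sim Python environment this would be followed by:
    #   from omni.isaac.kit import SimulationApp
    #   kit = SimulationApp(launch_config)
```

Passing a plain dict as `environ` makes it possible to inspect what would be set without modifying the real process environment.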

Hi smcgro,

Do you know how to set this from the Launcher? At the moment I am setting --/renderer/activeGpu=1 but that doesn’t solve the problem.

I don’t, sorry. Probably multi_gpu needs to be set somehow as well. I didn’t get it to work on the X display, so my current workflow is to run with synthetic data disabled in the GUI and then enable it for command-line runs.

Thanks for providing details @smcgro

Followed up with the Replicator team:

  • The bug related to generated data being copied from a non-zero GPU to GPU 0 is being worked on and will be fixed in the next release.
  • In general, multi-GPU data generation only helps if you have multiple viewports, since each viewport renders on its own GPU. For a single viewport (which is what most of our samples currently use), a single GPU gives the same performance.
