Multiple Isaac Sim containers on one GPU fail with CUDA illegal memory access in [omni.physx.tensors.plugin]

We are trying to run HPS for Sim2Real policy learning on our cluster of Quadro RTX 8000 and A100 GPUs. When we start multiple training runs on the same GPU, every run except the first one fails with:

[Error] [omni.physx.tensors.plugin] CUDA error: an illegal memory access was encountered: ../../../source/extensions/omni.physx.tensors/plugins/gpu/GpuArticulationView.cpp: 936
[Error] [omni.physx.tensors.plugin] CUDA error: an illegal memory access was encountered: ../../../source/extensions/omni.physx.tensors/plugins/gpu/GpuArticulationView.cpp: 942
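The reported source line is not always the same: these two lines point at GpuArticulationView.cpp:936 and :942, while the full log further down also reports :947. When comparing several failed runs, it can help to tally the reported file/line pairs. This is a minimal sketch that assumes the error lines keep exactly the format quoted above; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as
# "[Error] [omni.physx.tensors.plugin] CUDA error: <message>: <path>.cpp: <line>"
# optionally preceded by a timestamp. The format is copied from the log in this post.
CUDA_ERROR_RE = re.compile(
    r"\[Error\] \[[\w.]+\] CUDA error: .+?: (?P<path>\S+\.cpp): (?P<line>\d+)\s*$"
)


def cuda_error_sites(log_lines):
    """Tally (source file name, line number) pairs from CUDA error log lines."""
    sites = Counter()
    for raw in log_lines:
        match = CUDA_ERROR_RE.search(raw)
        if match:
            # Keep only the file name; the relative build path is identical in every run.
            file_name = match.group("path").rsplit("/", 1)[-1]
            sites[(file_name, int(match.group("line")))] += 1
    return sites
```

Feeding it the error lines from several failed runs shows whether the crash always hits the same spot or moves between nearby lines. Because CUDA errors can be reported asynchronously, as the PyTorch message later in the log points out, a line number may only mark the next synchronizing call rather than the faulting kernel.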

In Python, the error is raised here:

  ...
  File "/isaac-sim/exts/omni.isaac.core/omni/isaac/core/articulations/articulation_view.py", line 2084, in get_jacobians
    current_values = self._physics_view.get_jacobians()
  File "/isaac-sim/kit/extsPhysics/omni.physics.tensors-104.2.4-5.1/omni/physics/tensors/impl/api.py", line 534, in get_jacobians
    raise Exception("Failed to get Jacobians from backend")
Exception: Failed to get Jacobians from backend

We are using custom environments that are loosely based on FrankaCabinet from the Omniverse Isaac Gym RL Environments.

Is it possible that the get_jacobians function in ArticulationView accesses specific GPU memory blocks, which would fail when multiple containers run on the same physical device?

Any help is appreciated.

@miles.h I am just another OV user with limited knowledge of Isaac. However, I would recommend uploading the latest log (in its entirety) so the mods/devs can evaluate the problem with more context; the full console log may let them dig deeper when troubleshooting. Here is where you can find it:

  • Windows - C:\Users\%username%\.nvidia-omniverse\logs\Kit\Isaac-Sim
  • Linux - ~/.nvidia-omniverse/logs/Kit/Isaac-Sim
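To pick up the newest log from a script, something like the following works against the directories listed above. It is a sketch that assumes the logs are plain `*.log` files somewhere under that directory, and the helper names are illustrative. Inside the Docker container used later in this thread, the log header reports `/isaac-sim/kit/logs/Kit/Isaac-Sim/2022.2/` instead, so that directory may need to be passed explicitly.

```python
from pathlib import Path


def default_log_dir():
    """Isaac Sim log directory listed above: the same relative path under the
    user's home directory (%USERPROFILE% on Windows, ~ on Linux)."""
    return Path.home() / ".nvidia-omniverse" / "logs" / "Kit" / "Isaac-Sim"


def latest_log(log_dir=None):
    """Return the most recently modified *.log file under log_dir (searched
    recursively), or None if the directory is missing or holds no logs."""
    root = Path(log_dir) if log_dir is not None else default_log_dir()
    if not root.is_dir():
        return None
    logs = [p for p in root.rglob("*.log") if p.is_file()]
    return max(logs, key=lambda p: p.stat().st_mtime, default=None)
```

For the container case, `latest_log("/isaac-sim/kit/logs/Kit/Isaac-Sim")` would search the versioned subfolder as well, since `rglob` recurses.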

Hi @miles.h, this issue could have several causes:

  1. CUDA operations not being synchronized properly between different runs. If you’re sharing the same GPU among multiple training runs, it’s possible that one run might try to access memory that is currently being used by another run, causing this “illegal memory access” error. You could use cudaDeviceSynchronize() or cudaStreamSynchronize() to ensure that all previous CUDA operations in a device or stream have completed before moving on.
  2. Overlapping CUDA streams. If multiple CUDA streams are trying to access and modify the same memory location at the same time, they could potentially interfere with each other, causing this error. You might want to use separate CUDA streams for each training run.
  3. GPU memory exhaustion. Multiple training runs on the same GPU could potentially lead to GPU memory being exhausted. You can monitor GPU memory usage using tools like nvidia-smi. If the GPU memory is getting exhausted, you might have to reduce the memory footprint of each training run, or distribute the runs across multiple GPUs.
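For point 3, memory usage can also be polled from a script instead of being read off by eye. The following is a minimal sketch using the `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` query mode. The helper names are illustrative, and the query function simply returns an empty list when `nvidia-smi` is not on the PATH, as may be the case inside some containers.

```python
import shutil
import subprocess

# nvidia-smi query mode: one CSV row per GPU, memory in MiB, no header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn nvidia-smi CSV rows into dicts (memory values in MiB)."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, name, used, total = (field.strip() for field in line.split(","))
        rows.append({
            "index": int(index),
            "name": name,
            "used_mib": int(used),
            "total_mib": int(total),
        })
    return rows


def query_gpu_memory():
    """Run the query if nvidia-smi is on PATH; return [] when it is not."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` periodically while two runs share a device shows whether `used_mib` approaches `total_mib` before the crash.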

Hi @Simplychenable and @rthaker, thank you for the information. I will paste a full Isaac Sim log at the bottom of this reply.

@rthaker: I can rule out 3. In one run we simulate roughly 4000 environments in parallel, which takes ~5 GB of GPU memory, whereas the GPUs have at least 40 GB available; I verified this with nvidia-smi. Some more information on the issue:

  • We isolate training runs by running each one in its own Docker container. I am not sure how you would synchronize different training runs when they are separate applications running in separate Docker containers.
  • I suspect the issue is 2: could there be some code in the CUDA PhysX articulation backend of Isaac Sim that works with fixed memory locations?
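As a rough sanity check of the memory argument for ruling out point 3, here is the arithmetic using only the figures from this post (~5 GB per run, at least 40 GB per GPU). It is a sketch with illustrative helper names, and it deliberately ignores the per-process CUDA context overhead, so real headroom is somewhat smaller.

```python
def fits_on_gpu(num_runs, per_run_gb, gpu_total_gb):
    """True if num_runs jobs of per_run_gb each fit in gpu_total_gb (memory only).

    Ignores per-process CUDA context overhead and fragmentation.
    """
    return num_runs * per_run_gb <= gpu_total_gb


def max_runs(per_run_gb, gpu_total_gb):
    """Upper bound on concurrent runs if memory were the only constraint."""
    return int(gpu_total_gb // per_run_gb)


PER_RUN_GB = 5.0     # reported footprint of one run (~4000 environments)
GPU_TOTAL_GB = 40.0  # "at least 40 GB" per GPU, as reported

print(fits_on_gpu(2, PER_RUN_GB, GPU_TOTAL_GB))  # True (10 GB of 40 GB)
print(max_runs(PER_RUN_GB, GPU_TOTAL_GB))        # 8
```

The failure already appears with the second run, at roughly 10 GB of a 40 GB budget, so memory exhaustion alone does not explain it. This is consistent with ruling out point 3.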

Is there some way to check which memory locations the code in question accesses, to verify whether both instances try to access the same location?
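One tool aimed at this kind of question is NVIDIA's `compute-sanitizer` with its `memcheck` tool. It reports the kernel and the address of each invalid global-memory access in the process it instruments. The following is a hedged sketch that builds such an invocation around the training command from this thread. The `/isaac-sim/python.sh` path, script path and overrides are copied from the log, and the helper name is hypothetical. `--target-processes all` is included because `python.sh` runs the interpreter as a child process (the segfault at the end of the log is reported by the shell script).

```python
import shlex


def sanitizer_command(python_sh, script, overrides, log_file="memcheck.log"):
    """Wrap an Isaac Sim training command in compute-sanitizer's memcheck tool.

    --target-processes all: python.sh starts the interpreter as a child process,
    so child processes must be instrumented too.
    --log-file: keep sanitizer reports separate from the (very long) Kit log.
    """
    return [
        "compute-sanitizer",
        "--tool", "memcheck",
        "--target-processes", "all",
        "--log-file", log_file,
        python_sh, script, *overrides,
    ]


cmd = sanitizer_command(
    "/isaac-sim/python.sh",
    "./scripts/rlgames_train.py",
    ["task=FrankaGoto", "headless=True", "seed=-1"],
)
# Print a shell-ready version of the command (avoids shlex.join, which needs Python 3.8+).
print(" ".join(shlex.quote(part) for part in cmd))
```

Memcheck slows kernels down considerably, so a much smaller environment count is probably advisable for such a run (the printed config has a top-level `num_envs` entry). Each process has its own GPU virtual address space, so the addresses memcheck reports are per-process, and comparing raw addresses between two containers is unlikely to be meaningful. The kernel name and access size in the report are more useful.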

Full Isaac Sim log:
stdout

[Warning] [omni.isaac.kit.simulation_app] Modules: ['omni.isaac.kit.app_framework'] were loaded before SimulationApp was started and might not be loaded correctly.
[Warning] [omni.isaac.kit.simulation_app] Please check to make sure no extra omniverse or pxr modules are imported before the call to SimulationApp(...)
[Warning] [omni.kit.app.plugin] No crash reporter present, dumps uploading isn't available.
[Info] [carb] Logging to file: /isaac-sim/kit/logs/Kit/Isaac-Sim/2022.2/kit_20231010_114008.log
2023-10-10 11:40:08 [31ms] [Warning] [omni.ext.plugin] [ext: omni.sensors.nv.lidar] Extensions config 'extension.toml' doesn't exist '/isaac-sim/exts/omni.sensors.nv.lidar' or '/isaac-sim/exts/omni.sensors.nv.lidar/config'
2023-10-10 11:40:08 [31ms] [Warning] [omni.ext.plugin] [ext: omni.sensors.nv.radar] Extensions config 'extension.toml' doesn't exist '/isaac-sim/exts/omni.sensors.nv.radar' or '/isaac-sim/exts/omni.sensors.nv.radar/config'
[0.249s] [ext: omni.stats-0.0.0] startup
[0.261s] [ext: omni.rtx.shadercache-1.0.0] startup
[0.266s] [ext: omni.assets.plugins-0.0.0] startup
[0.268s] [ext: omni.gpu_foundation-0.0.0] startup
[0.279s] [ext: carb.windowing.plugins-1.0.0] startup
2023-10-10 11:40:08 [268ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-10 11:40:08 [268ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
[0.283s] [ext: omni.kit.renderer.init-0.0.0] startup
2023-10-10 11:40:08 [384ms] [Warning] [carb.graphics-vulkan.plugin] No command queue family supports flags: 0x100, queue type: 3. No queues of this type will be created

|---------------------------------------------------------------------------------------------|
| Driver Version: 515.86.01     | Graphics API: Vulkan
|=============================================================================================|
| GPU | Name                             | Active | LDA | GPU Memory | Vendor-ID | LUID       |
|     |                                  |        |     |            | Device-ID | UUID       |
|---------------------------------------------------------------------------------------------|
| 0   | Quadro RTX 8000                  | Yes: 0 |     | 49398   MB | 10de      | 0          |
|     |                                  |        |     |            | 1e30      | 81ee6025.. |
|=============================================================================================|
| OS: Linux apollo1, Version: 5.4.0-135-generic
| Processor: Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz | Cores: Unknown | Logical: 8
|---------------------------------------------------------------------------------------------|
| Total Memory (MB): 1547825 | Free Memory: 1437562
| Total Page/Swap (MB): 0 | Free Page/Swap: 0
|---------------------------------------------------------------------------------------------|
[1.546s] [ext: omni.kit.pipapi-0.0.0] startup
[1.551s] [ext: omni.kit.pip_archive-0.0.0] startup
[1.554s] [ext: omni.kit.loop-isaac-1.0.0] startup
[1.555s] [ext: omni.kit.async_engine-0.0.0] startup
[1.557s] [ext: omni.kit.test-0.0.0] startup
[1.584s] [ext: omni.usd.config-1.0.0] startup
[1.589s] [ext: omni.usd.libs-1.0.0] startup
[1.849s] [ext: omni.isaac.core_archive-2.0.1] startup
[1.860s] [ext: omni.pip.torch-1_13_1-0.1.4] startup
[1.862s] [ext: omni.isaac.ml_archive-1.1.0] startup
[1.862s] [ext: omni.client-0.1.1] startup
[1.876s] [ext: omni.appwindow-1.0.1] startup
2023-10-10 11:40:10 [1,865ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-10 11:40:10 [1,865ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
[1.879s] [ext: omni.kit.renderer.core-0.0.0] startup
2023-10-10 11:40:10 [1,900ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-10 11:40:10 [1,900ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
2023-10-10 11:40:10 [1,902ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-10 11:40:10 [1,902ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
[1.919s] [ext: omni.kit.renderer.capture-0.0.0] startup
[1.921s] [ext: omni.kit.renderer.imgui-0.0.0] startup
2023-10-10 11:40:10 [1,912ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-10 11:40:10 [1,912ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
2023-10-10 11:40:10 [1,913ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-10 11:40:10 [1,913ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
[2.062s] [ext: carb.audio-0.1.0] startup
[2.071s] [ext: omni.ui-2.14.4] startup
[2.092s] [ext: omni.uiaudio-1.0.0] startup
[2.094s] [ext: omni.kit.mainwindow-1.0.0] startup
[2.096s] [ext: omni.kit.uiapp-0.0.0] startup
[2.096s] [ext: omni.usd.schema.physics-1.0.0] startup
[2.158s] [ext: omni.usd.schema.geospatial-0.0.0] startup
[2.169s] [ext: omni.usd.schema.audio-0.0.0] startup
[2.178s] [ext: omni.usd.schema.anim-0.0.0] startup
[2.376s] [ext: omni.usd.schema.semantics-0.0.0] startup
[2.382s] [ext: omni.usd.schema.physx-0.0.0] startup
[2.427s] [ext: omni.usd.schema.omnigraph-1.0.0] startup
[2.439s] [ext: omni.usd.schema.omniscripting-1.0.0] startup
[2.450s] [ext: omni.kit.window.popup_dialog-2.0.16] startup
[2.458s] [ext: omni.kit.actions.core-1.0.0] startup
[2.461s] [ext: omni.kit.widget.nucleus_connector-1.0.3] startup
[2.465s] [ext: omni.kit.commands-1.4.5] startup
[2.470s] [ext: omni.gpucompute.plugins-0.0.0] startup
[2.471s] [ext: omni.usd.core-1.0.4] startup
[2.476s] [ext: omni.timeline-1.0.5] startup
[2.479s] [ext: omni.hydra.scene_delegate-0.3.0] startup
[2.488s] [ext: omni.kit.audiodeviceenum-1.0.0] startup
[2.490s] [ext: omni.hydra.usdrt_delegate-4.3.2] startup
[2.518s] [ext: omni.graph.tools-1.17.2] startup
[2.538s] [ext: omni.usd-1.6.30] startup
[2.608s] [ext: omni.kit.collaboration.channel_manager-1.0.9] startup
[2.610s] [ext: omni.kvdb-0.0.0] startup
[2.612s] [ext: omni.kit.usd.layers-2.0.11] startup
[2.625s] [ext: omni.kit.menu.utils-1.4.7] startup
[2.638s] [ext: omni.localcache-0.0.0] startup
[2.640s] [ext: omni.kit.primitive.mesh-1.0.8] startup
[2.645s] [ext: omni.convexdecomposition-104.2.4-5.1] startup
[2.648s] [ext: omni.kit.stage_templates-1.1.13] startup
[2.651s] [ext: omni.kit.usd_undo-0.1.2] startup
[2.652s] [ext: omni.graph.core-2.65.4] startup
[2.657s] [ext: omni.usdphysics-104.2.4-5.1] startup
[2.661s] [ext: omni.graph-1.50.2] startup
[2.738s] [ext: omni.physx-104.2.4-5.1] startup
2023-10-10 11:40:11 [2,751ms] [Warning] [omni.kvdb.plugin] wasn't able to load the meta database, trying to repair it ...
2023-10-10 11:40:11 [2,753ms] [Warning] [omni.kvdb.plugin] repair failed
[2.766s] [ext: omni.kit.numpy.common-0.1.0] startup
[2.769s] [ext: omni.graph.nodes-1.48.3] startup
[2.791s] [ext: omni.isaac.dynamic_control-1.2.3] startup
[2.807s] [ext: omni.isaac.kit-1.4.1] startup
[2.808s] [ext: omni.kit.widget.path_field-2.0.4] startup
[2.809s] [ext: omni.kit.search_core-1.0.2] startup
[2.810s] [ext: omni.kit.widget.browser_bar-2.0.5] startup
[2.812s] [ext: omni.kit.widget.filebrowser-2.3.10] startup
[2.827s] [ext: omni.kit.widget.versioning-1.3.8] startup
[2.830s] [ext: omni.kit.notification_manager-1.0.5] startup
[2.832s] [ext: omni.iray.libs-0.0.0] startup
[2.838s] [ext: omni.kit.window.filepicker-2.7.15] startup
[2.899s] [ext: omni.ui_query-1.1.1] startup
[2.901s] [ext: omni.mdl.neuraylib-0.1.0] startup
Starting kit application with the following args:  ['/isaac-sim/exts/omni.isaac.kit/omni/isaac/kit/simulation_app.py', '/isaac-sim/apps/omni.isaac.sim.python.gym.headless.kit', '--/app/tokens/exe-path=/isaac-sim/kit', '--/persistent/app/viewport/displayOptions=3094', '--/rtx/materialDb/syncLoads=True', '--/rtx/hydra/materialSyncLoads=True--/omni.kit.plugin/syncUsdLoads=True', '--/app/renderer/resolution/width=1280', '--/app/renderer/resolution/height=720', '--/app/window/width=1440', '--/app/window/height=900', '--/renderer/multiGpu/enabled=True', '--/app/fastShutdown=True', '--ext-folder', '/isaac-sim/exts', '--ext-folder', '/isaac-sim/apps', '--/physics/cudaDevice=0', '--portable', '--no-window', '--allow-root']
Passing the following args to the base kit application:  ['task=FrankaGoto', 'headless=True', 'seed=-1']
Warp 0.6.3 initialized:
   CUDA Toolkit: 11.5, Driver: 11.7
   Devices:
     "cpu"    | x86_64
     "cuda:0" | Quadro RTX 8000 (sm_75)
   Kernel cache: /root/.cache/warp/0.6.3
task: 
    name: FrankaGoto
    physics_engine: physx
    debug: False
    env: 
        numEnvs: 4096
        envSpacing: 3.0
        episodeLength: 500
        enableDebugVis: False
        clipObservations: 5.0
        clipActions: 1.0
        controlFrequencyInv: 2
        startPositionNoise: 0.0
        startRotationNoise: 0.0
        numProps: 4
        aggregateMode: 3
        successThreshold: 0.1
        velocityRateThreshold: 1.0
        accelerationRateThreshold: 1.0
        jerkRateThreshold: 1.0
        dofVelocityScale: 0.1
        distRewardScale: 10.0
        zeroTargetVelocityRewardScale: 0.0
        zeroTargetAccelerationRewardScale: 0.0
        collisionPenaltyScale: 0.0
        actionPenaltyScale: 0.1
        velocityPenaltyScale: 0.0
        accelerationPenaltyScale: 0.0
        jerkPenaltyScale: 0.0
        actionSmoothnessPenaltyScale: 0.1
        actionThresholdPenaltyScale: 0.0
        frankaRandomizationScale: 0.25
        neutralJointsPenaltyScale: 0.001
        jointLimitRewardScale: 0.1
        disturbCube: False
        objectObservation: True
        randomizeObject: False
        eval: 
            goal_poses: [[-0.2, -0.45, 0.2], [-0.2, -0.45, 0.45], [-0.2, -0.45, 0.7], [-0.2, 0.45, 0.2], [-0.2, 0.45, 0.45], [-0.2, 0.45, 0.7], [0.2, -0.45, 0.2], [0.2, -0.45, 0.45], [0.2, -0.45, 0.7], [0.2, 0.0, 0.2], [0.2, 0.0, 0.45], [0.2, 0.0, 0.7], [0.2, 0.45, 0.2], [0.2, 0.45, 0.45], [0.2, 0.45, 0.7], [0.6, -0.45, 0.2], [0.6, -0.45, 0.45], [0.6, -0.45, 0.7], [0.6, 0.0, 0.2], [0.6, 0.0, 0.45], [0.6, 0.0, 0.7], [0.6, 0.45, 0.2], [0.6, 0.45, 0.45], [0.6, 0.45, 0.7]]
        controlSpace: 
            rateLimiting: False
            velocityRateLimit: [2.175, 2.175, 2.175, 2.175, 2.61, 2.61, 2.61]
            accelerationRateLimit: [15.0, 7.5, 10.0, 12.5, 15.0, 20.0, 20.0]
            jerkRateLimit: [7500.0, 3750.0, 5000.0, 6250.0, 7500.0, 10000.0, 10000.0]
            velocityRateLimitScale: 1.0
            accelerationRateLimitScale: 1.0
            jerkRateLimitScale: 1.0
            effectiveDofs: 7
            targetFeedback: True
            type: JointVel
            jointPosActionScale: 40.0
            jointVelActionScale: 60
            cartPosActionScale: 45.0
            cartVelActionScale: 150.0
            mirroring: True
            impedanceStiffness: [40.0, 40.0, 40.0, 40.0, 30.0, 20.0, 20.0]
            controlOrientation: True
            oneStepIntegration: False
            orientationRepresentation: 6D
            nullSpace: True
            Kp: [1.5, 2.0, 2.5, 0.4, 0.4, 0.4]
            J_reg: 0.001
            J_w: [50.0, 50.0, 50.0, 10.0, 1.0, 1.0, 1.0]
            restPose: [0, 0, 0, -1.55, 0, 1.9, 0]
            nullGain: [1.2, 1.2, 1.2, 1.2, 0.5, 0.3, 0.1]
    sim: 
        dt: 0.0083
        use_gpu_pipeline: True
        gravity: [0.0, 0.0, -9.81]
        add_ground_plane: True
        use_flatcache: True
        enable_scene_query_support: False
        enable_cameras: False
        default_physics_material: 
            static_friction: 0.01
            dynamic_friction: 0.01
            restitution: 0.0
        physx: 
            worker_thread_count: 4
            solver_type: 1
            use_gpu: True
            solver_position_iteration_count: 12
            solver_velocity_iteration_count: 1
            contact_offset: 0.005
            rest_offset: 0.0
            bounce_threshold_velocity: 0.2
            friction_offset_threshold: 0.04
            friction_correlation_distance: 0.025
            enable_sleeping: True
            enable_stabilization: True
            max_depenetration_velocity: 1000.0
            gpu_max_rigid_contact_count: 524288
            gpu_max_rigid_patch_count: 33554432
            gpu_found_lost_pairs_capacity: 524288
            gpu_found_lost_aggregate_pairs_capacity: 262144
            gpu_total_aggregate_pairs_capacity: 1048576
            gpu_max_soft_body_contacts: 1048576
            gpu_max_particle_contacts: 1048576
            gpu_heap_capacity: 33554432
            gpu_temp_buffer_capacity: 16777216
            gpu_max_num_partitions: 8
        franka: 
            override_usd_defaults: False
            fixed_base: False
            enable_self_collisions: False
            enable_gyroscopic_forces: True
            solver_position_iteration_count: 12
            solver_velocity_iteration_count: 1
            sleep_threshold: 0.005
            stabilization_threshold: 0.001
            density: -1
            max_depenetration_velocity: 1000.0
            contact_offset: 0.005
            rest_offset: 0.0
        woodbox: 
            override_usd_defaults: False
            fixed_base: False
            enable_self_collisions: False
            enable_gyroscopic_forces: True
            solver_position_iteration_count: 12
            solver_velocity_iteration_count: 1
            sleep_threshold: 0.0
            stabilization_threshold: 0.001
            density: -1
            max_depenetration_velocity: 1000.0
            contact_offset: 0.005
            rest_offset: 0.0
        prop: 
            override_usd_defaults: False
            fixed_base: False
            enable_self_collisions: False
            enable_gyroscopic_forces: True
            solver_position_iteration_count: 12
            solver_velocity_iteration_count: 1
            sleep_threshold: 0.005
            stabilization_threshold: 0.001
            density: 100
            max_depenetration_velocity: 1000.0
            contact_offset: 0.005
            rest_offset: 0.0
train: 
    params: 
        seed: 95
        algo: 
            name: a2c_continuous
        model: 
            name: continuous_a2c_logstd
        network: 
            name: actor_critic
            separate: False
            space: 
                continuous: 
                    mu_activation: None
                    sigma_activation: None
                    mu_init: 
                        name: default
                    sigma_init: 
                        name: const_initializer
                        val: -1
                    fixed_sigma: True
            mlp: 
                units: [512, 256, 128, 64]
                activation: elu
                d2rl: False
                initializer: 
                    name: default
                regularizer: 
                    name: None
        load_checkpoint: False
        load_path: 
        config: 
            name: FrankaGoto-JointVel-95
            full_experiment_name: FrankaGoto-JointVel-95
            env_name: rlgpu
            device: cuda
            device_name: cuda
            ppo: True
            mixed_precision: False
            normalize_input: True
            normalize_value: True
            num_actors: 4096
            reward_shaper: 
                scale_value: 1.0
            normalize_advantage: True
            gamma: 0.99
            tau: 0.95
            learning_rate: 0.0005
            lr_schedule: adaptive
            kl_threshold: 0.008
            score_to_win: 100000000
            max_epochs: 750
            save_best_after: 100
            save_frequency: 100
            print_stats: True
[2.904s] [ext: omni.kit.window.file_importer-1.0.10] startup
[2.906s] [ext: omni.kit.ui_test-1.2.9] startup
[2.911s] [ext: omni.mdl-0.1.0] startup
[2.949s] [ext: omni.kit.widget.zoombar-1.0.4] startup
[2.950s] [ext: omni.kit.widget.searchfield-1.0.10] startup
[2.952s] [ext: omni.kit.material.library-1.3.21] startup
[2.960s] [ext: omni.kit.browser.core-2.2.2] startup
[2.966s] [ext: omni.kit.window.file_exporter-1.0.10] startup
[2.968s] [ext: omni.physics.tensors-0.1.0] startup
[2.978s] [ext: omni.kit.browser.folder.core-1.7.3] startup
[2.984s] [ext: omni.kit.window.file-1.3.32] startup
[2.988s] [ext: omni.physx.tensors-0.1.0] startup
[3.003s] [ext: omni.isaac.version-1.0.0] startup
[3.004s] [ext: omni.kit.browser.sample-1.2.5] startup
[3.009s] [ext: omni.isaac.cloner-0.4.1] startup
[3.010s] [ext: omni.isaac.core-1.46.3] startup
[3.313s] [ext: omni.warp-0.6.3] startup
[3.670s] [ext: omni.kit.window.title-1.1.2] startup
2023-10-10 11:40:12 [3,660ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-10 11:40:12 [3,660ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
2023-10-10 11:40:12 [3,662ms] [Warning] [carb] [Plugin: libomni.structuredlog.plugin.so] Module /isaac-sim/kit/libomni.structuredlog.plugin.so remained loaded after unload request
2023-10-10 11:40:12 [3,663ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2023-10-10 11:40:12 [3,663ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.3]) (impl: carb.windowing-glfw.plugin)
[3.676s] [ext: omni.isaac.gym-0.3.3] startup
[3.677s] [ext: omni.isaac.sim.python.gym.headless-2022.2.1] startup
[3.678s] Simulation App Starting
2023-10-10 11:40:12 [4,238ms] [Warning] [carb.audio.device] audio device is misconfigured or broken {deviceIndex = 0, name = 'default'} (The device is likely misconfigured, check your $HOME/.asoundrc)
2023-10-10 11:40:12 [4,238ms] [Warning] [carb.audio.output] failed to retrieve the capabilities for device 0 {result = eDeviceLost (2)}
2023-10-10 11:40:12 [4,238ms] [Warning] [carb.audio.context] failed to set the requested output during context creation.  Using a null streamer instead {result = eDeviceLost (2)}
[4.286s] app ready
2023-10-10 11:40:12 [4,288ms] [Warning] [omni.kit.browser.folder.core.models.folder_browser_model] Do not load cache for Warp because url changed:
2023-10-10 11:40:12 [4,288ms] [Warning] [omni.kit.browser.folder.core.models.folder_browser_model]  - from /isaac-sim/extscache/omni.warp-0.6.1+cp37/data/scenes
2023-10-10 11:40:12 [4,289ms] [Warning] [omni.kit.browser.folder.core.models.folder_browser_model]  -   to /isaac-sim/extscache/omni.warp-0.6.3+cp37/data/scenes
[4.311s] Simulation App Startup Complete
[4.352s] [ext: omni.inspect-1.0.1] startup
[4.364s] [ext: omni.kit.clipboard-1.0.0] startup
[4.370s] [ext: omni.kit.menu.create-1.0.8] startup
[4.383s] [ext: omni.volume-0.1.0] startup
[4.387s] [ext: omni.kit.context_menu-1.5.12] startup
[4.393s] [ext: omni.activity.core-1.0.1] startup
[4.397s] [ext: omni.hydra.rtx-0.1.0] startup
[4.406s] [ext: omni.debugdraw-0.1.1] startup
[4.414s] [ext: omni.kit.widget.stage-2.7.24] startup
[4.421s] [ext: omni.kit.window.property-1.8.2] startup
[4.424s] [ext: omni.kit.viewport.utility-1.0.14] startup
[4.426s] [ext: omni.kit.property.usd-3.18.17] startup
[4.438s] [ext: omni.kit.widget.text_editor-1.0.2] startup
[4.441s] [ext: omni.kit.widget.settings-1.0.1] startup
[4.443s] [ext: omni.kit.widget.graph-1.5.6-104_2] startup
[4.599s] [ext: omni.ui.scene-1.5.18] startup
[4.612s] [ext: omni.kit.window.preferences-1.3.8] startup
[4.686s] [ext: omni.kit.window.extensions-1.1.1] startup
[4.694s] [ext: omni.kit.widget.prompt-1.0.5] startup
[4.695s] [ext: omni.graph.ui-1.24.2] startup
[4.749s] [ext: omni.graph.scriptnode-0.10.0] startup
[4.753s] [ext: omni.graph.action-1.31.1] startup
[4.764s] [ext: omni.graph.bundle.action-1.3.0] startup
[4.765s] [ext: omni.syntheticdata-0.2.4] startup
2023-10-10 11:40:13 [4,769ms] [Warning] [omni.syntheticdata.scripts.extension] SyntheticData extension needs at least a stageFrameHistoryCount of 3
[4.788s] [ext: omni.command.usd-1.0.2] startup
[4.790s] [ext: omni.replicator.core-1.7.8] startup
2023-10-10 11:40:13 [4,806ms] [Warning] [omni.replicator.core.scripts.annotators] Annotator PostProcessDispatch is already registered, overwriting annotator template
[4.897s] [ext: omni.replicator.isaac-1.7.4] startup
2023-10-10 11:40:15 [7,042ms] [Warning] [omni.isaac.core.utils.viewports] could not get active viewport, cannot set camera view
2023-10-10 11:40:17 [8,858ms] [Warning] [omni.client.plugin]  Tick: authentication: Discovery(ws://localhost:3333): Error creating Api/Connection search: Not connected
2023-10-10 11:40:17 [8,861ms] [Warning] [omni.isaac.core.utils.nucleus] /persistent/app/omniverse/mountedDrives setting not found
2023-10-10 11:40:17 [8,861ms] [Warning] [omni.client.plugin]  HTTP Client: provider_http: CC-493: Request through cache failed. Retrying without cache for http://omniverse-content-production.s3-us-west-2.amazonaws.com/.cloudfront.toml
2023-10-10 11:40:17 [8,861ms] [Warning] [omni.client.plugin]  HTTP Client: omniclient: CC-873: Bypassing cache until the application is restarted
2023-10-10 11:40:17 [9,271ms] [Warning] [omni.client.plugin]  HTTP Client: provider_http: CC-493: Request through cache failed. Retrying without cache for http://omniverse-content-production.s3-us-west-2.amazonaws.com/Assets/Isaac/2022.2.1/
            grad_norm: 1.0
            entropy_coef: 0.01
            truncate_grads: True
            e_clip: 0.2
            horizon_length: 64
            minibatch_size: 16384
            mini_epochs: 4
            critic_coef: 4
            clip_value: True
            seq_len: 4
            bounds_loss_coef: 0.0001
task_name: FrankaGoto
experiment: 
num_envs: 
eval_group: 
offline_data_file: 
eval_experiment: 
seed: 95
torch_deterministic: False
max_iterations: 
physics_engine: physx
pipeline: gpu
sim_device: gpu
device_id: 0
rl_device: cuda
debug: False
num_threads: 4
solver_type: 1
test: False
checkpoint: 
headless: True
enable_livestream: False
mt_timeout: 30
wandb_activate: False
wandb_group: 
wandb_name: FrankaGoto-JointVel-95
wandb_entity: 
wandb_project: asimov-gym
name: FrankaGoto-JointVel-95
Setting seed: 95
Sim params does not have attribute:  physx
Sim params does not have attribute:  franka
Sim params does not have attribute:  woodbox
Sim params does not have attribute:  prop
Pipeline:  GPU
Pipeline Device:  cuda:0
Sim Device:  GPU
Task Device: cuda:0
RL device:  cuda
Setting up scene
Actor params does not have attribute:  fixed_base
Scene set up
Setting up init_data
init_data set up
Resetting envs tensor([   0,    1,    2,  ..., 4093, 4094, 4095], device='cuda:0')
bangbang_rate nan
smoothness_index nan
Success rate 0.00 based on 4096 episodes
best success rate 0.00
self.seed = 95
Started to train
Exact experiment name requested from command line: FrankaGoto-JointVel-95
Box(-1.0, 1.0, (7,), float32) Box(-inf, inf, (20,), float32)
current training device: cuda
build mlp: 20
RunningMeanStd:  (1,)
RunningMeanStd:  (20,)
[2023-10-10 11:41:15] Running RL reset
Resetting envs tensor([   0,    1,    2,  ..., 4093, 4094, 4095], device='cuda:0')
bangbang_rate nan
smoothness_index nan
Success rate 0.00 based on 8192 episodes
best success rate 0.00
saving next success rate: 0.0
=> saving checkpoint 'runs/FrankaGoto-JointVel-95/nn/FrankaGoto-JointVel-95-success.pth'
There was an error running python

stderr

/isaac-sim/kit/python/lib/python3.7/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'config': Defaults list is missing `_self_`. See https://hydra.cc/docs/upgrades/1.0_to_1.1/default_composition_order for more information
  warnings.warn(msg, UserWarning)
/isaac-sim/kit/python/lib/python3.7/site-packages/hydra/_internal/defaults_list.py:412: UserWarning: In config: Invalid overriding of hydra/job_logging:
Default list overrides requires 'override' keyword.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/defaults_list_override for more information.

  deprecation_warning(msg)
2023-10-10 11:41:15 [67,306ms] [Error] [omni.physx.tensors.plugin] CUDA error: an illegal memory access was encountered: ../../../source/extensions/omni.physx.tensors/plugins/gpu/GpuArticulationView.cpp: 947
/isaac-sim/kit/exts/omni.graph/omni/graph/core/_impl/autonode/type_definitions.py:12: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  Float = NewType("Float", np.float)
/isaac-sim/kit/kernel/py/omni/kit/app/_impl/__init__.py:104: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
  if not isinstance(setting_py, str) and isinstance(setting_py, collections.Sequence):
/isaac-sim/kit/python/lib/python3.7/site-packages/gym/spaces/box.py:84: UserWarning: WARN: Box bound precision lowered by casting to float32
  logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
/src/asimov_gym/tasks/asimov_task.py:152: RuntimeWarning: invalid value encountered in double_scalars
  self._num_actions * len(self.bangbang_history) * self._num_envs
/isaac-sim/kit/extscore/omni.kit.pip_archive/pip_prebundle/numpy/core/fromnumeric.py:3441: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/isaac-sim/kit/extscore/omni.kit.pip_archive/pip_prebundle/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Error executing job with overrides: ['task=FrankaGoto', 'headless=True', 'seed=-1']
Traceback (most recent call last):
  File "./scripts/rlgames_train.py", line 149, in parse_hydra_configs
    rlg_trainer.run()
  File "./scripts/rlgames_train.py", line 64, in run
    "sigma": None,
  File "/isaac-sim/kit/python/lib/python3.7/site-packages/rl_games/torch_runner.py", line 120, in run
    self.run_train(args)
  File "/isaac-sim/kit/python/lib/python3.7/site-packages/rl_games/torch_runner.py", line 101, in run_train
    agent.train()
  File "/isaac-sim/kit/python/lib/python3.7/site-packages/rl_games/common/a2c_common.py", line 1162, in train
    self.obs = self.env_reset()
  File "/isaac-sim/kit/python/lib/python3.7/site-packages/rl_games/common/a2c_common.py", line 470, in env_reset
    obs = self.vec_env.reset()
  File "/omni-isaac-gym-envs/omniisaacgymenvs/utils/rlgames/rlgames_utils.py", line 102, in reset
    return self.env.reset()
  File "/src/asimov_gym/envs/vec_env_rlgames.py", line 50, in reset
    obs_dict, _, _, _ = self._reset_step()
  File "/src/asimov_gym/envs/vec_env_rlgames.py", line 27, in _reset_step
    ) = self._task.post_physics_step()
  File "/omni-isaac-gym-envs/omniisaacgymenvs/tasks/base/rl_task.py", line 254, in post_physics_step
    self.get_observations()
  File "/src/asimov_gym/tasks/franka_tasks/franka_goto.py", line 134, in get_observations
    self.franka_jac = self._frankas.get_jacobians()
  File "/isaac-sim/exts/omni.isaac.core/omni/isaac/core/articulations/articulation_view.py", line 2088, in get_jacobians
    result = self._backend_utils.clone_tensor(result, device=self._device)
  File "/isaac-sim/exts/omni.isaac.core/omni/isaac/core/utils/torch/tensor.py", line 36, in clone_tensor
    return torch.clone(data)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
/isaac-sim/python.sh: line 41: 639124 Segmentation fault      (core dumped) $python_exe "$@" $args
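The two hints printed at the end of this log, `CUDA_LAUNCH_BLOCKING=1` and `HYDRA_FULL_ERROR=1`, are ordinary environment variables. They have to be in place before the Isaac Sim Python process starts, because CUDA reads `CUDA_LAUNCH_BLOCKING` when the process initializes CUDA. Below is a minimal sketch that assumes the container layout visible in the traceback (`/isaac-sim/python.sh`, `./scripts/rlgames_train.py`) and reuses the same Hydra overrides. The helpers `debug_env` and `build_command` are hypothetical names used only for illustration.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def debug_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with both debug variables set."""
    env = dict(os.environ if base is None else base)
    # Make CUDA kernel launches synchronous so the failing call is the one reported.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    # Ask Hydra to print the full, unfiltered stack trace.
    env["HYDRA_FULL_ERROR"] = "1"
    return env


def build_command(launcher: str, script: str, overrides: List[str]) -> List[str]:
    # Hydra overrides are passed as plain key=value arguments after the script.
    return [launcher, script, *overrides]


if __name__ == "__main__":
    # Paths taken from the traceback above; adjust them to your own container.
    cmd = build_command(
        "/isaac-sim/python.sh",
        "./scripts/rlgames_train.py",
        ["task=FrankaGoto", "headless=True", "seed=-1"],
    )
    if os.path.exists(cmd[0]):
        subprocess.run(cmd, env=debug_env(), check=False)
    else:
        print("launcher not found:", cmd[0])
```

Prefixing the shell command with the two variables has the same effect. The wrapper only helps when several runs are scripted, because each run gets its own copy of the environment and the parent's `os.environ` is left untouched.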

When launched with CUDA_LAUNCH_BLOCKING=1, the stderr output becomes:

/isaac-sim/kit/python/lib/python3.7/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'config': Defaults list is missing `_self_`. See https://hydra.cc/docs/upgrades/1.0_to_1.1/default_composition_order for more information
  warnings.warn(msg, UserWarning)
/isaac-sim/kit/python/lib/python3.7/site-packages/hydra/_internal/defaults_list.py:412: UserWarning: In config: Invalid overriding of hydra/job_logging:
Default list overrides requires 'override' keyword.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/defaults_list_override for more information.

  deprecation_warning(msg)
2023-10-10 11:52:10 [71,928ms] [Error] [omni.physx.tensors.plugin] CUDA error: an illegal memory access was encountered: ../../../source/extensions/omni.physx.tensors/plugins/gpu/GpuArticulationView.cpp: 936
2023-10-10 11:52:10 [71,928ms] [Error] [omni.physx.tensors.plugin] CUDA error: an illegal memory access was encountered: ../../../source/extensions/omni.physx.tensors/plugins/gpu/GpuArticulationView.cpp: 942
/isaac-sim/kit/exts/omni.graph/omni/graph/core/_impl/autonode/type_definitions.py:12: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  Float = NewType("Float", np.float)
/isaac-sim/kit/kernel/py/omni/kit/app/_impl/__init__.py:104: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
  if not isinstance(setting_py, str) and isinstance(setting_py, collections.Sequence):
/isaac-sim/kit/python/lib/python3.7/site-packages/gym/spaces/box.py:84: UserWarning: WARN: Box bound precision lowered by casting to float32
  logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
/src/asimov_gym/tasks/asimov_task.py:152: RuntimeWarning: invalid value encountered in double_scalars
  self._num_actions * len(self.bangbang_history) * self._num_envs
/isaac-sim/kit/extscore/omni.kit.pip_archive/pip_prebundle/numpy/core/fromnumeric.py:3441: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/isaac-sim/kit/extscore/omni.kit.pip_archive/pip_prebundle/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Error executing job with overrides: ['task=FrankaGoto', 'headless=True', 'seed=-1']
Traceback (most recent call last):
  File "./scripts/rlgames_train.py", line 149, in parse_hydra_configs
    rlg_trainer.run()
  File "./scripts/rlgames_train.py", line 64, in run
    "sigma": None,
  File "/isaac-sim/kit/python/lib/python3.7/site-packages/rl_games/torch_runner.py", line 120, in run
    self.run_train(args)
  File "/isaac-sim/kit/python/lib/python3.7/site-packages/rl_games/torch_runner.py", line 101, in run_train
    agent.train()
  File "/isaac-sim/kit/python/lib/python3.7/site-packages/rl_games/common/a2c_common.py", line 1162, in train
    self.obs = self.env_reset()
  File "/isaac-sim/kit/python/lib/python3.7/site-packages/rl_games/common/a2c_common.py", line 470, in env_reset
    obs = self.vec_env.reset()
  File "/omni-isaac-gym-envs/omniisaacgymenvs/utils/rlgames/rlgames_utils.py", line 102, in reset
    return self.env.reset()
  File "/src/asimov_gym/envs/vec_env_rlgames.py", line 50, in reset
    obs_dict, _, _, _ = self._reset_step()
  File "/src/asimov_gym/envs/vec_env_rlgames.py", line 27, in _reset_step
    ) = self._task.post_physics_step()
  File "/omni-isaac-gym-envs/omniisaacgymenvs/tasks/base/rl_task.py", line 254, in post_physics_step
    self.get_observations()
  File "/src/asimov_gym/tasks/franka_tasks/franka_goto.py", line 134, in get_observations
    self.franka_jac = self._frankas.get_jacobians()
  File "/isaac-sim/exts/omni.isaac.core/omni/isaac/core/articulations/articulation_view.py", line 2084, in get_jacobians
    current_values = self._physics_view.get_jacobians()
  File "/isaac-sim/kit/extsPhysics/omni.physics.tensors-104.2.4-5.1/omni/physics/tensors/impl/api.py", line 534, in get_jacobians
    raise Exception("Failed to get Jacobians from backend")
Exception: Failed to get Jacobians from backend

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
/isaac-sim/python.sh: line 41: 651441 Segmentation fault      (core dumped) $python_exe "$@" $args

There have been some fixes to this API in the latest Isaac Sim release, which I think should fix this issue. I suggest you try your example with Isaac Sim 2023 and see whether you still get the same error message.

I am running version 2023.1.0-hotfix.1 and I get the same error. What could be causing this? I am running it with a very small number of environments on a very powerful PC with two graphics cards: an RTX 3090 Ti and an RTX A2000.

I run:
PYTHON_PATH scripts/random_policy.py task=FrankaDeformable num_envs=8

and get this error.

2023-11-17 18:21:26 [16,363ms] [Error] [omni.kit.app._impl] [py stderr]:
2023-11-17 18:21:26 [16,363ms] [Error] [omni.kit.app._impl] [py stderr]:
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
2023-11-17 18:21:26 [16,368ms] [Warning] [carb] [Plugin: omni.spectree.delegate.plugin] Module /home/fvr510/.local/share/ov/pkg/isaac_sim-2023.1.0-hotfix.1/kit/exts/omni.usd_resolver/bin/libomni.spectree.delegate.plugin.so remained loaded after unload request
2023-11-17 18:21:26 [16,370ms] [Warning] [omni.stageupdate.plugin] Deprecated: direct use of IStageUpdate callbacks is deprecated. Use IStageUpdate::getStageUpdate instead.
2023-11-17 18:21:26 [16,370ms] [Error] [omni.physx.fabric.plugin] CUDA error: an illegal memory access was encountered: …/…/…/extensions/runtime/source/omni.physx.fabric/plugins/DirectGpuHelper.cpp: 349
2023-11-17 18:21:26 [16,371ms] [Error] [omni.physx.fabric.plugin] CUDA error: an illegal memory access was encountered: …/…/…/extensions/runtime/source/omni.physx.fabric/plugins/DirectGpuHelper.cpp: 352
2023-11-17 18:21:26 [16,371ms] [Error] [omni.physx.fabric.plugin] CUDA error: an illegal memory access was encountered: …/…/…/extensions/runtime/source/omni.physx.fabric/plugins/DirectGpuHelper.cpp: 355
2023-11-17 18:21:26 [16,371ms] [Error] [omni.physx.fabric.plugin] CUDA error: an illegal memory access was encountered: …/…/…/extensions/runtime/source/omni.physx.fabric/plugins/DirectGpuHelper.cpp: 358
2023-11-17 18:21:26 [16,371ms] [Error] [omni.physx.fabric.plugin] CUDA error: an illegal memory access was encountered: …/…/…/extensions/runtime/source/omni.physx.fabric/plugins/DirectGpuHelper.cpp: 361
2023-11-17 18:21:26 [16,371ms] [Warning] [carb.audio.context] 1 contexts were leaked
2023-11-17 18:21:27 [16,429ms] [Warning] [carb] Recursive unloadAllPlugins() detected!
2023-11-17 18:21:27 [16,437ms] [Warning] [omni.core.ITypeFactory] Module /home/fvr510/.local/share/ov/pkg/isaac_sim-2023.1.0-hotfix.1/kit/exts/omni.activity.core/bin/libomni.activity.core.plugin.so remained loaded after unload request.
There was an error running python
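On a machine with two GPUs, such as the RTX 3090 Ti plus RTX A2000 setup described above, one diagnostic step is to reproduce the error with CUDA restricted to a single device. The standard `CUDA_VISIBLE_DEVICES` variable does this, and `CUDA_DEVICE_ORDER=PCI_BUS_ID` makes the index match the numbering shown by `nvidia-smi`. Treat this as a sketch, not a confirmed fix, since it is untested whether it avoids this particular error. `single_gpu_env` is a hypothetical helper, and `./python.sh` is a hypothetical stand-in for the launcher written as `PYTHON_PATH` in the post.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def single_gpu_env(gpu_index: int, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of `base` (default: os.environ) that exposes only one GPU to CUDA."""
    if gpu_index < 0:
        raise ValueError("gpu_index must be non-negative")
    env = dict(os.environ if base is None else base)
    # Number devices by PCI bus ID so the index agrees with nvidia-smi.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # CUDA enumerates only this device; inside the process it becomes device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


if __name__ == "__main__":
    # Hypothetical launcher path standing in for PYTHON_PATH in the post above.
    launcher = "./python.sh"
    cmd = [launcher, "scripts/random_policy.py", "task=FrankaDeformable", "num_envs=8"]
    if os.path.exists(launcher):
        subprocess.run(cmd, env=single_gpu_env(0), check=False)
    else:
        print("launcher not found:", launcher)
```

Inside the restricted process, the selected GPU is renumbered as device 0. Any config value that names a CUDA device index should therefore refer to 0.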