Isaac Sim [omni.physx.tensors.plugin] CUDA error: illegal memory access

Hey,
I’m trying to train something using Isaac Sim with many instances of the same environment (roughly 2000). I’m facing an issue where the exact same code and script will sometimes error and sometimes execute without issues.

I’m using an NVIDIA GeForce RTX 4090
Driver Version: 550.40.07
CUDA Version: 12.4

Here’s my stack trace:

2024-02-29 00:57:32 [19,545ms] [Error] [omni.physx.tensors.plugin] CUDA error: an illegal memory access was encountered: ../../../source/extensions/omni.physx.tensors/plugins/gpu/ThrustUtils.h: 40
Traceback (most recent call last):
  File "launch_ppo.py", line 419, in <module>
    run(**cfg)
  File "launch_ppo.py", line 243, in run
    sub_env = envs.create_env(env_name, max_path_length=cfg["max_path_length"], randomize_action_mag=randomize_action_mag, randomize_object_name=randomize_object_name,num_envs=num_envs,continuous_action_space=continuous_action_space, display=display, render_images=False, img_shape=(img_width,img_height),usd_path=usd_path, usd_name=usd_name, num_cameras=num_cameras,randomize_pos=randomize_pos, randomize_rot=randomize_rot, euler_rot=euler_rot, cfg=cfg)
  File "/home/arhan/projects/PolicyLearning/huge/envs/__init__.py", line 26, in create_env
    return IsaacGoalEnv(max_path_length=max_path_length, display=display,randomize_action_mag=randomize_action_mag, randomize_object_name=randomize_object_name, render_images=render_images,img_shape=img_shape, usd_name=usd_name, usd_path=usd_path, num_envs=num_envs, sensors=sensors,num_cameras=num_cameras, euler_rot=euler_rot, randomize_rot=randomize_rot, randomize_pos=randomize_pos, cfg=cfg)
  File "/home/arhan/projects/PolicyLearning/huge/envs/isaac_env.py", line 398, in __init__
    env = IsaacIntermediateEnv(continuous_action_space=continuous_action_space, randomize_object_name=randomize_object_name, display=display, render_images=render_images, max_path_length=max_path_length, img_shape=img_shape, usd_path=usd_path,usd_name=usd_name,num_envs=num_envs, randomize_action_mag=randomize_action_mag, sensors=sensors, num_cameras=num_cameras, euler_rot=euler_rot, randomize_pos=randomize_pos, randomize_rot=randomize_rot, cfg=cfg)
  File "/home/arhan/projects/PolicyLearning/huge/envs/isaac_env.py", line 65, in __init__
    self._env = gym.make(task, cfg=env_cfg, headless=not display)
  File "/home/arhan/miniconda3/envs/isaac-sim/lib/python3.7/site-packages/gym/envs/registration.py", line 640, in make
    env = env_creator(**_kwargs)
  File "/home/arhan/projects/PolicyLearning/huge/envs/general/general_env.py", line 113, in __init__
    self.sim.step()
  File "/home/arhan/.local/share/ov/pkg/isaac_sim-2022.2.1/exts/omni.isaac.core/omni/isaac/core/simulation_context/simulation_context.py", line 468, in step
    self._physics_sim_view.flush()
  File "/home/arhan/.local/share/ov/pkg/isaac_sim-2022.2.1/kit/extsPhysics/omni.physics.tensors-104.2.4-5.1/omni/physics/tensors/impl/api.py", line 99, in flush
    return self._backend.flush()
RuntimeError: copy_if failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
Exception ignored in: <function _make_registry.<locals>._Registry.__del__ at 0x7f9a0ee279e0>
Traceback (most recent call last):
  File "/home/arhan/.local/share/ov/pkg/isaac_sim-2022.2.1/kit/extscore/omni.kit.viewport.registry/omni/kit/viewport/registry/registry.py", line 103, in __del__
  File "/home/arhan/.local/share/ov/pkg/isaac_sim-2022.2.1/kit/extscore/omni.kit.viewport.registry/omni/kit/viewport/registry/registry.py", line 98, in destroy
TypeError: 'NoneType' object is not callable
Exception ignored in: <function _make_registry.<locals>._Registry.__del__ at 0x7f9a0ee279e0>
Traceback (most recent call last):
  File "/home/arhan/.local/share/ov/pkg/isaac_sim-2022.2.1/kit/extscore/omni.kit.viewport.registry/omni/kit/viewport/registry/registry.py", line 103, in __del__
  File "/home/arhan/.local/share/ov/pkg/isaac_sim-2022.2.1/kit/extscore/omni.kit.viewport.registry/omni/kit/viewport/registry/registry.py", line 98, in destroy
TypeError: 'NoneType' object is not callable
Exception ignored in: <function SettingChangeSubscription.__del__ at 0x7f9d6919c320>
Traceback (most recent call last):
  File "/home/arhan/projects/orbit/_isaac_sim/kit/kernel/py/omni/kit/app/_impl/__init__.py", line 114, in __del__
AttributeError: 'NoneType' object has no attribute 'get_settings'
Exception ignored in: <function RegisteredActions.__del__ at 0x7f99dcdeef80>
Traceback (most recent call last):
  File "/home/arhan/.local/share/ov/pkg/isaac_sim-2022.2.1/extscache/omni.kit.viewport.menubar.lighting-104.0.9/omni/kit/viewport/menubar/lighting/actions.py", line 347, in __del__
  File "/home/arhan/.local/share/ov/pkg/isaac_sim-2022.2.1/extscache/omni.kit.viewport.menubar.lighting-104.0.9/omni/kit/viewport/menubar/lighting/actions.py", line 352, in destroy
TypeError: 'NoneType' object is not callable
Segmentation fault (core dumped)

I’d appreciate any help with what may be causing this issue, and any tips on debugging it.

Thanks,
Arhan
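
Since the failure is intermittent, one way to make it easier to reason about is to measure how often it happens. Below is a minimal sketch, not anything from the original script: `run_repeatedly` is a hypothetical helper that launches a command several times and tallies the exit codes. It also shows how `CUDA_LAUNCH_BLOCKING=1` (a standard CUDA environment variable that makes kernel launches synchronous) could be passed in; that may make the error surface closer to the kernel that actually faulted, though whether it changes anything for this particular crash is an assumption.

```python
import os
import subprocess
import sys
from collections import Counter


def run_repeatedly(cmd, runs=10, extra_env=None):
    """Run `cmd` `runs` times and return a Counter of its exit codes.

    `cmd` is an argument list, e.g. [sys.executable, "launch_ppo.py"].
    `extra_env` is an optional dict of environment variables added on top
    of the current environment for every run.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    codes = Counter()
    for _ in range(runs):
        result = subprocess.run(cmd, env=env)
        # On Linux a negative code -N means the process was killed by
        # signal N, so a segmentation fault shows up as -11.
        codes[result.returncode] += 1
    return codes
```

For example, `run_repeatedly([sys.executable, "launch_ppo.py"], runs=20, extra_env={"CUDA_LAUNCH_BLOCKING": "1"})` would report how many of 20 launches exited cleanly (code 0) versus crashed. Repeating this with a smaller `num_envs` could show whether the failure rate depends on the number of environments.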

@arhanj I am just a passerby, but does this behavior persist in the latest version of Isaac Sim (2023.1.1), which the mods/devs often recommend over older versions? It could be a bug that’s been addressed in a newer release.

Along those lines, reverting to the suggested driver listed in the docs may offer a more stable experience than newer drivers, given that it’s gone through internal testing:

https://docs.omniverse.nvidia.com/isaacsim/latest/common/technical-requirements.html

I tried downgrading to the recommended drivers and still face this issue. The same code is able to work on Isaac Sim 2022.2.1 on other workstations/environments, so I want to single out what about my environment may be causing this issue :(