Hey,
I’m trying to run training in Isaac Sim with many instances of the same environment (roughly 2000). The problem is that the exact same code and script sometimes crashes with the error below and sometimes runs without issues.
I’m using an NVIDIA GeForce RTX 4090.
Driver Version: 550.40.07
CUDA Version: 12.4
Here’s my stack trace:
2024-02-29 00:57:32 [19,545ms] [Error] [omni.physx.tensors.plugin] CUDA error: an illegal memory access was encountered: ../../../source/extensions/omni.physx.tensors/plugins/gpu/ThrustUtils.h: 40
Traceback (most recent call last):
File "launch_ppo.py", line 419, in <module>
run(**cfg)
File "launch_ppo.py", line 243, in run
sub_env = envs.create_env(env_name, max_path_length=cfg["max_path_length"], randomize_action_mag=randomize_action_mag, randomize_object_name=randomize_object_name,num_envs=num_envs,continuous_action_space=continuous_action_space, display=display, render_images=False, img_shape=(img_width,img_height),usd_path=usd_path, usd_name=usd_name, num_cameras=num_cameras,randomize_pos=randomize_pos, randomize_rot=randomize_rot, euler_rot=euler_rot, cfg=cfg)
File "/home/arhan/projects/PolicyLearning/huge/envs/__init__.py", line 26, in create_env
return IsaacGoalEnv(max_path_length=max_path_length, display=display,randomize_action_mag=randomize_action_mag, randomize_object_name=randomize_object_name, render_images=render_images,img_shape=img_shape, usd_name=usd_name, usd_path=usd_path, num_envs=num_envs, sensors=sensors,num_cameras=num_cameras, euler_rot=euler_rot, randomize_rot=randomize_rot, randomize_pos=randomize_pos, cfg=cfg)
File "/home/arhan/projects/PolicyLearning/huge/envs/isaac_env.py", line 398, in __init__
env = IsaacIntermediateEnv(continuous_action_space=continuous_action_space, randomize_object_name=randomize_object_name, display=display, render_images=render_images, max_path_length=max_path_length, img_shape=img_shape, usd_path=usd_path,usd_name=usd_name,num_envs=num_envs, randomize_action_mag=randomize_action_mag, sensors=sensors, num_cameras=num_cameras, euler_rot=euler_rot, randomize_pos=randomize_pos, randomize_rot=randomize_rot, cfg=cfg)
File "/home/arhan/projects/PolicyLearning/huge/envs/isaac_env.py", line 65, in __init__
self._env = gym.make(task, cfg=env_cfg, headless=not display)
File "/home/arhan/miniconda3/envs/isaac-sim/lib/python3.7/site-packages/gym/envs/registration.py", line 640, in make
env = env_creator(**_kwargs)
File "/home/arhan/projects/PolicyLearning/huge/envs/general/general_env.py", line 113, in __init__
self.sim.step()
File "/home/arhan/.local/share/ov/pkg/isaac_sim-2022.2.1/exts/omni.isaac.core/omni/isaac/core/simulation_context/simulation_context.py", line 468, in step
self._physics_sim_view.flush()
File "/home/arhan/.local/share/ov/pkg/isaac_sim-2022.2.1/kit/extsPhysics/omni.physics.tensors-104.2.4-5.1/omni/physics/tensors/impl/api.py", line 99, in flush
return self._backend.flush()
RuntimeError: copy_if failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
Exception ignored in: <function _make_registry.<locals>._Registry.__del__ at 0x7f9a0ee279e0>
Traceback (most recent call last):
File "/home/arhan/.local/share/ov/pkg/isaac_sim-2022.2.1/kit/extscore/omni.kit.viewport.registry/omni/kit/viewport/registry/registry.py", line 103, in __del__
File "/home/arhan/.local/share/ov/pkg/isaac_sim-2022.2.1/kit/extscore/omni.kit.viewport.registry/omni/kit/viewport/registry/registry.py", line 98, in destroy
TypeError: 'NoneType' object is not callable
Exception ignored in: <function _make_registry.<locals>._Registry.__del__ at 0x7f9a0ee279e0>
Traceback (most recent call last):
File "/home/arhan/.local/share/ov/pkg/isaac_sim-2022.2.1/kit/extscore/omni.kit.viewport.registry/omni/kit/viewport/registry/registry.py", line 103, in __del__
File "/home/arhan/.local/share/ov/pkg/isaac_sim-2022.2.1/kit/extscore/omni.kit.viewport.registry/omni/kit/viewport/registry/registry.py", line 98, in destroy
TypeError: 'NoneType' object is not callable
Exception ignored in: <function SettingChangeSubscription.__del__ at 0x7f9d6919c320>
Traceback (most recent call last):
File "/home/arhan/projects/orbit/_isaac_sim/kit/kernel/py/omni/kit/app/_impl/__init__.py", line 114, in __del__
AttributeError: 'NoneType' object has no attribute 'get_settings'
Exception ignored in: <function RegisteredActions.__del__ at 0x7f99dcdeef80>
Traceback (most recent call last):
File "/home/arhan/.local/share/ov/pkg/isaac_sim-2022.2.1/extscache/omni.kit.viewport.menubar.lighting-104.0.9/omni/kit/viewport/menubar/lighting/actions.py", line 347, in __del__
File "/home/arhan/.local/share/ov/pkg/isaac_sim-2022.2.1/extscache/omni.kit.viewport.menubar.lighting-104.0.9/omni/kit/viewport/menubar/lighting/actions.py", line 352, in destroy
TypeError: 'NoneType' object is not callable
Segmentation fault (core dumped)
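Since the failure is intermittent, one thing I could try is rerunning the launch several times and counting how often it fails. Below is a minimal sketch of that, not something I have validated against this crash. The helper name `count_failures` is hypothetical, and it assumes the script can be started as a plain subprocess (in practice the Isaac Sim Python launcher may be needed instead). Setting `CUDA_LAUNCH_BLOCKING=1` is a standard CUDA environment variable that makes kernel launches synchronous, which may cause the error to be reported closer to the call that triggered it.

```python
import os
import subprocess
import sys


def count_failures(cmd, runs=5, extra_env=None):
    """Run `cmd` `runs` times and return the non-zero exit codes seen.

    `extra_env` is merged on top of the current environment, so debugging
    variables such as CUDA_LAUNCH_BLOCKING can be set per experiment.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    failures = []
    for _ in range(runs):
        result = subprocess.run(cmd, env=env)
        if result.returncode != 0:
            failures.append(result.returncode)
    return failures


# Example call (assumes launch_ppo.py is in the current directory):
# failures = count_failures(
#     [sys.executable, "launch_ppo.py"],
#     runs=5,
#     extra_env={"CUDA_LAUNCH_BLOCKING": "1"},
# )
# print(f"{len(failures)} of 5 runs failed: {failures}")
```

Comparing the failure count with and without the extra environment variable, or with a smaller number of environments, might help narrow down whether the crash depends on timing or on scale.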
I’d appreciate any help figuring out what might cause this issue, as well as any tips on debugging it.
Thanks,
Arhan