Maximize the GPU resources when using repo OmniIsaacGymEnvs

hitzcy2016 · September 18, 2023, 7:20pm

Hi,

I’m experimenting with the OmniIsaacGymEnvs repo and trying to increase the number of parallel environment instances in a single simulation. My GPU is an RTX A5000 with 24 GB of memory.

For built-in tasks such as FrankaCabinet, I set minibatch_size to twice the number of instances, and the largest number of instances I can run is 8096. Here is the GPU usage at that setting: [screenshot: GPU usage]
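For reference, here is how that minibatch choice plays out. In rl_games the PPO rollout batch is horizon_length × num_envs (for a single-agent task) and is split into batch_size // minibatch_size minibatches, so minibatch_size should divide it evenly. This is a minimal sketch, assuming horizon_length = 16 (read the real value from the task's train config); the helper name ppo_batch_layout is hypothetical.

```python
def ppo_batch_layout(num_envs: int, horizon_length: int, minibatch_factor: int = 2):
    """Hypothetical helper: derive minibatch_size from num_envs and report the split.

    Assumes batch_size = horizon_length * num_envs (single-agent task) and the
    rule from the post: minibatch_size = minibatch_factor * num_envs.
    """
    batch_size = horizon_length * num_envs
    minibatch_size = minibatch_factor * num_envs
    # The batch must split into a whole number of minibatches.
    if batch_size % minibatch_size != 0:
        raise ValueError(
            f"minibatch_size {minibatch_size} does not divide batch_size {batch_size}"
        )
    return batch_size, minibatch_size, batch_size // minibatch_size


if __name__ == "__main__":
    # horizon_length=16 is an assumption, not taken from the thread.
    for n in (8096, 12294):
        print(n, ppo_batch_layout(n, horizon_length=16))
```

With minibatch_size = 2 × num_envs and an even horizon_length, the number of minibatches is always horizon_length / 2, so only the size of each minibatch grows with the number of environments.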

When I increased the number to 1.5 * 8196 = 12294, I got the following error:

2023-09-18 19:04:35 [109,109ms] [Error] [omni.physx.plugin] PhysX error: GPU integrateCoreParallel fail to launch kernel!!
, FILE /buildAgent/work/16dcef52b68a730f/source/gpusolver/src/PxgTGSCudaSolverCore.cpp, LINE 2393
2023-09-18 19:04:35 [109,109ms] [Error] [omni.physx.plugin] Cuda context manager error, simulation will be stopped and new cuda context manager will be created.
2023-09-18 19:04:35 [109,109ms] [Error] [omni.physx.plugin] PhysX error: SynchronizeStreams cuEventRecord failed with error 700
, FILE /buildAgent/work/16dcef52b68a730f/source/gpucommon/include/PxgCudaUtils.h, LINE 75
2023-09-18 19:04:35 [109,109ms] [Error] [omni.physx.plugin] Cuda context manager error, simulation will be stopped and new cuda context manager will be created.
2023-09-18 19:04:35 [109,109ms] [Error] [omni.physx.plugin] PhysX error: SynchronizeStreams cuStreamWaitEvent failed with error 700
, FILE /buildAgent/work/16dcef52b68a730f/source/gpucommon/include/PxgCudaUtils.h, LINE 81
2023-09-18 19:04:35 [109,109ms] [Error] [omni.physx.plugin] Cuda context manager error, simulation will be stopped and new cuda context manager will be created.
2023-09-18 19:04:35 [109,109ms] [Error] [omni.physx.plugin] PhysX error: GPU kernel 'markAggregateBoundsUpdated' failed to launch!!
, FILE /buildAgent/work/16dcef52b68a730f/source/gpubroadphase/src/PxgAABBManager.cpp, LINE 1206
2023-09-18 19:04:35 [109,109ms] [Error] [omni.physx.plugin] Cuda context manager error, simulation will be stopped and new cuda context manager will be created.
2023-09-18 19:04:35 [109,109ms] [Error] [omni.physx.plugin] PhysX error: SynchronizeStreams cuEventRecord failed with error 700
, FILE /buildAgent/work/16dcef52b68a730f/source/gpucommon/include/PxgCudaUtils.h, LINE 75
2023-09-18 19:04:35 [109,109ms] [Error] [omni.physx.plugin] Cuda context manager error, simulation will be stopped and new cuda context manager will be created.
2023-09-18 19:04:35 [109,109ms] [Error] [omni.physx.plugin] PhysX error: SynchronizeStreams cuStreamWaitEvent failed with error 700
, FILE /buildAgent/work/16dcef52b68a730f/source/gpucommon/include/PxgCudaUtils.h, LINE 81
2023-09-18 19:04:35 [109,109ms] [Error] [omni.physx.plugin] Cuda context manager error, simulation will be stopped and new cuda context manager will be created.
2023-09-18 19:04:35 [109,310ms] [Error] [omni.physx.plugin] PhysX error: PhysX Internal CUDA error. Simulation can not continue!, FILE /buildAgent/work/16dcef52b68a730f/source/physx/src/NpSceneFetchResults.cpp, LINE 216
2023-09-18 19:04:35 [109,310ms] [Error] [omni.physx.plugin] Cuda context manager error, simulation will be stopped and new cuda context manager will be created.
2023-09-18 19:04:36 [109,611ms] [Error] [omni.physx.tensors.plugin] CUDA error: an illegal memory access was encountered: ../../../source/extensions/omni.physx.tensors/plugins/gpu/GpuArticulationView.cpp: 71
Error executing job with overrides: ['task=FrankaCabinet']
Traceback (most recent call last):
  File "scripts/rlgames_train.py", line 114, in parse_hydra_configs
    task = initialize_task(cfg_dict, env)
  File "/workspace/omniisaacgymenvs/omniisaacgymenvs/utils/task_util.py", line 77, in initialize_task
    env.set_task(task=task, sim_params=sim_config.get_physics_params(), backend="torch", init_sim=init_sim)
  File "/workspace/omniisaacgymenvs/omniisaacgymenvs/envs/vec_env_rlgames.py", line 51, in set_task
    super().set_task(task, backend, sim_params, init_sim)
  File "/isaac-sim/exts/omni.isaac.gym/omni/isaac/gym/vec_env/vec_env_base.py", line 94, in set_task
    self._world.reset()
  File "/isaac-sim/exts/omni.isaac.core/omni/isaac/core/world/world.py", line 282, in reset
    self._scene._finalize(self.physics_sim_view)
  File "/isaac-sim/exts/omni.isaac.core/omni/isaac/core/scenes/scene.py", line 290, in _finalize
    articulated_view.initialize(physics_sim_view)
  File "/workspace/omniisaacgymenvs/omniisaacgymenvs/robots/articulations/views/franka_view.py", line 28, in initialize
    super().initialize(physics_sim_view)
  File "/isaac-sim/exts/omni.isaac.core/omni/isaac/core/articulations/articulation_view.py", line 218, in initialize
    self._default_kps, self._default_kds = self.get_gains(clone=True)
  File "/isaac-sim/exts/omni.isaac.core/omni/isaac/core/articulations/articulation_view.py", line 1673, in get_gains
    kds[self._backend_utils.expand_dims(indices, 1), joint_indices], device=self._device
  File "/isaac-sim/exts/omni.isaac.core/omni/isaac/core/utils/torch/tensor.py", line 58, in move_data
    return data.to(device=device)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
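The last line of the traceback suggests CUDA_LAUNCH_BLOCKING=1. Because CUDA reports kernel errors asynchronously, this variable makes launches synchronous so the failing call appears in the stack trace; it helps locate the error but does not fix it, and it slows the run. A minimal sketch, assuming you relaunch the training script in a child process: the script path, python.sh and the task override come from the log above, while the num_envs override and the helper name debug_env are assumptions for illustration.

```python
import os
import shlex


def debug_env(base=None) -> dict:
    """Return a copy of the environment with synchronous CUDA kernel launches."""
    env = dict(os.environ if base is None else base)
    # Must be in the environment before CUDA is initialized in the child process.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Paths taken from the log above; adjust them to your own install.
# The num_envs override is an assumption about the repo's Hydra config.
CMD = [
    "/isaac-sim/python.sh",
    "scripts/rlgames_train.py",
    "task=FrankaCabinet",
    "num_envs=12294",
]

if __name__ == "__main__":
    # Print the equivalent shell command instead of launching the job here.
    print("CUDA_LAUNCH_BLOCKING=1 " + shlex.join(CMD))
    # To run it: subprocess.run(CMD, env=debug_env(), check=False)
```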

After switching to my own robot arm model, the maximum number of instances drops to 2048, with the following GPU usage: [screenshot: GPU usage]

If I raise the number to 3072, the error below appears, even though the GPU usage suggests there is clearly still headroom. On the model side, I have already tried simplifying the meshes, for example by reducing the number of triangles and vertices, but this gave only a marginal improvement.

2023-09-18 19:12:59 [20,725ms] [Error] [omni.physx.plugin] PhysX error: SynchronizeStreams cuEventRecord failed with error 700
, FILE /buildAgent/work/16dcef52b68a730f/source/gpucommon/include/PxgCudaUtils.h, LINE 53
2023-09-18 19:12:59 [20,726ms] [Error] [omni.physx.plugin] Cuda context manager error, simulation will be stopped and new cuda context manager will be created.
2023-09-18 19:12:59 [20,726ms] [Error] [omni.physx.plugin] PhysX error: SynchronizeStreams cuStreamWaitEvent failed with error 700
, FILE /buildAgent/work/16dcef52b68a730f/source/gpucommon/include/PxgCudaUtils.h, LINE 59
2023-09-18 19:12:59 [20,726ms] [Error] [omni.physx.plugin] Cuda context manager error, simulation will be stopped and new cuda context manager will be created.
2023-09-18 19:12:59 [20,726ms] [Error] [omni.physx.plugin] PhysX error: memcpy failed fail!
  700, FILE /buildAgent/work/16dcef52b68a730f/source/gpunarrowphase/src/PxgNarrowphaseCore.cpp, LINE 2077
2023-09-18 19:12:59 [20,726ms] [Error] [omni.physx.plugin] Cuda context manager error, simulation will be stopped and new cuda context manager will be created.
2023-09-18 19:12:59 [20,743ms] [Warning] [omni.physx.plugin] PhysX warning: Failed to allocate pinned memory., FILE /buildAgent/work/16dcef52b68a730f/source/gpucommon/src/PxgCudaMemoryAllocator.cpp, LINE 58
/isaac-sim/python.sh: line 41:  1300 Segmentation fault      (core dumped) $python_exe "$@" $args
There was an error running python

My goal is to simulate as many instances of my own robot as possible, but there seems to be a large gap between my model and the FrankaCabinet task.

Is there any parameter I can change to get past this limit? Any suggestions would be much appreciated!

Best,
Chay

kellyg · September 21, 2023, 11:00pm

Hi there, please try increasing the GPU buffer sizes in the task config file, which can be found here for the FrankaCabinet task. Generally, the found/lost pairs and aggregate pairs buffers are the ones most likely to need increasing.
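One way to apply this is to scale the GPU buffer capacities along with the number of environments. The sketch below assumes the task YAML exposes these buffers under sim.physx with keys such as gpu_found_lost_pairs_capacity, gpu_found_lost_aggregate_pairs_capacity and gpu_total_aggregate_pairs_capacity (check your task file for the exact names). The starting values, the helper names and the "scale linearly, add a margin, round up to a power of two" heuristic are all assumptions for illustration, not official guidance.

```python
import math

# Placeholder starting values, NOT the repo defaults: copy the real numbers
# from the sim.physx section of your task YAML.
BASE_PHYSX = {
    "gpu_found_lost_pairs_capacity": 1 << 20,
    "gpu_found_lost_aggregate_pairs_capacity": 1 << 20,
    "gpu_total_aggregate_pairs_capacity": 1 << 20,
}


def next_pow2(x: int) -> int:
    """Smallest power of two >= x (for x >= 1)."""
    return 1 << (x - 1).bit_length()


def scale_buffers(physx: dict, old_envs: int, new_envs: int, margin: float = 1.5) -> dict:
    """Hypothetical heuristic: grow each gpu_*_capacity with the env count plus a margin."""
    factor = new_envs / old_envs * margin
    out = dict(physx)
    for key, value in physx.items():
        if key.startswith("gpu_") and key.endswith("_capacity"):
            # Never shrink a buffer below its current value.
            out[key] = max(value, next_pow2(math.ceil(value * factor)))
    return out


if __name__ == "__main__":
    new = scale_buffers(BASE_PHYSX, old_envs=8096, new_envs=12294)
    for key, value in new.items():
        print(f"{key}: {value}")
```

For the custom robot, the same call would use old_envs=2048 and new_envs=3072. Larger buffers reserve more GPU memory, so the margin is a trade-off rather than a free safety factor.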
