I am running the IsaacGymEnvs example tasks such as Ant and Anymal on an RTX 3060 with 6 GB of memory, and I keep hitting a CUDA out-of-memory error:
[Error] [carb.gym.plugin] Gym cuda error: out of memory: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 1718
[Error] [carb.gym.plugin] Gym cuda error: invalid resource handle: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 6003
[Error] [carb.gym.plugin] Gym cuda error: out of memory: ../../../source/plugins/carb/gym/impl/Gym/GymPhysXCuda.cu: 991
[Error] [carb.gym.plugin] Gym cuda error: invalid resource handle: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 5859
Error executing job with overrides: ['task=Ant']
Traceback (most recent call last):
  File "train.py", line 112, in launch_rlg_hydra
    'play': cfg.test,
  File "/home/jrenaf/.local/lib/python3.6/site-packages/rl_games/torch_runner.py", line 139, in run
    self.run_train()
  File "/home/jrenaf/.local/lib/python3.6/site-packages/rl_games/torch_runner.py", line 122, in run_train
    agent = self.algo_factory.create(self.algo_name, base_name='run', config=self.config)
  File "/home/jrenaf/.local/lib/python3.6/site-packages/rl_games/common/object_factory.py", line 15, in create
    return builder(**kwargs)
  File "/home/jrenaf/.local/lib/python3.6/site-packages/rl_games/torch_runner.py", line 23, in <lambda>
    self.algo_factory.register_builder('a2c_continuous', lambda **kwargs : a2c_continuous.A2CAgent(**kwargs))
  File "/home/jrenaf/.local/lib/python3.6/site-packages/rl_games/algos_torch/a2c_continuous.py", line 18, in __init__
    a2c_common.ContinuousA2CBase.__init__(self, base_name, config)
  File "/home/jrenaf/.local/lib/python3.6/site-packages/rl_games/common/a2c_common.py", line 973, in __init__
    A2CBase.__init__(self, base_name, config)
  File "/home/jrenaf/.local/lib/python3.6/site-packages/rl_games/common/a2c_common.py", line 84, in __init__
    self.vec_env = vecenv.create_vec_env(self.env_name, self.num_actors, **self.env_config)
  File "/home/jrenaf/.local/lib/python3.6/site-packages/rl_games/common/vecenv.py", line 282, in create_vec_env
    return vecenv_config[vec_env_name](config_name, num_actors, **kwargs)
  File "train.py", line 90, in <lambda>
    lambda config_name, num_actors, **kwargs: RLGPUEnv(config_name, num_actors, **kwargs))
  File "/home/jrenaf/IsaacGymEnvs/isaacgymenvs/utils/rlgames_utils.py", line 159, in __init__
    self.env = env_configurations.configurations[config_name]['env_creator'](**kwargs)
  File "train.py", line 93, in <lambda>
    'env_creator': lambda **kwargs: create_rlgpu_env(**kwargs),
  File "/home/jrenaf/IsaacGymEnvs/isaacgymenvs/utils/rlgames_utils.py", line 91, in create_rlgpu_env
    headless=headless
  File "/home/jrenaf/IsaacGymEnvs/isaacgymenvs/tasks/ant.py", line 97, in __init__
    zero_tensor = torch.tensor([0.0], device=self.device)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
It seems the GPU does not have enough memory for training. Which parameters should I tune to work around this?
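For reference, here is the kind of override I was planning to try. My guess (and it is only a guess) is that the number of parallel environments is the main memory knob, since the Ant config defaults to 4096 environments, and that the rl_games minibatch size has to be lowered together with it because minibatch_size must evenly divide horizon_length * num_envs:

python train.py task=Ant headless=True num_envs=256 train.params.config.minibatch_size=4096

Is that the right approach, or should I be adjusting something else instead, e.g. the PhysX buffer settings under the task's sim config?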