We use Gym to simulate many environments at the same time, multiple times in a row, using the Python API. We see memory usage increase on both the GPU and the CPU.
Steps to reproduce:
1 Create a gym object
2 Create a sim
3 Create multiple environments with some actors (100x1 for us), loaded via URDF.
4 Create a viewer
5 Run the simulation (30s for us, dt 0.02, 2 steps)
6 Destroy the viewer
7 Destroy the sim
7.5 Possibly destroy the environments, but this is not covered in the documentation tutorials(?)
8 Go back to step 2, reusing the Gym object
Are we missing things we need to destroy manually? The memory usage increase on the GPU is not minor: over 100 MB per loop in our situation.
Unfortunately Isaac Gym does not do a very good job of memory management when the viewer and sim are destroyed. It’s really designed for bringing up an environment with many agents and executing them all in parallel. This is not something that we will be putting more effort into, since we are mostly focused on bringing our gym capabilities into Omniverse and Isaac Sim.
If you want to do something like training on a large number of different objects, I would recommend creating a set of such objects at the beginning, placing them somewhere far away from the robot you are training, and transforming whatever one you want close to the training agent at reset time.
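The bookkeeping for this suggestion could look roughly like the sketch below. It is only an illustration in plain Python and does not call Isaac Gym; `ObjectPool`, `PARKING_OFFSET`, and the positions are hypothetical names and values. In a real setup the returned positions would still have to be written into the actors' root states at reset time.

```python
from dataclasses import dataclass

PARKING_OFFSET = 1000.0  # assumed distance that counts as "far away"


@dataclass
class ObjectPool:
    """Tracks which preloaded object is active and where each one should sit."""

    num_objects: int
    active_position: tuple = (0.0, 0.0, 0.5)  # assumed spot near the agent
    active: int = -1  # -1 means no object has been selected yet

    def parked_position(self, index):
        # Spread parked objects along x so they do not overlap each other.
        return (PARKING_OFFSET + 10.0 * index, 0.0, 0.5)

    def reset(self, index):
        """Select object `index` and return the target position of every object."""
        if not 0 <= index < self.num_objects:
            raise IndexError(f"object index {index} out of range")
        self.active = index
        return [
            self.active_position if i == index else self.parked_position(i)
            for i in range(self.num_objects)
        ]


if __name__ == "__main__":
    pool = ObjectPool(num_objects=3)
    # Object 1 moves next to the agent; objects 0 and 2 stay parked.
    for i, position in enumerate(pool.reset(1)):
        print(i, position)
```

All objects are loaded once when the sim is created, so nothing is destroyed or recreated between resets; only the positions change.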
Thank you for your reply. That is unfortunate to hear. Your suggestion is not possible for my use case, as the physical shape of the objects changes significantly between my simulations. The search space for the shape is very large so I cannot load all possible shapes beforehand.
I don’t mind destroying the complete Isaac Gym setup and restarting, but currently I need to run Isaac Gym in a separate process so that the memory leak does not accumulate in my Python program. Is there a way to circumvent this?
Unfortunately we probably don’t have a solution for this right now. Memory-wise, the viewer is likely to take up the most space, so you might try just destroying and recreating the sim, but there are definitely no guarantees. Your other option is going to be running separate processes as it sounds like you’re already doing now.
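For reference, the separate-process approach can be sketched as follows. This is a minimal illustration using only the Python standard library, not an official recipe: `WORKER_SOURCE`, `run_isolated`, and the `shape_id`/`score` fields are hypothetical, and the worker body is a placeholder where the real Isaac Gym setup, simulation, and teardown would go. Because each run happens in a fresh interpreter, the operating system reclaims that process's memory when it exits.

```python
import json
import subprocess
import sys

# Hypothetical worker script. A real worker would import isaacgym here,
# create the sim and environments, run the simulation, and report results.
# This placeholder only echoes its input so the sketch runs anywhere.
WORKER_SOURCE = """
import json, sys
params = json.load(sys.stdin)
# ... create gym, sim, envs; simulate; destroy viewer and sim ...
result = {"shape_id": params["shape_id"], "score": 0.0}  # placeholder result
json.dump(result, sys.stdout)
"""


def run_isolated(params, timeout=None):
    """Run one simulation in a fresh interpreter and return its JSON result."""
    proc = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=json.dumps(params),  # parameters go in via stdin
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the worker crashes
    )
    return json.loads(proc.stdout)


if __name__ == "__main__":
    for shape_id in range(3):
        print(run_isolated({"shape_id": shape_id}))
```

Passing parameters and results as JSON over stdin/stdout keeps the parent process free of any simulator state, at the cost of paying the full startup time on every run.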
Thank you. The separate-process solution is workable but not optimal. I hope that after Isaac Gym is integrated into Omniverse there will be time to resolve this. For now I will use my workaround.
It seems that Isaac Sim has the same issue.
loop:
1. create a stage
2. load all objects and robots
3. set up multiple cameras and the synthetic data helper
4. record data
5. close stage
Is there any help I can get? The memory leak problem has been causing a lot of headaches for me. Or is there anything I should do to avoid memory leaks?
Hi, I am having the same memory leak issues. My goal is to test different terrains. I launch my environment with a specific terrain. Once I am done, I destroy the sim and the viewer, delete all the tensors stored on the GPU and set them to None, and call torch.cuda.empty_cache(). However, after a few iterations of this I either get a segmentation fault or an error like this: [Error] [carb.gym.plugin] Gym cuda error: out of memory: …/…/…/source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 1721
@aart.stuurman @gstate Have either of you found a solution? Is there a way to update the terrain while the simulation is running so I don’t have to launch multiple instances? Have there been any updates on how to completely destroy Isaac Gym?