Hi,
I am using ASE/AMP for a grasping task. My problem is that my GPU (11 GB) runs out of VRAM very quickly: usage grows from about 4 GB at the start to over 11 GB.
- I was wondering why the memory increases over time. My guess is that with longer training the model learns to grasp, which causes more contacts/collisions and therefore more GPU memory usage?
- In general, how can I find out why the GPU is running out of memory?
- For instance, I have been logging the total GPU memory and torch.cuda.memory_allocated. I observed that torch.cuda.memory_allocated stays constant, so it must be Isaac Gym itself that is using more memory? Can I somehow check what causes this increase? My logging setup is sketched below this list.
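In case it helps, this is roughly how I log the two values (a minimal sketch; reading the total device usage through pynvml is just my own way of getting the same number nvidia-smi reports, it is not part of ASE/AMP):

```python
import torch
import pynvml  # NVML bindings, used here to read total device memory usage

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def log_gpu_memory(step):
    # Memory currently held by tensors that PyTorch allocated
    torch_alloc = torch.cuda.memory_allocated() / 1024**2
    # Memory reserved by PyTorch's caching allocator (>= memory_allocated)
    torch_reserved = torch.cuda.memory_reserved() / 1024**2
    # Total memory in use on the device, which also includes Isaac Gym / PhysX buffers
    device_used = pynvml.nvmlDeviceGetMemoryInfo(handle).used / 1024**2
    print(f"step {step}: torch_allocated={torch_alloc:.0f} MB, "
          f"torch_reserved={torch_reserved:.0f} MB, device_used={device_used:.0f} MB")
```

The torch numbers stay flat while device_used keeps climbing, which is why I suspect the growth comes from the simulator rather than from my PyTorch tensors.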
Additionally, I struggle to fully understand the following contact parameters, as the documentation is very short (a sketch of where I set them follows this list).
- contact_offset - shapes whose distance is less than the sum of their contactOffset values will generate contacts:
- What does it mean “to generate contacts”?
- Does this also mean the shapes collide at this distance?
- Does lowering this value lead to less GPU memory usage? (Although it is apparently not advised to set it too low, see factory.md in NVIDIA-Omniverse/IsaacGymEnvs on GitHub.)
- max_gpu_contact_pairs - Maximum number of contact pairs
- How much memory does one contact pair use? How can I compute the GPU memory used for contact pairs based on this parameter?
- Is the entire memory for contact pairs preallocated? I observe that when I increase this parameter, the GPU uses more memory right from the start of the simulation, but memory still increases during training?
- default_buffer_size_multiplier - will scale additional buffers used by PhysX.
- What kind of buffers?
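For reference, this is roughly where these parameters are set (a minimal sketch via gymapi.SimParams() with placeholder values, not my actual training config; in IsaacGymEnvs the same fields come from the task YAML under sim.physx):

```python
from isaacgym import gymapi

gym = gymapi.acquire_gym()

sim_params = gymapi.SimParams()
sim_params.physx.use_gpu = True
# Shapes closer than the sum of their contact offsets generate contacts
sim_params.physx.contact_offset = 0.02
# Upper bound on GPU contact pairs; this is the parameter I suspect
# controls a preallocated buffer
sim_params.physx.max_gpu_contact_pairs = 8 * 1024 * 1024
# "Scales additional buffers used by PhysX" per the docs; unclear which buffers
sim_params.physx.default_buffer_size_multiplier = 8.0

sim = gym.create_sim(0, 0, gymapi.SIM_PHYSX, sim_params)
```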
I would appreciate any help.
Thanks!