Explanation of contact parameters and relation to GPU memory usage

Hi,

I am using ASE/AMP for a grasping task. My problem is that the GPU (11GB) is running out of VRAM very quickly (increasing from 4GB initially to 11GB+).

  1. I was wondering why the memory increases over time. My guess is that as training progresses the model learns to grasp, which causes more contacts/collisions and therefore more GPU memory use. Is that right?
  2. In general, how can I find out why the GPU is running out of memory?
    • For instance, I have been logging the total GPU memory and torch.cuda.memory_allocated. I observed that torch.cuda.memory_allocated stayed constant, so it must be isaacgym using more memory? Can I somehow check what causes this memory increase?
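
The comparison described above can be sketched roughly as follows. This is a minimal sketch, not an isaacgym feature: the helper names `outside_torch_bytes` and `sample_gpu_memory` are made up for illustration, and it assumes a PyTorch build that provides `torch.cuda.mem_get_info`. Note that the device-wide number also includes the CUDA context and any other process on the same GPU, so the "outside PyTorch" figure is only an upper bound on what the simulator holds.

```python
def outside_torch_bytes(device_used_bytes, torch_reserved_bytes):
    """Bytes in use on the device that PyTorch's caching allocator does not hold."""
    return max(device_used_bytes - torch_reserved_bytes, 0)


def sample_gpu_memory(device=0):
    """Take one memory sample (requires PyTorch with a CUDA device)."""
    import torch  # imported lazily so the helper above works without PyTorch

    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    device_used = total_bytes - free_bytes            # device-wide, all libraries
    allocated = torch.cuda.memory_allocated(device)   # live PyTorch tensors
    reserved = torch.cuda.memory_reserved(device)     # tensors plus PyTorch's cache
    mib = 2 ** 20
    return {
        "device_used_mib": device_used / mib,
        "torch_allocated_mib": allocated / mib,
        "torch_reserved_mib": reserved / mib,
        "outside_torch_mib": outside_torch_bytes(device_used, reserved) / mib,
    }
```

Calling `sample_gpu_memory()` every few hundred training steps and logging the result would show whether the growth sits in `outside_torch_mib` (simulator side) or in the PyTorch columns.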

Additionally, I struggle to fully understand the following contact parameters, as the documentation is very short.

  1. contact_offset - shapes whose distance is less than the sum of their contactOffset values will generate contacts.
  2. max_gpu_contact_pairs - Maximum number of contact pairs
    • How much memory does one contact pair use? How can I compute the GPU memory used for contact pairs based on this parameter?
    • Is the entire memory for contact pairs preallocated? When I increase the value of this parameter and then start the simulation, the GPU uses more memory from the get-go, but memory still increases during training.
  3. default_buffer_size_multiplier - will scale additional buffers used by PhysX.
    • What kind of buffers?

I would appreciate any help.

Thanks!

Same problem. The fix from “Weird memory allocation (possibly memory leak) during simulation - #2 by capoo4938” works for me.

It seems like you’re running into GPU memory issues while using ASE/AMP for a grasping task. The increase over time could have several causes, such as more collisions or contacts as the model learns to grasp. Your logging already narrows it down: torch.cuda.memory_allocated stays constant while total GPU memory grows, which suggests the growth is on the isaacgym side rather than in PyTorch tensors.

For contact parameters:

contact_offset: This parameter sets the distance at which shapes start generating contacts: two shapes generate contacts once their distance is less than the sum of their contact offsets. Generating contacts means that the simulator tracks interactions between shapes that are within this distance. Raising this value could lead to more contacts, which might increase GPU memory usage because more contact pairs have to be handled; lowering it has the opposite effect.

max_gpu_contact_pairs: This parameter caps the number of contact pairs that can be tracked on the GPU. The memory cost per contact pair is not given in the documentation you quoted and may depend on the shapes involved in the collision, so there is no simple formula to plug the value into. Your observation that a larger value raises memory use right from the start suggests that at least part of this buffer is preallocated.
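
Since there is no documented per-pair size, one way to get an empirical number is to start the simulation with a few different max_gpu_contact_pairs values, record the GPU memory right after startup each time, and fit a straight line through the results. The sketch below only does the fitting step. The function name `bytes_per_contact_pair` is hypothetical, and the linear relation between the setting and startup memory is an assumption you would need to check against your own measurements.

```python
def bytes_per_contact_pair(samples):
    """Fit startup_bytes = slope * max_gpu_contact_pairs + intercept.

    samples: list of (max_gpu_contact_pairs, startup_memory_bytes) tuples,
    measured by the user. Returns (slope, intercept) from ordinary least
    squares; the slope is the estimated bytes per contact pair.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two (pairs, bytes) samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    if var_x == 0:
        raise ValueError("samples must use at least two different settings")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

The intercept then approximates everything that does not scale with the parameter (scene data, other buffers, the CUDA context). If the points do not lie close to a line, the preallocation assumption probably does not hold for your setup.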

default_buffer_size_multiplier: This parameter scales additional buffers used by PhysX. These buffers are used for various calculations and simulations within the physics engine. Increasing this parameter might lead to more preallocated memory for PhysX-related operations.
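
For experimenting with these three parameters, a small helper like the one below can keep the overrides in one place. This is only a sketch: `apply_physx_overrides` is a made-up name, it assumes the parameters are exposed as attributes on `sim_params.physx` (as on an isaacgym `gymapi.SimParams()` object), and the values in the commented usage are placeholders, not recommendations.

```python
def apply_physx_overrides(sim_params, overrides):
    """Set PhysX parameters on sim_params.physx, rejecting unknown names."""
    physx = sim_params.physx
    for name, value in overrides.items():
        if not hasattr(physx, name):
            # Catch typos early instead of silently adding a new attribute.
            raise AttributeError(f"unknown PhysX parameter: {name}")
        setattr(physx, name, value)
    return sim_params


# Hypothetical usage with isaacgym installed (placeholder values):
#   from isaacgym import gymapi
#   sim_params = gymapi.SimParams()
#   apply_physx_overrides(sim_params, {
#       "contact_offset": 0.02,
#       "max_gpu_contact_pairs": 8 * 1024 * 1024,
#       "default_buffer_size_multiplier": 5.0,
#   })
```

Changing one value at a time and checking the startup memory after each change makes it easier to see which parameter is responsible for which part of the footprint.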

To address the GPU memory issue, you could consider:

  • Adjusting the contact parameters to find a balance between accuracy and memory usage.
  • Optimizing your grasping task model to potentially reduce the number of collisions or contacts.
  • Experimenting with different batch sizes or training strategies to see if they impact memory usage.

Keep in mind that GPU memory usage can be influenced by various factors, including the complexity of the simulation, the number of objects involved, and the specific interactions between them. It’s often a matter of fine-tuning parameters and optimizing your model to achieve the desired trade-off between accuracy and memory usage.
