Weird memory allocation (possibly memory leak) during simulation

First of all, I appreciate all the effort from the developers and the useful discussions among Isaac Gym users.

Recently, I noticed unusual GPU memory allocation (possibly a memory leak) while training with the released GitHub code.
The training task was HumanoidAMP, and I had customized the provided humanoid asset a little.
I also modified the environment code slightly to match, and training seemed to start off fine.
However, during training the GPU memory usage grew for no apparent reason, and after about 6 hours the whole process stalled with the viewer left hanging.
Here are my records of GPU memory allocation and GPU utilization.
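In case it helps anyone reproduce this kind of record, here is a small monitoring sketch (not from my training code; the function name and interval are my own choices) that logs GPU memory and utilization over time via `nvidia-smi`, which must be on the PATH:

```python
# Hypothetical monitoring helper: periodically log used GPU memory (MiB)
# and GPU utilization so irregular growth over hours is easy to spot.
import subprocess
import time


def log_gpu_memory(interval_s: float = 60.0, samples: int = 10) -> None:
    """Print a timestamped GPU memory/utilization line every `interval_s` seconds."""
    for _ in range(samples):
        # Query only the two fields of interest, one CSV line per GPU.
        out = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=memory.used,utilization.gpu",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        print(time.strftime("%H:%M:%S"), out.stdout.strip())
        time.sleep(interval_s)


if __name__ == "__main__":
    log_gpu_memory(interval_s=60.0, samples=10)
```

Redirecting this to a file alongside training makes it easy to correlate memory jumps with training events later.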

I switched from my customized asset back to the released asset and verified that training then ran fine, with no growth in GPU memory allocation.
So I suspect that my customized asset caused this unwanted GPU memory usage in the simulation process, since the allocated memory increased irregularly as shown above.
Still, it feels strange, because nothing looks wrong while monitoring the training procedure itself.

What kind of problem in an asset could trigger this sort of wasteful GPU memory allocation?

I have (hopefully) solved this problem, as there is no longer a GPU memory explosion in my current experiments.

The primary cause of the extra memory consumption was collision handling.
As training proceeded, contacts between the humanoid agent and a box-type actor started to occur frequently.
Since I had modified the original humanoid asset to have a much more complex structure, the contact calculations for the detailed mesh bodies appeared to overload the GPU.

The solution was simply to aggregate the humanoid agent with the interacting box actor.
The relevant code can be found in create_envs in the released shadow hand example.
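For reference, the aggregation pattern looks roughly like the sketch below, adapted from the shadow hand example's create_envs. Asset and pose names (`humanoid_asset`, `box_asset`, `humanoid_pose`, `box_pose`) are placeholders for your own setup, and this only runs inside an Isaac Gym environment class:

```python
# Size the aggregate to cover every rigid body and shape placed inside it.
max_agg_bodies = (self.gym.get_asset_rigid_body_count(humanoid_asset)
                  + self.gym.get_asset_rigid_body_count(box_asset))
max_agg_shapes = (self.gym.get_asset_rigid_shape_count(humanoid_asset)
                  + self.gym.get_asset_rigid_shape_count(box_asset))

for i in range(self.num_envs):
    env_ptr = self.gym.create_env(self.sim, lower, upper, num_per_row)

    # Group the humanoid and the box into a single aggregate so PhysX
    # allocates their collision data together instead of per actor pair.
    self.gym.begin_aggregate(env_ptr, max_agg_bodies, max_agg_shapes, True)

    humanoid_handle = self.gym.create_actor(
        env_ptr, humanoid_asset, humanoid_pose, "humanoid", i, 0)
    box_handle = self.gym.create_actor(
        env_ptr, box_asset, box_pose, "box", i, 0)

    self.gym.end_aggregate(env_ptr)
```

The key point is that every actor whose contacts should share one allocation is created between begin_aggregate and end_aggregate, with body/shape counts that are at least as large as what the aggregate actually contains.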

I hope this simple solution helps anyone suffering from a similar memory problem.

