I am having some issues running my model in Isaac Gym. After a few simulation steps, gym.simulate() freezes (hangs), and the entire program stops there without raising any error. How can I debug this? What would people suggest doing, and are there any hypotheses about when this happens? Thanks.
One thing you can do in this case is record a PhysX debug log by setting the GYM_PVD_FILE environment variable:
export GYM_PVD_FILE=testpvd
Alternatively, you could point our team toward a reproducible test case.
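Because the hang produces no error, it can also help to confirm exactly which step and which Python call the process is stuck in. A generic Python technique (not specific to Isaac Gym) is the standard-library `faulthandler` watchdog. You arm it before each step and cancel it afterwards, so if a step blocks longer than the timeout, the tracebacks of all Python threads are written out. The sketch below assumes your loop can be wrapped as a per-step callable; `run_with_hang_watchdog` and `step_fn` are hypothetical names, and the 30-second default timeout is an arbitrary choice.

```python
import faulthandler
import sys
from typing import Callable, Optional, TextIO


def run_with_hang_watchdog(
    step_fn: Callable[[int], None],
    num_steps: int,
    timeout_s: float = 30.0,
    log_file: Optional[TextIO] = None,
) -> None:
    """Call step_fn(0), ..., step_fn(num_steps - 1); dump tracebacks if a step stalls.

    step_fn is a hypothetical per-step callable, e.g. one that calls
    gym.simulate(sim) plus whatever else your loop does each step.
    log_file must be a real file with a file descriptor (faulthandler
    writes to the descriptor directly); it defaults to sys.stderr.
    """
    out = log_file if log_file is not None else sys.stderr
    for step in range(num_steps):
        # Arm the watchdog: if this step runs longer than timeout_s, the
        # tracebacks of all threads are written to `out`. exit=False keeps
        # the process alive so you can still attach a native debugger.
        faulthandler.dump_traceback_later(timeout_s, repeat=False, file=out, exit=False)
        try:
            step_fn(step)
        finally:
            # Disarm the watchdog once the step returns (or raises).
            faulthandler.cancel_dump_traceback_later()
```

If the hang is inside native PhysX code, the dump only points at the Python line that called into it (for example the `gym.simulate` call), but that is still enough to tell a stuck simulate apart from, say, a stuck reset.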
You can also try the --device=CPU command-line parameter to determine whether the issue is related to running GPU PhysX or to using the tensor communications path. The --device=CPU parameter forces the tensor communications path off, which also disables GPU PhysX by default.
You can also combine --device=CPU with --physx_gpu to force GPU PhysX simulation on while keeping the tensor path disabled. In this mode, GPU results are copied back to the CPU, and any tensor APIs you use return CPU tensors.
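Taken together, these runs form a small bisection between GPU PhysX and the tensor path. The helper below is only our reading of that logic, not part of Isaac Gym. `likely_culprit` and its parameters are hypothetical names that record whether the hang reproduces with the default settings, with `--device=CPU`, and with `--device=CPU --physx_gpu`. It assumes the default run uses both GPU PhysX and the GPU tensor path.

```python
def likely_culprit(hangs_default: bool, hangs_cpu: bool, hangs_cpu_physx_gpu: bool) -> str:
    """Interpret three runs of the same script (a hypothetical helper).

    Assumes: the default run uses GPU PhysX and the GPU tensor path;
    --device=CPU turns both off; --device=CPU --physx_gpu turns GPU PhysX
    back on while the tensor path stays off.
    """
    if not hangs_default:
        return "no hang with default settings; nothing to isolate"
    if hangs_cpu:
        # Still hangs with both GPU features off, so neither is required
        # for the hang; look for another cause.
        return "not specific to GPU PhysX or the tensor path"
    if hangs_cpu_physx_gpu:
        # GPU PhysX on, tensor path off, and it still hangs.
        return "GPU PhysX"
    # GPU PhysX alone runs fine, so the tensor path is implicated.
    return "tensor path"


# Example: the hang disappears with --device=CPU and also with --physx_gpu added.
print(likely_culprit(True, False, False))  # prints "tensor path"
```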
I was having a similar problem with gym.simulate(). For me, it always occurred at the same point: at the end of an episode, when the simulation called reset(). I found that the issue was related to gym.set_dof_state_tensor_indexed(), which was called during the reset and applied in the next simulate call. Commenting out that part of the code stopped the hanging, but I am not sure why it was not working, as I had implemented the function the same way as in FrankaCabinet.py.
I hope this saves others from the kind of headaches I have had over the past few days.
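If you hit the same symptom, one thing worth ruling out before blaming the call itself is a mismatch in its index arguments. This is our assumption, not the cause identified in this thread. The sketch checks that the count passed alongside the indices matches how many there are, and that every index is a valid global actor index. `check_indexed_args` and `num_actors` are hypothetical names. With a PyTorch index tensor you would pass `indices.tolist()`, and it is also worth confirming the tensor's dtype, since the Isaac Gym examples create these index tensors as `torch.int32`.

```python
from typing import List, Sequence


def check_indexed_args(indices: Sequence[int], count: int, num_actors: int) -> List[str]:
    """Return problems found in the index arguments of an indexed setter call.

    indices: global actor indices you intend to pass (e.g. idx_tensor.tolist()).
    count: the count argument you intend to pass with them.
    num_actors: total number of actors in the sim (hypothetical input; take it
    from wherever your own task code tracks it).
    """
    problems: List[str] = []
    # The count should match the number of indices actually supplied.
    if count != len(indices):
        problems.append(f"count {count} != number of indices {len(indices)}")
    # Every index should address an existing actor.
    bad = [i for i in indices if not 0 <= i < num_actors]
    if bad:
        problems.append(f"out-of-range actor indices: {bad[:5]}")
    return problems


# Example: with 4 actors, index 4 is out of range.
print(check_indexed_args([0, 2, 4], 3, num_actors=4))
```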