Using RL to set the starting position of a body in the environment

Good afternoon,

I am starting to sink my teeth into using Isaac Gym for reinforcement learning. I have been looking at the cartpole example, among others, and I have a question about what the program can manipulate in order to increase the reward. Rather than having the RL adjust the forces applied to the cart, I would like it to figure out where to place the starting position of the pole so that it balances, selecting an angle between 0 and pi in increments of 0.1.

The simulation has a num_actions value of 1, which appears to correspond to the force applied to the cart, and the RL seems to create and control the actions tensor associated with this force throughout the simulation. Is there a way to have the RL create and control a position at simulation reset, rather than applying forces or setting position targets during the simulation?

Hi there,

Yes, this should be possible. In your pre_physics_step function, you can pass your actions into the reset function and set the position targets there instead of applying them as forces. You can also change the num_actions value to match the dimension of the actions you require.
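
To make the idea concrete, here is a minimal sketch of that control flow. It is only a stand-in: ToyPoleEnv and its attributes are hypothetical, plain Python lists take the place of tensors, and no Isaac Gym calls are made. In a real task the reset function would write the angle into the simulator state instead of a list.

```python
class ToyPoleEnv:
    """Hypothetical stand-in for an Isaac Gym task; no simulator calls are made."""

    def __init__(self, num_envs):
        self.num_envs = num_envs
        self.num_actions = 1                  # one value per env: the start angle
        self.pole_angle = [0.0] * num_envs    # stand-in for the pole DOF position
        self.reset_buf = [True] * num_envs    # envs currently flagged for reset
        self.latest_actions = [0.0] * num_envs

    def pre_physics_step(self, actions):
        # Keep the policy output on the instance instead of applying it as a force.
        self.latest_actions = list(actions)
        env_ids = [i for i, flag in enumerate(self.reset_buf) if flag]
        if env_ids:
            self.reset_idx(env_ids)

    def reset_idx(self, env_ids):
        # Use the stored action as the initial pole angle for each env being reset.
        for i in env_ids:
            self.pole_angle[i] = self.latest_actions[i]
            self.reset_buf[i] = False


env = ToyPoleEnv(num_envs=3)
env.pre_physics_step([0.5, 1.2, 3.0])   # all envs start flagged, so all are placed

env.reset_buf[1] = True                 # only env 1 is due for another reset
env.pre_physics_step([2.0, 2.5, 2.9])   # the other two actions are ignored
```

After the second call only env 1 has moved to its new angle, which shows that actions arriving between resets have no effect in this setup.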

OK, I did this by creating a global variable within the pre_physics_step function from the actions variable, which I believe is what the AI interfaces with. I rounded it to the nearest 0.1 using torch.round and read this global variable in the reset functions, as you had directed. With the rewards that were set, the AI found the correct position, and by the end of 1000 iterations all of the poles were spawning in the correct position to stay upright.
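
For anyone following along, the rounding step can be sketched roughly as below. This is plain Python for a single value rather than the tensor code from the task. The assumption that the raw action lies in [-1, 1] and the helper name action_to_start_angle are mine, not from the thread. A tensor version would presumably use torch.clamp and torch.round in the same places; note that torch.round rounds to whole numbers by default, so the value is divided by the step before rounding.

```python
import math

STEP = 0.1                                    # increment from the question
MAX_INDEX = int(math.floor(math.pi / STEP))   # 31, so the largest angle is 3.1


def action_to_start_angle(raw_action):
    """Map a raw action, assumed to lie in [-1, 1], to an angle on the 0.1 grid."""
    clipped = max(-1.0, min(1.0, raw_action))          # guard against out-of-range values
    angle = (clipped + 1.0) * 0.5 * math.pi            # rescale [-1, 1] to [0, pi]
    index = min(int(round(angle / STEP)), MAX_INDEX)   # nearest grid index, never past pi
    return round(index * STEP, 1)                      # rebuild from the index to avoid float drift


lowest = action_to_start_angle(-1.0)
middle = action_to_start_angle(0.0)
highest = action_to_start_angle(1.0)
out_of_range = action_to_start_angle(5.0)
```

Working from an integer index keeps the result on the grid and stops the top value from rounding past pi.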

As a follow-up question, it appears that pre_physics_step generates multiple tensors between each reset, since it is intended to move the cart back and forth to balance the pole. Is there a place where you set the number of times this step is repeated between resets? It seems wasteful to generate a lot of data that isn’t going to be used anywhere.
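
To illustrate the concern, here is a small counting sketch. It assumes that a reset is triggered once a per-episode step counter reaches a fixed episode length; the function count_actions, its parameter max_episode_length, and the numbers used are all hypothetical and are not taken from the cartpole task.

```python
def count_actions(total_steps, max_episode_length):
    """Return (generated, used): actions produced vs. actions consumed at a reset."""
    generated = 0
    used = 0
    progress = 0          # steps since the last reset (hypothetical counter)
    needs_reset = True    # the very first step starts a new episode
    for _ in range(total_steps):
        generated += 1    # the policy emits one action on every step
        if needs_reset:
            used += 1     # only this action sets the starting position
            progress = 0
            needs_reset = False
        progress += 1
        if progress >= max_episode_length:
            needs_reset = True
    return generated, used


long_episodes = count_actions(total_steps=1000, max_episode_length=500)
short_episodes = count_actions(total_steps=1000, max_episode_length=1)
```

With the long episodes only 2 of the 1000 generated actions are used, whereas with one-step episodes every action is. The catch is that the reward still needs enough simulated steps to tell whether the pole stays upright, so the episode probably cannot be shortened all the way to a single step.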