compute_observations() in a custom task

Hi all,

Is there any rule of thumb for the obs_buf? Is it always better to put all the available information there?
For example, my task is about reaching a sphere with the end-effector tip. Here is how I defined the obs_buf:

    to_target = self.sphere_poses - self.my_robot_tip_pos

    self.obs_buf[..., 0:5] = dof_pos_scaled                                # scaled joint positions
    self.obs_buf[..., 5:10] = self.my_robot_dof_vel * self.dof_vel_scale   # scaled joint velocities
    self.obs_buf[..., 10:13] = to_target                                   # vector from tip to target
    self.obs_buf[..., 13:16] = self.sphere_poses                           # target sphere position

But is there any other rule of thumb for designing the obs_buf?


Hi @hosei2,

Your choice of observations looks good to me. Adding robot end-effector orientation could be helpful as well. I also found that adding past actions to the observation helped learning in a lot of cases.
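For example, a minimal sketch of such an extension, assuming the tip orientation quaternion is available as self.my_robot_tip_rot and the previous policy actions are stored in self.actions (both names are hypothetical), with num_observations increased to 25:

    self.obs_buf[..., 16:20] = self.my_robot_tip_rot   # end-effector orientation (quaternion)
    self.obs_buf[..., 20:25] = self.actions            # previous policy actions, roughly in [-1, 1]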

The choice of observations often depends on your goals. If you have a real robot and you'd like to do sim2real, your choice of observations is limited to what is available on the real robot and from its surroundings. But even in that case you can use an asymmetric PPO variant for training and pass a full set of observations to the value function; see the Shadow Hand environment as an example.
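As a rough sketch of that asymmetric setup, assuming an IsaacGymEnvs-style VecTask base class where setting num_states > 0 allocates a separate self.states_buf that is fed to the value function (the Shadow Hand environments follow this pattern), and with a hypothetical sphere-velocity tensor:

    def compute_observations(self):
        # Policy observations: restricted to what the real robot could measure.
        self.obs_buf[..., 0:5] = dof_pos_scaled
        self.obs_buf[..., 5:10] = self.my_robot_dof_vel * self.dof_vel_scale
        self.obs_buf[..., 10:13] = self.sphere_poses - self.my_robot_tip_pos

        # Privileged state for the critic: everything above plus simulator-only data.
        self.states_buf[..., 0:13] = self.obs_buf[..., 0:13]
        self.states_buf[..., 13:16] = self.sphere_poses      # absolute target position
        self.states_buf[..., 16:19] = self.sphere_lin_vel    # hypothetical, sim-only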

If you don’t have a sim2real goal, passing all the information the simulator provides, sometimes even augmented with hand-crafted features, is a good first step. It also depends on the complexity of the task: simpler tasks can usually be solved with a very limited set of observations, and having fewer observations allows smaller networks and faster training.

Also, the reward is very important; I’d say it’s often more important than the choice of observations.
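For a reaching task like yours, a minimal sketch of a distance-based reward, assuming torch is imported and reusing the tensor names from your snippet (the action-penalty coefficient is just an illustration):

    # Negative distance to the target plus a small penalty on action magnitude.
    dist = torch.norm(self.sphere_poses - self.my_robot_tip_pos, dim=-1)
    self.rew_buf[:] = -dist - 0.01 * torch.sum(self.actions ** 2, dim=-1)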

I also found that adding past actions to the observation helped learning in a lot of cases.

Could you explain the implementation of this? I have seen this done in research papers, but I found that it did not work at all in my own implementation.

In the compute_observations function I would pass in self.actions, which comes from pre_physics_step via self.actions = actions.clone().to(self.device) (humanoid.py does it like this), but the results were never good. Is that the right approach, or should it be the scaled actions that are actually applied to the actuators of your robot?

There could be different approaches, but the simplest is just copying the previous actions, much like you’ve described. If your agent doesn’t train well, the reason is most likely not how the past actions are copied, but the reward and the other observations themselves. The same humanoid env trains quite well even without adding past actions to the observations.
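As a concrete illustration, here is a minimal sketch of that simple approach, assuming an IsaacGymEnvs-style task with gymtorch imported, one action per DOF, and a hypothetical scaling factor self.max_push_effort; the observation indices just continue the 16-value layout from your first post:

    def pre_physics_step(self, actions):
        # Store the raw policy actions (roughly in [-1, 1]) before any scaling.
        self.actions = actions.clone().to(self.device)
        # Scale to actuation forces and apply them to the joints.
        forces = self.actions * self.max_push_effort
        self.gym.set_dof_actuation_force_tensor(self.sim, gymtorch.unwrap_tensor(forces))

    def compute_observations(self):
        # ... fill obs_buf[..., 0:16] as before ...
        # Append the previous raw actions, not the scaled forces.
        self.obs_buf[..., 16:21] = self.actions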


Alright, thank you, I will give it another shot some day.

The strange thing was that my initial test with the actions copied into the observations went really badly, but with the same reward and observations, just without the actions, it was able to train quite well on the task I created.

Can you confirm that you copied the actions produced by the policy, usually in the range [-1, 1], and not the actuation forces applied to the joints? That’s the first reason I can think of for why training got worse. With past actions in the [-1, 1] range passed as observations, in the worst case the performance should stay the same.


I haven’t found the time to test it yet, but I will check the values and make another attempt when I’m back in the office next week. I do remember using the self.actions variable, which should be in the range [-1, 1], before it is scaled to a force, etc.