Question about Orbit RL environment

I am trying to understand how the environment works.

I have two questions:

  1. How is the action that is passed to the “def _step_impl(self, actions: torch.Tensor)” function generated?
    Is it randomly sampled from “self.action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(self.num_actions,))”?

  2. In this script, there are two control types: joint control and inverse kinematics.
    In the “def _step_impl(self, actions: torch.Tensor)” function, the actions argument is the target joint position for joint control, which makes sense. For inverse kinematics, however, the same argument becomes the desired end-effector position and orientation. That confuses me, because the two quantities have completely different dimensions. What is the mechanism behind this?

Thank you.


  1. The actions are provided by the user as inputs to the environment; the environment does not generate them itself. The suggested Box range of (-1, 1) is the typical expected range of actions, but the tensor shape should always be (num_envs, num_actions). You can look at the file in standalone/environments to see how the mechanism works.

  2. Switching between the control mechanisms (IK or joint control) is handled by the configuration instance. If the configured control mode is “inverse_kinematics”, the input actions are expected to be end-effector commands, and they are resolved inside the class into joint-level commands by the IK controller. If the control mode is the default (i.e. joint), the actions are passed to the robot as is.
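
To make both points concrete, here is a minimal sketch of the idea, not the actual Orbit code. Everything in it is an assumption for illustration: the `ToyEnv` class, the `sample_actions` helper, the joint count of 9, the 7-dimensional pose command (position plus quaternion), and the fixed linear map that stands in for a real IK solver. It uses NumPy arrays rather than torch tensors to keep it self-contained. Only the control-mode string “inverse_kinematics” and the (num_envs, num_actions) shape come from the answer above.

```python
import numpy as np

# Hypothetical sizes for illustration only (not taken from the Orbit source).
NUM_JOINTS = 9  # e.g. arm joints plus gripper joints
POSE_DIM = 7    # position (3) + orientation quaternion (4)


class ToyEnv:
    """Toy stand-in for an environment with a configurable control type."""

    def __init__(self, control_type="default", num_envs=4):
        self.control_type = control_type
        self.num_envs = num_envs
        # The action dimension follows the configured control type.
        if control_type == "inverse_kinematics":
            self.num_actions = POSE_DIM
        else:
            self.num_actions = NUM_JOINTS
        # Fixed linear map used as a placeholder for a real IK solver.
        rng = np.random.default_rng(0)
        self._pose_to_joints = rng.normal(size=(POSE_DIM, NUM_JOINTS))

    def step(self, actions):
        """Resolve user-provided actions into joint-level targets."""
        expected = (self.num_envs, self.num_actions)
        if actions.shape != expected:
            raise ValueError(f"expected shape {expected}, got {actions.shape}")
        if self.control_type == "inverse_kinematics":
            # End-effector commands are converted to joint commands here.
            return actions @ self._pose_to_joints
        # Default (joint) control: actions are passed through unchanged.
        return actions


def sample_actions(env, rng):
    """The caller, not the environment, supplies actions in [-1, 1]."""
    return rng.uniform(-1.0, 1.0, size=(env.num_envs, env.num_actions))


rng = np.random.default_rng(42)
for mode in ("default", "inverse_kinematics"):
    env = ToyEnv(control_type=mode)
    actions = sample_actions(env, rng)
    joint_targets = env.step(actions)
    print(mode, actions.shape, joint_targets.shape)
```

In this sketch the action width changes with the control type, but the joint-level targets that reach the robot always have one column per joint. In practice the random sampler would be replaced by a policy or a scripted controller.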
