Question about Orbit RL environment

The following picture shows code from the Orbit repository (Orbit/ at 8e42f0574792f971a44459a061dd3704cac38bb1 · NVIDIA-Omniverse/Orbit · GitHub).

I have three questions regarding the highlighted part:

  1. How do we know that the robot’s joints have reached the target position? Shouldn’t we use an if/else check to confirm that the target has been reached before moving on to the next step, such as computing the reward?

  2. What is the purpose of the loop here?

  3. What is the meaning of decimation? The documentation describes it as the “Number of control action updates @ sim dt per policy dt”, but I do not really understand that.

Thank you.

Hi @berternats

On (1): the IK employed here is a differential inverse kinematics solver, which outputs delta joint positions for the arm to move by. Because of how this method works, the deltas are relatively small, so it is reasonable to expect the joints to track them within a small number of simulation steps. The same solver was used previously in the Factory work, and it seems to be sufficient.
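To illustrate why a differential IK solver produces small joint deltas, here is a minimal sketch for a toy two-link planar arm. It uses a damped least-squares update, which is one common form of differential IK. The thread does not show Orbit's actual solver, so the arm model, the function names, the damping value, and the step clipping are all assumptions made for illustration.

```python
import math

# Toy two-link planar arm; link lengths are assumed values, not Orbit parameters.
L1, L2 = 1.0, 1.0


def forward_kinematics(q):
    """End-effector (x, y) for joint angles q = [q0, q1]."""
    x = L1 * math.cos(q[0]) + L2 * math.cos(q[0] + q[1])
    y = L1 * math.sin(q[0]) + L2 * math.sin(q[0] + q[1])
    return x, y


def jacobian(q):
    """2x2 Jacobian of the end-effector position w.r.t. the joint angles."""
    s1, c1 = math.sin(q[0]), math.cos(q[0])
    s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    return [[-L1 * s1 - L2 * s12, -L2 * s12],
            [L1 * c1 + L2 * c12, L2 * c12]]


def diff_ik_step(q, target, damping=0.05, max_step=0.05):
    """One damped least-squares step: dq = J^T (J J^T + damping^2 I)^-1 e."""
    x, y = forward_kinematics(q)
    ex, ey = target[0] - x, target[1] - y
    # Clip the Cartesian error so each commanded delta stays small (assumed choice).
    norm = math.hypot(ex, ey)
    if norm > max_step:
        ex, ey = ex * max_step / norm, ey * max_step / norm
    J = jacobian(q)
    # A = J J^T + damping^2 I (symmetric 2x2), solved in closed form.
    a = J[0][0] ** 2 + J[0][1] ** 2 + damping ** 2
    b = J[0][0] * J[1][0] + J[0][1] * J[1][1]
    d = J[1][0] ** 2 + J[1][1] ** 2 + damping ** 2
    det = a * d - b * b
    wx = (d * ex - b * ey) / det
    wy = (-b * ex + a * ey) / det
    # dq = J^T w
    return (J[0][0] * wx + J[1][0] * wy, J[0][1] * wx + J[1][1] * wy)


def track(q, target, iterations=200):
    """Apply repeated small deltas; return final q and the largest |dq| seen."""
    q = list(q)
    max_dq = 0.0
    for _ in range(iterations):
        dq = diff_ik_step(q, target)
        max_dq = max(max_dq, abs(dq[0]), abs(dq[1]))
        q[0] += dq[0]
        q[1] += dq[1]
    return q, max_dq
```

In this toy setting each step asks the joints to move only a fraction of a radian, which is the property the reply relies on: small targets can be tracked in a few simulation steps without an explicit "has it arrived?" check.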

(2) and (3) go together. The idea is that different controllers run at different frequencies. In this case there are three controllers, plus the physics simulation itself:

  • Learning policy (outermost) — X Hz
  • IK solver (middle) — Y Hz
  • Joint level controller (low-level) — Z Hz
  • Physics simulation — Z Hz (usually)

Control decimation is the formal term for how many low-level steps run per high-level step. The physics simulation and the joint controller are typically set to the same frequency, so we treat them as a single rate here.

For instance, let’s say the simulation dt is 1/100 s (Z = 100 Hz). The low-level joint control typically runs at this frequency. The IK control, however, runs at a lower frequency (to give the joints time to track the command), i.e. Y = Z / decimation. If decimation is 2, then IK runs at 50 Hz. In this particular environment, the learned policy runs at the IK frequency (i.e. X = Y), so you don’t see another for-loop at an outer level.
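The arithmetic and the loop structure above can be sketched as follows. This is a simplified illustration, not Orbit's code: `ToyJoint`, `env_step`, and the first-order tracking model with its gain are hypothetical stand-ins for the real joint controller and physics step.

```python
SIM_DT = 1.0 / 100.0   # physics / joint controller step, Z = 100 Hz
DECIMATION = 2         # sim steps per IK (and policy) step

# Y = Z / decimation; here the policy runs at the IK rate, so X = Y.
POLICY_DT = SIM_DT * DECIMATION
POLICY_HZ = 1.0 / POLICY_DT


class ToyJoint:
    """Hypothetical joint that closes part of the gap to its target each sim step."""

    def __init__(self, position=0.0, gain=0.5):
        self.position = position
        self.target = position
        self.gain = gain       # assumed tracking gain, for illustration only
        self.sim_steps = 0     # counts how many physics steps have run

    def set_target(self, target):
        self.target = target

    def sim_step(self):
        # Stand-in for one physics + low-level control step at SIM_DT.
        self.position += self.gain * (self.target - self.position)
        self.sim_steps += 1


def env_step(joint, delta_action, decimation=DECIMATION):
    """One policy/IK step: set the command once, then run `decimation` sim steps."""
    joint.set_target(joint.position + delta_action)
    for _ in range(decimation):
        joint.sim_step()
    # Reward and observations would be computed here, after the inner loop.
    return joint.position
```

With these values, one call to `env_step` advances the simulation by two physics steps, so the policy sees the world at roughly 50 Hz while the joint controller runs at 100 Hz.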

I hope this clarifies the doubt.

Thank you so much for the detailed explanation.

So, in general, if the joint position deltas in the action are large, for example when joint positions are picked at random, we should make sure the joints reach the target positions before going on to compute the reward, am I right?
