PPO for rl_games vs skrl

Hi, I am using PPO for Isaac-Lift-Franka-v0, which is a task in NVIDIA Isaac Orbit. I found that the performance of PPO in rl_games is much better than in skrl, so I have tried to adjust skrl's parameters to match rl_games's, but it does not work. I am wondering if I have overlooked something, or if the PPO architectures in rl_games and skrl are fundamentally different. Could you give me any advice or insight on how to make PPO in skrl perform as well as it does in rl_games?

Thank you in advance.

Here are the PPO parameters for skrl:

seed: 42

# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/develop/modules/skrl.utils.model_instantiators.html
models:
  separate: False
  policy:  # see skrl.utils.model_instantiators.gaussian_model for parameter details
    clip_actions: True
    clip_log_std: False
    min_log_std: -20.0
    max_log_std: 2.0
    input_shape: "Shape.STATES"
    hiddens: [256, 128, 64]
    hidden_activation: ["elu", "elu", "elu"]
    output_shape: "Shape.ACTIONS"
    output_activation: ""
    output_scale: 1.0
  value:  # see skrl.utils.model_instantiators.deterministic_model for parameter details
    clip_actions: False
    input_shape: "Shape.STATES"
    hiddens: [256, 128, 64]
    hidden_activation: ["elu", "elu", "elu"]
    output_shape: "Shape.ONE"
    output_activation: ""
    output_scale: 1.0


# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/modules/skrl.agents.ppo.html
agent:
  rollouts: 32
  learning_epochs: 5
  mini_batches: 16
  discount_factor: 0.99
  lambda: 0.95
  learning_rate: 5.e-4
  learning_rate_scheduler: "KLAdaptiveRL"
  learning_rate_scheduler_kwargs:
    kl_threshold: 0.008
  state_preprocessor: "RunningStandardScaler"
  state_preprocessor_kwargs: {"size": env.observation_space, "device": device}
  value_preprocessor: "RunningStandardScaler"
  value_preprocessor_kwargs: {"size": 1, "device": device}
  random_timesteps: 0
  learning_starts: 0
  grad_norm_clip: 1.0
  ratio_clip: 0.2
  value_clip: 0.2
  clip_predicted_values: True
  entropy_loss_scale: 0.0
  value_loss_scale: 4.0
  kl_threshold: 0
  # rewards_shaper_scale: 0.01
  # logging and checkpoint
  experiment:
    directory: "lift"
    experiment_name: ""
    write_interval: 120
    checkpoint_interval: 200


# Sequential trainer
# https://skrl.readthedocs.io/en/latest/modules/skrl.trainers.sequential.html
trainer:
  timesteps: 240000
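
For reference, these fields map onto skrl's Python API roughly as follows (a sketch only; `env` is assumed to be an already created and wrapped Isaac Orbit environment, and the module paths are the skrl 0.x / PyTorch ones):

# sketch: building the PPO agent from the YAML above
# (`env` is an already wrapped Isaac Orbit environment -- not defined in the YAML)
from skrl.agents.torch.ppo import PPO, PPO_DEFAULT_CONFIG
from skrl.memories.torch import RandomMemory
from skrl.resources.preprocessors.torch import RunningStandardScaler
from skrl.resources.schedulers.torch import KLAdaptiveRL
from skrl.trainers.torch import SequentialTrainer
from skrl.utils.model_instantiators import Shape, deterministic_model, gaussian_model

device = env.device

# models: -> gaussian policy and deterministic value (instantiated separately here
# for simplicity; with separate: False the actual integration may share the backbone)
models = {}
models["policy"] = gaussian_model(observation_space=env.observation_space,
                                  action_space=env.action_space,
                                  device=device,
                                  clip_actions=True,
                                  clip_log_std=False,
                                  input_shape=Shape.STATES,
                                  hiddens=[256, 128, 64],
                                  hidden_activation=["elu", "elu", "elu"],
                                  output_shape=Shape.ACTIONS,
                                  output_activation="")
models["value"] = deterministic_model(observation_space=env.observation_space,
                                      action_space=env.action_space,
                                      device=device,
                                      input_shape=Shape.STATES,
                                      hiddens=[256, 128, 64],
                                      hidden_activation=["elu", "elu", "elu"],
                                      output_shape=Shape.ONE,
                                      output_activation="")

# one memory slot per rollout step
memory = RandomMemory(memory_size=32, num_envs=env.num_envs, device=device)

# agent: -> overrides of PPO_DEFAULT_CONFIG
cfg = PPO_DEFAULT_CONFIG.copy()
cfg["rollouts"] = 32
cfg["learning_epochs"] = 5
cfg["mini_batches"] = 16
cfg["discount_factor"] = 0.99
cfg["lambda"] = 0.95
cfg["learning_rate"] = 5e-4
cfg["learning_rate_scheduler"] = KLAdaptiveRL
cfg["learning_rate_scheduler_kwargs"] = {"kl_threshold": 0.008}
cfg["state_preprocessor"] = RunningStandardScaler
cfg["state_preprocessor_kwargs"] = {"size": env.observation_space, "device": device}
cfg["value_preprocessor"] = RunningStandardScaler
cfg["value_preprocessor_kwargs"] = {"size": 1, "device": device}
cfg["grad_norm_clip"] = 1.0
cfg["ratio_clip"] = 0.2
cfg["value_clip"] = 0.2
cfg["clip_predicted_values"] = True
cfg["entropy_loss_scale"] = 0.0
cfg["value_loss_scale"] = 4.0
cfg["kl_threshold"] = 0

agent = PPO(models=models,
            memory=memory,
            cfg=cfg,
            observation_space=env.observation_space,
            action_space=env.action_space,
            device=device)

# trainer: -> sequential trainer
trainer = SequentialTrainer(env=env, agents=agent, cfg={"timesteps": 240000})
trainer.train()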

Here are the PPO parameters for rl_games:

params:
  seed: 42

  # environment wrapper clipping
  env:
    clip_observations: 10.0
    clip_actions: 1.0

  algo:
    name: a2c_continuous

  model:
    name: continuous_a2c_logstd

  network:
    name: actor_critic
    separate: False
    space:
      continuous:
        mu_activation: None
        sigma_activation: None

        mu_init:
          name: default
        sigma_init:
          name: const_initializer
          val: 0
        fixed_sigma: True
    mlp:
      units: [256, 128, 64]
      # units: [512, 256, 128]
      activation: elu
      d2rl: False

      initializer:
        name: default
      regularizer:
        name: None

  load_checkpoint: False # flag which sets whether to load the checkpoint
  load_path: '' # path to the checkpoint to load

  config:
    name: lift
    env_name: rlgpu
    device: 'cuda:0'
    device_name: 'cuda:0'
    multi_gpu: False
    ppo: True
    mixed_precision: False
    normalize_input: True
    normalize_value: True
    value_bootstrap: True
    num_actors: -1
    reward_shaper:
      scale_value: 1.0
    normalize_advantage: True
    gamma: 0.99
    tau: 0.95
    learning_rate: 5e-4
    lr_schedule: adaptive
    schedule_type: legacy
    kl_threshold: 0.008
    score_to_win: 10000
    max_epochs: 10000
    save_best_after: 20
    save_frequency: 20
    print_stats: True
    grad_norm: 1.0
    entropy_coef: 0.0
    truncate_grads: True
    e_clip: 0.2
    horizon_length: 32
    minibatch_size: 2048
    mini_epochs: 5
    critic_coef: 4
    clip_value: True
    seq_len: 4
    bounds_loss_coef: 0.0001
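
One sanity check when matching the two configs is the mini-batch bookkeeping: skrl specifies the number of mini-batches directly, while rl_games specifies the mini-batch size. Assuming the 1024 parallel environments mentioned later in this thread, the two configs above are consistent:

# quick consistency check between the two mini-batch conventions
# (num_envs = 1024 is an assumption, taken from later in this thread)
num_envs = 1024
rollouts = 32                                # skrl "rollouts" == rl_games "horizon_length"
batch_size = num_envs * rollouts             # 32768 samples collected per PPO update
minibatch_size = 2048                        # rl_games "minibatch_size"
mini_batches = batch_size // minibatch_size  # = 16, matching skrl "mini_batches: 16"
print(batch_size, mini_batches)              # 32768 16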

Hi @berternats

Please check PPO for rl_games vs skrl · Toni-SM/skrl · Discussion #103 · GitHub and continue the discussion there :)

Hi, thank you so much for the prompt reply.

Regarding the maximum mean reward values I am getting with rl_games and skrl: the task is about a robot trying to grasp an object and lift it up.

With rl_games the robot can complete the task, while with skrl it is only able to reach the object and cannot grasp it.

So the reward difference is large because of the performance gap.

Hi @berternats

When the Isaac-Lift-Franka-v0 environment reward function was fixed, only the rl_games and rsl_rl hyperparameters were updated.

Although for Isaac Orbit I use the rl_games hyperparameters as far as possible, I have updated (in skrl-v1.0.0-rc.2, released recently) the hyperparameters for the Isaac-Lift-Franka-v0 environment, this time based on rsl_rl. Furthermore, I have added time-limit (episode truncation) bootstrapping to skrl's on-policy agents in the latest version, which allows for better mean reward values.
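
For context, time-limit bootstrapping adds the discounted value estimate of the truncated next state back into the reward, so that episodes cut off by the time limit are not treated as real failures during the advantage computation. Roughly (a sketch of the idea, not skrl's or rl_games' exact code):

import torch

def bootstrap_time_limit(rewards: torch.Tensor,
                         values: torch.Tensor,
                         truncated: torch.Tensor,
                         discount_factor: float = 0.99) -> torch.Tensor:
    """Sketch of the idea behind skrl's time_limit_bootstrap and rl_games'
    value_bootstrap: for transitions that ended only because the episode hit
    its time limit, add the discounted value estimate back into the reward."""
    return rewards + discount_factor * values * truncated.float()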

The next plot shows the mean reward for the Isaac-Lift-Franka-v0 environment for the mentioned libraries (with the updated skrl hyperparameters, not yet published in the Isaac Orbit code). Note that, since the number of parallel environments for the lift task was increased from 1024 to 4096, rl_games, with the available hyperparameters, takes much longer to train.

I am working on the skrl integration in the Isaac Orbit repository (to be pushed soon), which will include JAX support and an update of the training hyperparameters.

Meanwhile, you can play with the standalone training script for Isaac Orbit from the skrl docs: torch_lift_franka_ppo.py

Thank you so much for the benchmarking. I really appreciate that.

I will try with the parameters you provided.

By the way, may I know which parameters you were playing with to get the results shown in Figure 2?

Were you using the default controller or the IK controller?

Hi @berternats

Both skrl implementations (Figures 1 and 2) use the same hyperparameters:
Note that the initial_log_std and time_limit_bootstrap are not available in the current public version of Isaac Orbit.

seed: 42

# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
models:
  separate: False
  policy:  # see skrl.utils.model_instantiators.gaussian_model for parameter details
    clip_actions: False
    clip_log_std: True
    min_log_std: -20.0
    max_log_std: 2.0
    initial_log_std: 1.0
    input_shape: "Shape.STATES"
    hiddens: [256, 128, 64]
    hidden_activation: ["elu", "elu", "elu"]
    output_shape: "Shape.ACTIONS"
    output_activation: ""
    output_scale: 1.0
  value:  # see skrl.utils.model_instantiators.deterministic_model for parameter details
    clip_actions: False
    input_shape: "Shape.STATES"
    hiddens: [256, 128, 64]
    hidden_activation: ["elu", "elu", "elu"]
    output_shape: "Shape.ONE"
    output_activation: ""
    output_scale: 1.0


# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
  rollouts: 96
  learning_epochs: 5
  mini_batches: 4
  discount_factor: 0.99
  lambda: 0.95
  learning_rate: 1.e-3
  learning_rate_scheduler: "KLAdaptiveRL"
  learning_rate_scheduler_kwargs:
    kl_threshold: 0.01
    min_lr: 1.e-5
  state_preprocessor: "RunningStandardScaler"
  state_preprocessor_kwargs: null
  value_preprocessor: "RunningStandardScaler"
  value_preprocessor_kwargs: null
  random_timesteps: 0
  learning_starts: 0
  grad_norm_clip: 1.0
  ratio_clip: 0.2
  value_clip: 0.2
  clip_predicted_values: True
  entropy_loss_scale: 0.01
  value_loss_scale: 1.0
  kl_threshold: 0
  rewards_shaper_scale: 1.0
  time_limit_bootstrap: True
  # logging and checkpoint
  experiment:
    directory: "lift"
    experiment_name: ""
    write_interval: 800
    checkpoint_interval: 8000


# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
  timesteps: 67200
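
For reference, the KL-adaptive schedule raises or lowers the learning rate depending on how far the measured policy KL divergence is from the threshold. Roughly (a sketch of the rule; the factors and bounds shown are common defaults and may differ from skrl's exact implementation):

def kl_adaptive_lr(lr: float,
                   kl: float,
                   kl_threshold: float = 0.01,
                   kl_factor: float = 2.0,
                   lr_factor: float = 1.5,
                   min_lr: float = 1e-5,
                   max_lr: float = 1e-2) -> float:
    """Sketch of a KL-adaptive learning-rate rule: shrink the learning rate when
    the policy moved too much in the last update, grow it when it barely moved."""
    if kl > kl_threshold * kl_factor:      # policy changed too much -> slow down
        return max(lr / lr_factor, min_lr)
    if kl < kl_threshold / kl_factor:      # policy barely changed -> speed up
        return min(lr * lr_factor, max_lr)
    return lr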

Regarding the task parameters, Figure 2 uses the default task parameters as defined in the Isaac Orbit lift_cfg.py file.

Thank you so much!

And, sorry for the unrelated topic: when I am using skrl for training, for example on Isaac-Lift-Franka-v0, the training stops halfway, even though it is still far from exceeding the available GPU memory.

The error message says something like there was an error running Python.

May I know if you have any clue?

Thank you in advance.

Hi @berternats

It is difficult to know without a specific error message.

Can you provide the error message or logs?
Have you made any modifications to the task?
Are you using the latest skrl version?

  1. Can you provide the error message or logs?
    I have attached the error message. The training stopped suddenly.
  2. Have you made any modifications to the task?
    I am using my own environment, but it seems only skrl has this problem; rl_games works well with the same environment.
  3. Are you using the latest skrl version?
    I am using the 0.10.0 version.

Thank you in advance!

Hi @berternats

Mmmm, I have never had these types of problems with Isaac Orbit, but perhaps it could be something similar to what is described in the following discussion (which is fixed in the latest skrl versions).

  • Can you try the latest version (skrl-v1.0.0-rc.2)?
  • Are you running the example scripts included in skrl (e.g. torch_lift_franka_ppo.py), or the examples integrated in Isaac Orbit?

Thank you so much. I will give the latest skrl a try.

I am using the examples integrated into Isaac Orbit.

Hi, may I know whether the latest version, skrl-v1.0.0, is ready to use with Isaac Orbit?
Previously Isaac Orbit supported skrl-v0.10.2, so is skrl-v1.0.0 now ready to use with Isaac Orbit? Do we need to modify anything before using it?

Thank you so much.

Hi @berternats

I submitted skrl JAX by Toni-SM · Pull Request #109 · NVIDIA-Omniverse/Orbit · GitHub to the Isaac Orbit repository, which uses the latest version (skrl>=1.0.0).

Waiting for approval :)

Meanwhile, you can play with Isaac Orbit environments via skrl's standalone scripts for Isaac Orbit.
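
Those standalone scripts load and wrap the environment directly from skrl, roughly like this (a sketch; the module paths shown are the skrl >= 1.0 ones, while skrl 0.x exposed them under skrl.envs.torch):

# sketch: loading and wrapping an Isaac Orbit environment from a standalone skrl script
from skrl.envs.loaders.torch import load_isaac_orbit_env
from skrl.envs.wrappers.torch import wrap_env

env = load_isaac_orbit_env(task_name="Isaac-Lift-Franka-v0")
env = wrap_env(env, wrapper="isaac-orbit")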

I tried the latest version in Orbit, where I just changed the environment loaders and wrappers file hierarchy (link attached).

It runs successfully. However, with exactly the same parameters (the ones you provided) and environments (same tasks), the performance of 1.0.0 and 0.10.2 is quite different: with 0.10.2 my robot can successfully grasp the object, whereas with 1.0.0 it cannot at all.

May I know if you have any clue? Is there anything I missed?

Thank you in advance.

Hello, may I know if there is any update on this?