How to use a custom RL network in Isaac Gym

Can I use a custom network structure and algorithm for the reinforcement learning part in Isaac Gym? If I want to change the reinforcement learning algorithm, which files do I need to change, and where do I plug in my own algorithm?

Hi @user8193

I encourage you to try skrl (SKRL - Reinforcement Learning library, skrl 0.2.0 documentation), a modular RL library designed to give researchers control over the RL system. It focuses on readability, transparency, and simplicity of the code: each class inherits from one and only one base class that provides a common interface, so there is no need to navigate through half a dozen files to work out how the algorithms work.

This library lets you create and use custom networks, as shown below, and implement your own algorithms (see basic inheritance usage).

Please visit the documentation for usage details and examples.

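import torch
import torch.nn as nn
import torch.nn.functional as F

# Model base classes (import path as in skrl 0.2.0)
from skrl.models.torch import GaussianModel, DeterministicModel

class Policy(GaussianModel):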
    def __init__(self, observation_space, action_space, device, clip_actions=False,
                 clip_log_std=True, min_log_std=-20, max_log_std=2):
        super().__init__(observation_space, action_space, device, clip_actions,
                         clip_log_std, min_log_std, max_log_std)

        self.linear_layer_1 = nn.Linear(self.num_observations, 32)
        self.linear_layer_2 = nn.Linear(32, 32)
        self.mean_action_layer = nn.Linear(32, self.num_actions)
        self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))

    def compute(self, states, taken_actions):
        x = F.elu(self.linear_layer_1(states))
        x = F.elu(self.linear_layer_2(x))
        return torch.tanh(self.mean_action_layer(x)), self.log_std_parameter

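class Value(DeterministicModel):
    def __init__(self, observation_space, action_space, device, clip_actions=False):
        super().__init__(observation_space, action_space, device, clip_actions)

        # Note: there are no activations between these layers, so the stack is
        # equivalent to a single linear map; add e.g. nn.ELU() between them if needed
        self.net = nn.Sequential(nn.Linear(self.num_observations, 32),
                                 nn.Linear(32, 32),
                                 nn.Linear(32, 1))

    def compute(self, states, taken_actions):
        # Return the state-value estimate, one value per observation
        return self.net(states)
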
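# `env` is assumed to be your already created Isaac Gym environment and
# `device` the torch device the models should live on
networks = {"policy": Policy(env.observation_space, env.action_space, device, clip_actions=True),
            "value": Value(env.observation_space, env.action_space, device)}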
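
For intuition about the two values that Policy.compute returns (the mean actions and the log standard deviation), here is a minimal plain-Python sketch of what a Gaussian policy conventionally does with them: clamp the log std, sample from the resulting normal distribution, and optionally clip the action. This is not skrl's code; the helper name sample_gaussian_action and the action bounds of -1 and 1 are assumptions made only for illustration, and the defaults simply mirror the min_log_std, max_log_std, and clip_actions arguments above.

```python
import math
import random


def sample_gaussian_action(mean, log_std, min_log_std=-20.0, max_log_std=2.0,
                           low=-1.0, high=1.0, rng=random):
    """Sample one action per dimension from N(mean, exp(log_std)^2).

    Hypothetical helper for illustration only; not part of skrl.
    """
    actions, log_prob = [], 0.0
    for mu, ls in zip(mean, log_std):
        # Clamp the log standard deviation (cf. clip_log_std, min/max_log_std)
        ls = max(min_log_std, min(max_log_std, ls))
        std = math.exp(ls)
        # Draw the raw action from the Gaussian
        a = rng.gauss(mu, std)
        # Gaussian log-density of the raw sample, summed over action dimensions
        log_prob += -0.5 * ((a - mu) / std) ** 2 - ls - 0.5 * math.log(2 * math.pi)
        # Clip to the (assumed) action bounds (cf. clip_actions=True)
        actions.append(max(low, min(high, a)))
    return actions, log_prob


rng = random.Random(0)  # fixed seed so repeated runs give the same sample
actions, log_prob = sample_gaussian_action([0.3, -0.8], [0.0, 0.0], rng=rng)
print(actions, log_prob)
```

Because log_std_parameter in the Policy above is a learnable nn.Parameter initialized to zeros, the policy starts with a standard deviation of 1 in every action dimension and adjusts it during training.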