Need help with debugging my custom RL task

Hi everyone!

I am trying to use Isaac Sim/Gym to train my robotic leg on how to balance itself, unfortunately this was quite is a challenge for me. I will also mention im pretty much a beginner on RL and still learning about simulating robotics… I think im very close to getting what i want but im just missing something, but when i am training my robot the physics just start going crazy and things crash, sometimes it starts glitching out after a while of what seems to be normal training and vice versa.

I’m pretty bad at explaining things so I think a video would help get the idea across much better

When the robot falls (contact sensor reaches 0) i want the robot to reset to its inital positions, both world and joint wise… Maybe this is the wrong approach? im not sure… Ill be more then happy to share my code upon request. But it is based on the documentation for the stable baselines 3 RL cartpole example that existed for isaac sim/gym before isaac sim 4.0.0 got rid of all that.

i was trying to migrate to isaac lab/isaac sim 4.0.0 but the documentation is very confusing to me… the previous documentation for 2023.1.1 was also confusing but i found i was able to start with the example i spoke about above. ive decided it would be best to try and master/finish what i trying to get done before, then try and migrate everything.

I also tried to use the old isaac gym preview but i couldnt get past the joint monkeys/other examples… maybe its because im using ubuntu 22.04? I dont know

If this helps… My end goal is to make my own biped that learns to walk using RL and Isaac Sim, and getting this single leg to balance will start me off with RL and show that the leg i have designed is capable/strong enough to carry its own weight at the least.

Thank you for reading, and any help will be much appreciated!!!

Too be clear, it does sometimes work well until the inevitable happens

this video is a little longer as getting it to be normal for a while is pretty random

For any other beginner who comes across this with the same issue, I solved this by setting the velocity of the entire robot to 0 every reset.

Beginner Mistake… Lol

This is a good reset function to start your robot on

    def reset_idx(self, env_ids=None) -> None:
        """Reset the environment for the given environment IDs."""
        if env_ids is None:
            env_ids = torch.arange(self.num_envs, device=self._device)
        num_resets = len(env_ids)

        # Use the predefined initial positions and velocities
        initial_positions = self.default_positions.expand(num_resets, -1)
        initial_velocities = self.default_velocities.expand(num_resets, -1)

        indices = env_ids.to(dtype=torch.int32)

        # Reset the robot's base position and orientation in the world
        self._legs.set_world_poses(
            self.initial_root_pos[env_ids].clone(), self.initial_root_rot[env_ids].clone(), indices
        )


        self._legs.set_joint_positions(initial_positions, indices=indices)
        self._legs.set_joint_velocities(initial_velocities, indices=indices)
        #self._legs.set_joint_efforts(initial_efforts, indices=indices)
        
        zero_velocities = torch.zeros((num_resets, 6), device=self._device)  # 6: 3 for linear and 3 for angular velocities
        self._legs.set_velocities(zero_velocities, indices=indices)

        # Increment and print the reset counter
        self.reset_counter += 1
        #print(f"Environment reset! Total resets: {self.reset_counter}")

        # Reset environment timers
        self.env_timers[env_ids] = 0.0

        # Randomize maximum episode times for each environment
        self.max_episode_times[env_ids] = torch.rand(len(env_ids), device=self._device) * (self.max_episode_time - 15) + 15

        #print(self.max_episode_times)
        
        self.reset_buf[env_ids] = 0
        self.progress_buf[env_ids] = 0